Trey Jones at the Wikimedia Foundation published some very interesting notes about how to think about combining scores for search ranking (particularly in Elasticsearch). I like this insight a lot:
addition is looking for ways to win, multiplication is looking for ways to fail
This is pretty interesting to me when thinking about how I chose to implement the ranking for the WordPress.org plugin search. Applying this insight to the way I combined signals in that ranking function yields a couple of interesting observations:
- The text matching features (phrases, title matches, etc.) are looking for ways to win and boost the score. This was a pretty explicit goal of mine, but it was also partly driven by decoupling the matching of text from boosting on text.
- All of the other signals are looking for reasons to fail: not updating the plugin, not testing it on the latest WordPress, not resolving support threads, etc. There is also some boosting, but we do a lot to lower scores, which may be related to some of the exact matching problems I am still looking at (especially after result number 10).
I’m not sure whether this is good or bad; it is just an interesting model for thinking about ranking, and something I need to think about some more. It somewhat matches the intuition that led me to separate out matching text from boosting text with individual features.
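To make the win/fail distinction concrete, here is a minimal sketch in Python. It is not the actual WordPress.org ranking function: the function name `combine_score`, the boost values, and the quality factors are all made up for illustration. It only assumes that text-match boosts are added together and that the other signals are applied as multipliers between 0 and 1.

```python
def combine_score(text_boosts, quality_factors):
    """Add up text-match boosts, then multiply by every quality factor.

    Both arguments are hypothetical: text_boosts stands in for things like
    phrase and title matches, quality_factors for signals like plugin
    freshness or resolved support threads, each scaled to the range 0..1.
    """
    # Addition: a single large boost is enough to produce a high score.
    text_score = sum(text_boosts)

    # Multiplication: a single small factor is enough to drag the score down.
    multiplier = 1.0
    for factor in quality_factors:
        multiplier *= factor

    return text_score * multiplier


# One strong title match and nothing else still "wins" on the additive side.
title_match_only = combine_score([5.0, 0.0, 0.0], [1.0, 1.0, 1.0])

# The same text match "fails" because one quality signal is poor.
stale_plugin = combine_score([5.0, 0.0, 0.0], [1.0, 1.0, 0.1])

print(title_match_only, stale_plugin)
```

With these made-up numbers, the plugin with one poor quality signal ends up with a tenth of the score of the otherwise identical plugin, no matter how well its text matched.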
I also need to think more about whether I am using the right operations for weighting different scores. There are a lot of great thoughts in these notes, and Trey has a bunch of other notes that look interesting too.
It also reminds me how great it is to have notes published for others to look at.
