Tuesday, October 22, 2013

Adding freshness boost to SharePoint Online

[Image: freshness formula for ranking models]

Every once in a while I get questions about customizing the rank model in SharePoint 2013. While it has never been easy to tune a rank model, it was somewhat easier with FAST ESP or FAST Search for SharePoint (FS4SP), as the rank values were not normalized and it was possible to interpret how the rank was calculated. This is why I created the FS4SP Query Logger, which is quite useful when dealing with FS4SP.

An often useful part of ranking results is freshness, meaning newer items are pushed somewhat higher up the result list compared to older items.

But that was then, and this is now. The rank models in SharePoint 2013 have changed a lot, and the resulting rank numbers are now normalized with more than 10 decimals. The numbers in the model itself are not easy to understand, and I briefly wrote about this in Rank models 2013 – Main differences from 2010. I still don’t have a Ph.D in math, and I will leave that up to the clever people at FAST/MS who created these new models with neural networks and all. It’s possible to write simpler static models with 2013, but I have decided to stay away from that as long as I can, and do rank tuning using the XRANK operator instead. If I ever get a case where changing the model is the only way, I’ll sign up for a rank tuning course over at the powers that be - benefit of living in Oslo :)

If you’ve followed my recent Twitter statements and blog posts, you’ll see that it’s less and less code, and more and more configuration. I have by no means abandoned SharePoint on-premises or coding, but deploying by configuration, with the added benefit that it might work in SharePoint Online as well, feels pretty good at the end of the day. You should check out my internal ISO 9001 #ITPro certificate at the office :-)

As mentioned on MSDN, the default rank model in 2013 does not include rank for freshness. In fact, none of the models shipped with 2013 (or SharePoint Online) include this. If you don’t know what freshness is, it’s how old a particular document is compared to today's date. The theory is that the newer a document is, the more likely it is to be important, and so it should rank higher in a search result. For most of my customers this is true, and a valuable piece of the rank model.

By following the sample at MSDN you can modify the rank profile and add a rank section for freshness, which is pretty cool, and something I recommend for most customers (and it was default in FS4SP). How much weight freshness rank should have overall is something you have to test out a bit.

Oh, I forgot! Tuning the rank profile is only possible on-premises….

XRANK to the rescue

Note: A word of caution. Heavy use of XRANK may impact query performance, but that’s a hardware scaling issue in the land of #ITPro’s, and none of our concern for now :).

The formula for freshness is shown at the top of this post, and it’s not that hard to interpret. yFuture is a constant used if item dates appear in the future, which I’m not going to address as we’ll skip time travel of content for now.

c is a constant (0.0333 in the sample at MSDN) and x is the age of an item in number of days. y is then the resulting boost value given to the item based on its age.
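As a rough sketch, the same calculation can be written in Python. The function name `freshness_boost` is my own and not something from MSDN, and like the rest of this post it ignores yFuture and items dated in the future.

```python
def freshness_boost(days, c=0.0333):
    """Boost y for an item that is `days` old (x), using the constant c.

    Assumes days >= 0; future-dated items (yFuture) are not handled here.
    """
    return 1.0 / (1.0 + c * days)


if __name__ == "__main__":
    # Print the boost for a handful of item ages.
    for days in (0, 1, 7, 30, 365, 580):
        print(f"{days:>4} days -> boost {freshness_boost(days):.4f}")
```

An item of age 0 gets the full boost of 1, and the value drops off quickly during the first month before flattening out.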

Formatted in Excel terms we get the following formula: boost=1/(1+0.0333*days). Plotting this out for specific values gives a reciprocal (hyperbolic) decay curve, rather than a logarithmic one, with a long tail approaching zero:

[Image: plot of the boost value against item age in days]

Comparing the boost values for 1 and 580 days with the numbers on MSDN, you see they are equal. The next part is transforming this into an XRANK query. The XRANK operator in this case works like this: matching query XRANK(cb=value) boost query. The cb parameter is the constant boost added to items that also match the boost query. You can also nest XRANKs using parentheses in order to add different values to different item ages.

To keep the query short you don’t have to use intervals such as 1-7 days and 7-30 days, each with its own weight. If you only check for dates greater than a cutoff, an item matches every cutoff it is newer than, so the boost values are additive and you have to adjust them accordingly to end up with the desired max boost. In the sample below the sum of the two cb values equals 1, which is the correct boost for an item with an age of 0 days.

((my test query) XRANK(cb=0.9677731539727087) write>2013-10-20) XRANK(cb=0.03222684602729131) write>2013-10-21
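The additive scheme can be sketched in Python as well. This is not the code behind the generator page, just a minimal illustration: the function names and the choice of age steps are my own assumptions, and the cutoff date for each age is taken as today minus (age + 1) days, which matches the dates in the sample above when today is 2013-10-22.

```python
from datetime import date, timedelta


def freshness_boost(days, c=0.0333):
    """Boost for an item that is `days` old (future dates not handled)."""
    return 1.0 / (1.0 + c * days)


def build_freshness_query(query, ages, today, c=0.0333):
    """Wrap `query` in nested XRANKs, one per age step (hypothetical helper).

    Each step only adds the difference from the previous (older) step,
    so an item matching several cutoffs ends up with the boost of the
    youngest age step it matches.
    """
    expr = f"({query})"
    previous = 0.0
    # Start with the oldest age step and work towards the newest.
    for i, age in enumerate(sorted(set(ages), reverse=True)):
        target = freshness_boost(age, c)
        increment = target - previous
        previous = target
        cutoff = today - timedelta(days=age + 1)
        if i > 0:
            expr = f"({expr})"  # nest the previous XRANK
        expr = f"{expr} XRANK(cb={increment!r}) write>{cutoff.isoformat()}"
    return expr


if __name__ == "__main__":
    print(build_freshness_query("my test query", [0, 1], date(2013, 10, 22)))
```

Called with the age steps 0 and 1, this should produce a query of the same shape as the sample. More steps (for example 7, 30 and 365 days) give a closer approximation of the curve at the cost of a longer query.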

I have created a generator page to save you the trouble of writing the nested XRANK statements which can be accessed from:

https://dl.dropboxusercontent.com/u/38490614/SharePoint/freshness.html

The only issue I have noted so far is that query rules are parsed using the user’s query locale. That means that queries with decimal numbers have to use the right decimal separator. For English a period works great, but for Norwegian I have to use a comma.
[Update] I solved this by using exponential notation instead and have updated the generator code.

If you want to use this in a query rule and add freshness rank to all results, perform the following steps:

  1. Start creating a new Query Rule for Local SharePoint Results on either the SSA, Site Collection or Site level.
  2. Name the rule something like "Freshness Boost"
  3. Remove the Query Condition, as you want to match all queries
  4. Click "Change ranked results by changing the query"
  5. Paste the generated query
  6. Test the query on the Test tab
  7. Click OK and Save
  8. Move the new Query Rule to the correct place if you have other rules modifying the query
  9. Enjoy!



Happy freshness boosting!