Thursday, April 18, 2013

Rank models in 2013–Main differences from 2010

Disclaimer: I’m by no means a math expert and my statements below might not be 100% accurate, but I try my best. Also, be careful when tuning the rank profile as changing numbers can have a big effect on your ranking.

With the new FAST search core, ranking has changed quite a lot from 2010. Newly published content on MSDN explains a bit more how rank is calculated and how you can change it.
As the O14 rank model is available in SharePoint 2013 (O15), I will try to outline some of the major differences you can expect to see regarding how results are ranked/sorted by default.
You can pull out the rank model XML for both models yourself using PowerShell:

# Get the search service application and an owner object scoped to it
$ssa = Get-SPEnterpriseSearchServiceApplication
$owner = Get-SPEnterpriseSearchOwner -Level SSA

# O15 rank model: dump its XML to a file
$o15 = Get-SPEnterpriseSearchRankingModel -SearchApplication $ssa -Owner $owner -Identity 8f6fd0bc-06f9-43cf-bbab-08c377e083f4
$o15.RankingModelXML > o15.xml

# O14 rank model: dump its XML to a file
$o14 = Get-SPEnterpriseSearchRankingModel -SearchApplication $ssa -Owner $owner -Identity 9399df62-f089-4033-bdc5-a7ea22936e8e
$o14.RankingModelXML > o14.xml

Then it’s all a matter of comparing the models.
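One way to get a first overview of what differs is sketched below. It makes no assumption about the rank model schema: it only counts how often each element name occurs in each file and lists the names whose counts differ. The function name Compare-ElementCount and its parameter names are made up for this example, and it assumes the two files were written by the commands above.

```powershell
# Hypothetical helper: compare how often each XML element name occurs in two files.
function Compare-ElementCount {
    param(
        [Parameter(Mandatory = $true)][string]$ReferencePath,
        [Parameter(Mandatory = $true)][string]$DifferencePath
    )

    # Count element names in one file. Parsing from text means the
    # encoding produced by the '>' redirect does not matter.
    function Get-ElementCount([string]$Path) {
        $doc = [xml](Get-Content -Path $Path)
        $counts = @{}
        foreach ($node in $doc.SelectNodes('//*')) {
            # Call the .NET getter directly so an attribute or child element
            # with a similar name cannot shadow the property.
            $name = $node.get_LocalName()
            if ($counts.ContainsKey($name)) { $counts[$name]++ } else { $counts[$name] = 1 }
        }
        return $counts
    }

    $ref  = Get-ElementCount $ReferencePath
    $diff = Get-ElementCount $DifferencePath

    # Emit one line per element name whose count differs between the files.
    $allNames = @($ref.Keys) + @($diff.Keys) | Sort-Object -Unique
    foreach ($name in $allNames) {
        $a = [int]$ref[$name]    # a missing key yields $null, which casts to 0
        $b = [int]$diff[$name]
        if ($a -ne $b) {
            "{0}: reference={1}, difference={2}" -f $name, $a, $b
        }
    }
}

# Usage, assuming o14.xml and o15.xml are in the current directory:
# Compare-ElementCount -ReferencePath .\o14.xml -DifferencePath .\o15.xml
```

This only tells you which kinds of rank features were added or removed. For the actual weights you still have to read the two files side by side.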

File formats

In O14 file formats are given quite a lot of weight compared to O15. In descending order file formats are prioritized like this (items in red have negative weights):

HTML, DOC, PPT, TXT, XML, XLS, Messages, Image, ListItem

O15 on the other hand ranks file formats like this:

PPT, DOC, HTML, ListItem, Image, Message, XLS, TXT, XML

The weight range also varies. O14 uses 1.45 to -0.031, while O15 uses 0.68 to -1.29.

The most noticeable difference is that PowerPoint files will move quite high in the result list with O15, and Excel files quite a bit down (as they did in FAST Search for SharePoint as well).

The important thing here is to think about which files are important to you. In my experience PowerPoint files should at least move down below Word documents. Whether pages should rank above PowerPoint files depends largely on your content. But you should have an opinion and try it out.

Proximity

O14 also seems to put more emphasis on click distance and URL depth (shorter links) compared to O15. And as far as I can tell, O15 will give improved relevance to documents where the query terms exactly match the title.

BM25F

Both models use BM25F as their main model. The easiest way to explain this model is that it ranks a set of documents based on the query terms appearing in each document, weighed against how many times those terms appear in total across all documents and within each particular document. The formula is listed on the MSDN link at the top if you want to get into the details.
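For orientation, the commonly cited textbook form of BM25F is sketched below. It is not copied from the rank model XML or from MSDN, and the SharePoint variant may normalize field length slightly differently, so treat the symbols as illustrative assumptions.

```latex
% Field-weighted term frequency of term t in document d,
% summed over the fields (managed properties) p
\widetilde{tf}_{t,d} = \sum_{p} w_p \, \frac{tf_{t,p,d}}{(1 - b_p) + b_p \, \dfrac{DL_{p,d}}{AVDL_p}}

% Score of document d for query q: saturating term frequency
% multiplied by an inverse document frequency factor
\mathrm{score}(q, d) = \sum_{t \in q} \frac{\widetilde{tf}_{t,d}}{k_1 + \widetilde{tf}_{t,d}} \cdot \log\frac{N}{n_t}
```

Here w_p is the weight of field p, b_p its length normalization, DL_{p,d} the length of field p in document d, AVDL_p the average length of that field, N the total number of documents, n_t the number of documents containing term t, and k_1 the saturation parameter. The log(N / n_t) factor is why rare words count more, and w_p is why a hit in the title can outweigh a hit in the body.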

The two models look similar but use different weights. It’s not easy to compare them without trying out queries against each model and seeing which rank feature contributes what to the actual score. But that’s for another post.

Freshness

Neither model pays any attention to when a document was created, which FAST Search for SharePoint actually did. You can, however, add this to the rank model with an entry similar to the one below (copied from MSDN). This is quite useful for surfacing newer, and often more valid, content when your content spans decades.

    
    
        1.0
    

Query Rules

If messing with rank models seems like black magic (which it is a bit to me), then go with query rules and XRANK boosts instead. They are much easier to comprehend.
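As a rough illustration of the XRANK route, the sketch below builds a KQL query that adds a constant boost (cb) to results matching a rank expression. The helper name New-XRankQuery, the boost value of 100 and the FileType:docx expression are assumptions picked for the example, not recommended values.

```powershell
# Hypothetical helper: wrap a query in a KQL XRANK expression with a constant boost (cb).
function New-XRankQuery {
    param(
        [Parameter(Mandatory = $true)][string]$MatchExpression,  # what the user searched for
        [Parameter(Mandatory = $true)][string]$RankExpression,   # items matching this get the boost
        [int]$ConstantBoost = 100                                # assumed value; tune against your own content
    )

    # KQL form: <match expression> XRANK(cb=<n>) <rank expression>
    return "($MatchExpression) XRANK(cb=$ConstantBoost) $RankExpression"
}

# Example: boost Word documents for the query "budget".
New-XRankQuery -MatchExpression 'budget' -RankExpression 'FileType:docx'
# -> (budget) XRANK(cb=100) FileType:docx
```

In a query rule you would typically put the same pattern into the query template, with {searchTerms} as the match expression. The rank expression only affects ordering, so items that do not match it are still returned.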

4 comments:

  1. Hi Mikael, very nice post!
    I would agree that using XRANK and Query Rules is much more practical in most cases. Although choosing the correct value for XRANK is not a piece of cake, considering that the XRANK contribution is not visible in the rankdetail property. Nevertheless, hand tuning of coefficients in a ranking model is a real pain.

    I started to describe basics of new SharePoint 2013 ranking models, but luckily Microsoft released official version :)

    http://powersearching.wordpress.com/2013/03/29/how-sharepoint-2013-ranking-models-work/

    1. Hi Alexey,
      Rank tuning is hard no matter the method for sure :) Once I'm done with my current project I'll post what small changes I plan to incorporate. That would be re-arranging the weights of file types and introducing freshness as a parameter.

  2. Hi Mikael - is there somewhere I can get an explanation of how the Search Ranking works for an end user, e.g. if this word is near to this word then the rank of the item increases? Thanks, Nigel

    1. Nothing, except that you have to decipher the model yourself... and we search people agree that proximity counts too little in the default model. Your best bet is to explain the BM25F model.

      Something like:
      Words in a title are more important
      Words close to each other are important (but not enough)
      Unique words count more
      Words more frequent in a single doc count more

      etc. etc... it quickly gets quite complicated.
