Friday, June 24, 2011

Error in default importance level weights for the full-text index mappings

The title of this post might seem somewhat cryptic if you haven’t worked with managed properties in FAST for SharePoint and mapped them to different priority levels, so I will walk you through the background before getting to the error.

First of all, the relevance score for a search hit in FAST for SharePoint is built up from many different values. One part of the score is the importance of the field in which your query matched. For example, if your query matches words in the title, the hit is rated higher than if it matched in the body text of the document.

When you create a managed property that holds textual content, you can set how important the field is, from level 1 to 7, as seen in the image below. You can get to this screen by going to

Central Admin –> FAST Query SSA –> FAST Search Administration –> Managed properties –> <click a property> –> <scroll to the bottom of the page>

[Image: importance level setting for a managed property]

Using PowerShell, we can list the weights behind the different importance levels.

$rankprofile = Get-FASTSearchMetadataRankProfile default
$content = $rankprofile.GetFullTextIndexRanks() | Where-Object -FilterScript {$_.FullTextIndexReference.Name -eq "content"}
$content.GetImportanceLevelWeight(1)
30
$content.GetImportanceLevelWeight(2)
10
$content.GetImportanceLevelWeight(3)
20
$content.GetImportanceLevelWeight(4)
30
$content.GetImportanceLevelWeight(5)
40
$content.GetImportanceLevelWeight(6)
50
$content.GetImportanceLevelWeight(7)
60


As you can see, level 1 has a weight of 30, the same as level 4 and higher than levels 2 and 3, and this is where the error is.
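The same lookup can be written as a loop that also flags any level whose weight does not exceed the weight of the level below it. This is only a sketch: it reuses the calls shown above, assumes it runs in the FAST for SharePoint shell, and the variable names $weights, $prev and $cur are mine.

```powershell
# Assumes the FAST for SharePoint shell, where Get-FASTSearchMetadataRankProfile is available.
$rankprofile = Get-FASTSearchMetadataRankProfile default
$content = $rankprofile.GetFullTextIndexRanks() |
    Where-Object { $_.FullTextIndexReference.Name -eq "content" }

# Collect the weight for each importance level, 1 through 7.
$weights = 1..7 | ForEach-Object {
    New-Object PSObject -Property @{
        Level  = $_
        Weight = $content.GetImportanceLevelWeight($_)
    }
}
$weights | Format-Table Level, Weight -AutoSize

# Compare each level with the one directly below it and report
# every pair where the higher level does not have the higher weight.
for ($i = 1; $i -lt $weights.Count; $i++) {
    $prev = $weights[$i - 1]
    $cur  = $weights[$i]
    if ($cur.Weight -le $prev.Weight) {
        "Level {0} ({1}) does not outrank level {2} ({3})" -f $cur.Level, $cur.Weight, $prev.Level, $prev.Weight
    }
}
```

With the default values listed above, the only pair this should report is level 2 against level 1.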

To rule out any magic going on behind the scenes, I conducted a test. First I created three crawled properties, each mapped to one of three managed properties, as circled in red in the first image. Then I created three documents with the same content, where levelone.txt was indexed into the madcowone field, leveltwo.txt into madcowtwo and levelfive.txt into madcowfive. I also set the freshness weight to zero, to rule out the time factor in ranking.

When I executed a search against these three documents, they all got the same rank score of 39. Using my FS4SP Query Logger tool I could see that they did get different context scores, yet they were sorted in random order, as you can see in the output (shortened for clarity):

[Image: FS4SP Query Logger output for the three documents]

I have highlighted the context score and the level in which we got a hit, which corresponds to the name of the document. The reason for the low score is that the context weight doesn’t count for much compared to other factors in the static rank.

Next I changed the context weight from the default of 50 to 200. Executing the same query, I now got the results sorted in the “correct” order: levelfive.txt, levelone.txt and leveltwo.txt.

[Image: FS4SP Query Logger output after raising the context weight to 200]

This means that a level 1 field clearly ranks above level 2 and level 3 fields, which is most likely an error in the product. You probably want to change the values for the importance levels in your deployments to match the expected behavior.
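A minimal sketch of such a change follows. Two things in it are assumptions on my part and not something tested in this post: that the object exposes SetImportanceLevelWeight(level, weight) and Update() as counterparts to GetImportanceLevelWeight (the Get-Member call lets you check before changing anything), and the replacement value of 5, which is hypothetical and chosen only because it is below the level 2 weight of 10.

```powershell
# Assumes the FAST for SharePoint shell, where Get-FASTSearchMetadataRankProfile is available.
$rankprofile = Get-FASTSearchMetadataRankProfile default
$content = $rankprofile.GetFullTextIndexRanks() |
    Where-Object { $_.FullTextIndexReference.Name -eq "content" }

# List the methods on the object first, to confirm the setter and
# Update() used below actually exist in your installation.
$content | Get-Member -MemberType Method

# Hypothetical value: anything below the level 2 weight of 10 restores the ordering.
$newLevelOneWeight = 5

# Assumed API: set the new weight for level 1 and persist the change.
$content.SetImportanceLevelWeight(1, $newLevelOneWeight)
$content.Update()

# Read the value back to verify the change.
$content.GetImportanceLevelWeight(1)
```

Try this on a test farm first, and rerun your queries afterwards to confirm that level 1 hits now sort below level 2 hits.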