Friday, January 9, 2015

How to: Boost metadata in SharePoint search results

As pointed out by Mike Fairly in the TechNet forums (see thread), neither the default rank profile nor any other rank profile included with SharePoint gives any particular weight to tagged content.

This means that if you add managed metadata columns or Enterprise Keywords to a list or library, the terms used will help with recall (the items can be found), but they will not give the items any extra rank that moves them higher up in the search results.

On the other hand, when you promote tagging to information workers, they will likely expect a return on investment for those tags, meaning tagged items should appear higher in the result set.

There are several ways to go about fixing this:

  1. Create a custom rank profile
  2. Change the weight of managed properties
  3. Use dynamic ranking via the XRANK operator

The easiest in terms of implementation and flexibility is #2.
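
For comparison, option 3 can be sketched as a small query builder. This is only an illustration of the KQL XRANK syntax, not part of the downloadable configuration: the helper name xrank_query is hypothetical, and it assumes the property you boost on (TagBoost here) is also marked as queryable.

```python
def xrank_query(user_query: str, boost_property: str, boost: int = 100) -> str:
    """Build a KQL query that adds a constant boost (cb) to items
    where boost_property also matches the user's query text."""
    # Strip double quotes so the phrase in the rank expression stays well-formed.
    phrase = user_query.replace('"', "")
    return f'({user_query}) XRANK(cb={boost}) {boost_property}:"{phrase}"'


# Hypothetical usage: boost items tagged with the search phrase.
print(xrank_query("red car", "TagBoost"))
```

The drawback compared to option 2 is that every query (or query template) has to carry the XRANK expression, whereas a weight group change applies to all queries.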

You may download working search configuration files to apply the tag boost from https://github.com/wobba/SearchConfiguration.

Without going into too much detail about the default rank profile, the part we are after is the BM25F (Best Match 25 Fielded) part. This part determines how much weight a term gets when it appears in a specific field. For example, a hit in the title counts more than a hit in the body text of a document.

A simplified listing provided by Igor Veytskin@MSFT shows the fields included in the default rank model, ordered by their relative importance to each other:

  • Title              100%
  • QLogClickedText     72%
  • SocialTag           59%
  • Filename            52%
  • Author              41%
  • AnchorText          18%
  • Body                 7%
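
To get a feel for what field weighting does, here is a minimal sketch of a BM25F-style score. It is a textbook simplification and not SharePoint's actual rank model: length normalization is left out, the k1 value and document counts are made up, and treating the percentages above as field weights is an assumption made purely for illustration.

```python
import math


def bm25f_score(query_terms, doc_fields, field_weights, doc_freq, num_docs, k1=1.2):
    """Simplified BM25F: combine per-field term frequencies using field
    weights, saturate the result with k1, and scale by IDF."""
    score = 0.0
    for term in query_terms:
        # Weighted term frequency across all fields of the document.
        weighted_tf = sum(
            field_weights.get(field, 0.0) * text.lower().split().count(term)
            for field, text in doc_fields.items()
        )
        if weighted_tf == 0:
            continue
        df = doc_freq[term]
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        score += idf * weighted_tf / (k1 + weighted_tf)
    return score


# Illustrative weights taken from the relative listing above (an assumption).
weights = {"title": 1.00, "body": 0.07}
stats = {"car": 10}  # hypothetical: "car" occurs in 10 of 1000 documents

title_hit = bm25f_score(["car"], {"title": "red car", "body": "nothing here"}, weights, stats, 1000)
body_hit = bm25f_score(["car"], {"title": "untitled", "body": "a red car"}, weights, stats, 1000)
print(title_hit > body_hit)
```

A field that is missing from the weights contributes nothing to the score, which is the situation tagged content is in by default.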

If you look at the search schema for each of the managed properties listed, you will see that they are mapped to different weight groups (Search Schema –> pick managed property –> Advanced Searchable Settings). For example, Title is mapped to Context 1.

Important: The number in the context level does not correspond to importance and higher ranking. It’s merely a lookup number.

The levels are assigned as follows:

  • Title               1
  • QLogClickedText     3
  • SocialTag          14
  • Filename            2
  • Author              5
  • AnchorText          6
  • Body                7

By default, tags (or any custom column) are assigned to Context 0. Hits there are not ranked at all, so they count for even less than hits in the body content of a document.

And now comes the hard part. By association, any managed property set to use the same weight group as any of the above managed properties will get the weight from that managed property when searching.

If you, for example, assign a managed property MyFooBar to Context 1, then search matches in that property will get the same boost as matches in titles. If you want less boost, use one of the other weight groups. And that’s all there is to it, from a technical point of view. Also, the managed property you want extra rank on HAS TO BE MARKED AS SEARCHABLE. If not, you won’t get the extra rank you are looking for.
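
The association rule can be modelled as a simple lookup. The sketch below only mirrors the two listings in this post (context numbers and relative importance); the function name effective_weight is hypothetical, and the real rank model works on more than these numbers.

```python
# Context (weight group) per ranked property, as listed in this post.
CONTEXT = {
    "Title": 1, "Filename": 2, "QLogClickedText": 3, "Author": 5,
    "AnchorText": 6, "Body": 7, "SocialTag": 14,
}

# Relative importance per ranked property, as listed in this post.
RELATIVE_WEIGHT = {
    "Title": 1.00, "QLogClickedText": 0.72, "SocialTag": 0.59,
    "Filename": 0.52, "Author": 0.41, "AnchorText": 0.18, "Body": 0.07,
}


def effective_weight(context: int, searchable: bool) -> float:
    """Return the relative weight a managed property inherits from the
    ranked property that shares its context, or 0.0 if there is none."""
    if not searchable:
        return 0.0  # not searchable means no rank contribution at all
    for prop, ctx in CONTEXT.items():
        if ctx == context:
            return RELATIVE_WEIGHT[prop]
    return 0.0  # e.g. Context 0, the default for custom columns


print(effective_weight(1, True))    # same weight group as Title
print(effective_weight(0, True))    # default context, no weight
print(effective_weight(1, False))   # right context, but not searchable
```

Note that the context number is only a key: Context 14 (SocialTag) carries more weight than Context 7 (Body), even though Context 1 (Title) carries the most.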

The actual implementation

If you are using site columns in your libraries and lists and want to boost search hits in managed metadata columns or tags, then you can edit the automatically created managed property for that column (named owstaxIdColumnName) and change the weight group. This works because the automatically created managed properties for managed metadata columns are marked searchable by default.

If you want to boost content in all managed metadata columns, regardless of the column name and where it is used, perform the following steps:

  1. Create a new managed property on the SSA/tenant level and name it TagBoost
  2. Check the Searchable box
  3. Map the crawled property ows_taxId_MetadataAllTagsInfo to your new managed property
  4. Trigger re-crawl of your content

Note: There is an existing managed property named owstaxidmetadataalltagsinfo, but this property is not marked as searchable, thus changing the weight group has no effect. I don’t like to mess too much with existing managed properties, and thus opt to create a new one.

As an example, picture a list item with the following columns and values.

Column                                    Value
Title                                     List test
Enterprise Keywords (managed metadata)    red car

Using the SharePoint 2013 Query Tool and looking at rank details, for the search red car, you get extra rank from the Title managed property – by association from the TagBoost managed property as it’s mapped to Context 1.
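
If you prefer to check the result outside the Query Tool, a query against the search REST endpoint can be put together like this. It is a sketch that only builds the URL: the site address is a placeholder, the helper name search_url is hypothetical, and authentication is left out entirely.

```python
from urllib.parse import quote


def search_url(site_url, query_text, properties=("Title", "Path", "Rank"), row_limit=10):
    """Build a URL for the SharePoint search REST endpoint (_api/search/query)."""
    # Single quotes inside the query text are doubled, then the text is URL-encoded.
    query = quote(query_text.replace("'", "''"))
    props = ",".join(properties)
    return (
        f"{site_url.rstrip('/')}/_api/search/query"
        f"?querytext='{query}'&selectproperties='{props}'&rowlimit={row_limit}"
    )


# Placeholder tenant URL; replace with your own site.
print(search_url("https://contoso.sharepoint.com", "red car"))
```

Comparing the Rank value of the tagged item before and after changing the weight group is a quick way to confirm the change has taken effect once the content is re-crawled.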

References:

TechNet: Influence the ranking of search results by using the search schema

MSDN: Tune your ranking model with rank features

Trigger re-indexing of content in SharePoint Online

SharePoint Sample Search Configurations

23 comments:

  1. Hi Mikael. Sorry it has taken so long to get back to this. I am struggling through your assistance blindly :) . Here is what I have done:

    1. uploaded search configuration file. I used TagBoost-Max-SearchConfiguration.xml

    2. I notice it states "SearchBoost" and I see "SPSiteSubscription" and "Imported Successfully" so this seems good.

    3. Looked in Manage Search Schema > Managed Properties and I see TagBoost as per your image on this page.

    4. Performed a re-crawl using the code you supplied at http://techmikael.blogspot.com.au/2014/02/how-to-trigger-full-re-index-in.html. I am running this code by pasting it into SP Online Management Shell.

    Do I need to wait, or is this an immediate effect?

    Am I doing it correctly? Any more steps? I am not getting errors, but no boosting either.

    Mike in Cairns

    Replies
    1. That's it... and you should at least see different rank scores. How it affects the actual sort order depends on a lot of factors and can vary.

  2. Hi Mikael,
    I have followed the steps you provided, but in rank details it shows content instead of title. Any suggestions?

    Replies
    1. Yes, I re-indexed several times. If I change the title weight to 14, the rank changes. But in rank details, it's still shown as title.

    2. It will always show as the managed property used in the rank profile - meaning the associated mp. If you want it to show with the correct mp, then you need to add that mp to the rank profile itself.

  3. Hi Mikael. Thanks for the nice post. I am working on a requirement in which a list contains several columns and the ranking has to be boosted based on two of them. One column is straightforward, i.e. boost based on the term itself, and the other column needs to be boosted based on the number of times a term occurs in the column. I have already set the weight groups to Context 1 and 2. I am currently trying to create a custom ranking model. I see that there is a parameter Tf in the BM25 calculation which refers to the number of times a term occurs in a field, but that doesn't seem to be reflected in the results.

    For Example:

    Column 1, Column 2, Column 3
    Item 1, A, "SharePoint SharePoint SharePoint"
    Item 2, A, "SharePoint SharePoint SharePoint SharePoint SharePoint SharePoint SharePoint"
    Item 3, B, "SharePoint SharePoint SharePoint"
    Item 4, B, "SharePoint"
    Item 5, C, "SharePoint SharePoint"

    Considering Column 2 and 3 for the ranking:

    Expected Result: (Item 2 comes ahead of Item 1 as it has more occurrences of SharePoint)

    Item 2, A, "SharePoint SharePoint SharePoint SharePoint SharePoint SharePoint SharePoint"
    Item 1, A, "SharePoint SharePoint SharePoint"
    Item 3, B, "SharePoint SharePoint SharePoint"
    Item 4, B, "SharePoint"
    Item 5, C, "SharePoint SharePoint"

    Please suggest

    Replies
    1. BM25F is not that easy...a common term for example will almost be weighted away, and also the number of total terms in an item has effect.

      If you can modify the query I would use FQL and count and xrank to accomplish this instead of trying to get the model to fix it for you https://msdn.microsoft.com/en-us/library/office/ff394606.aspx#fql_count_operator

  4. Thanks Mikael for the inputs... I tried implementing the solution using XRank with Count:
    xrank(xrank(xrank(sharepoint*, count("sharepoint", from=1), cb=100), count("sharepoint", from=15), cb=150),title:sharepoint,cb=300), but this would lead to having multiple ranges of count with XRank. Please let me know if I am heading in the right direction.

    Also, I have to extend the same solution for a similar requirement for OOTB People Results (User Profile). As we cannot use FQL for this, is there a possibility to achieve this using KQL? Please suggest

    Replies
    1. Hi, that works :) For people search I would "hack" in the FQL using the refiner parameters which I have mentioned in another post. Refinement filters are actually pure FQL. KQL does not support the count operator.

  5. Thanks Mikael for the inputs, I have implemented the solution using XRANK and it works. The query is a little bit long with multiple ranges to handle all cases and Title field Rank as well. In the meanwhile, I tried influencing BM25 but that really gets complicated.

    Thanks a lot again..:)

  6. Hi Mike... I am back, stuck on the final step of this implementation. Everything works fine with refinement filters for API calls, but I am unable to manipulate the querystring of the OOTB SharePoint People Results (peopleresults.aspx) to get the results ranked. Please advise on how I can pass the refinementfilters with XRank and Count in the querystring of peopleresults.aspx?k=test.....

    Replies
    1. Can't you do xrank in the query template instead? A lot easier.

    2. Hi Mike...I can try with query template, but there are quite a few managed properties based on which the people search has been extended and query manipulation is done using querystring. So, wanted to check if there is a possibility to handle this as well in querystring...

    3. Pass it as a custom query param, and pull that into your query template. http://www.techmikael.com/2016/05/appending-query-terms-to-search-url.html

    4. Hi Mike...I am unable to handle xrank with Count in this approach using Query Template, i.e. this would not give me an option to pass refinementfilters

    5. Right, you want count. Then you have to pass it as a refinement in the json object encoded in the url when adding a refiner.

    6. Which would probably only work on a property filter, not one starting with a keyword like xrank. This is due to the way the search web parts work in the .js defined functions it uses.

    7. Thanks Mikael for the inputs...

  7. Hi Mikael,

    I get access denied, with whatever account I use (even the tenant admin).

    Any ideas ?

    Another stupid thing is that if I search for sharepoint, the results show everything, as there is sharepoint in the URL. Worse, even documents with sharepoint in the title end up somewhere near the bottom.

    Replies
    1. Hi,
      I have tenant admin user which is also SP admin, and it works just fine.

    2. As for the URL, that's hard, so if you want to search for sharepoint it should be a property query perhaps if it's tagged somewhere. One of the special cases we just have to live with I guess :(
