Tuesday, November 30, 2010

Increasing the summary length in FS4SP

In the settings for the Core Result Web Part you can set the length of the hit summary. The default is 185 characters, and the upper limit seems to be somewhere around 400 when running against FAST Search Server 2010 for SharePoint.

As an example, I have indexed a page on our company website. The following paragraph from the original text is 730 characters long:
FAST provides organizations with a scalable, high-performance enterprise search and information access platform designed to give instant access to information that is secure, relevant, accurate, and timely. With FAST and its Contextual Insight capability, it is possible to detect the context and intent of the query, search for terms and phrases, and return requested entities that appear in the context of the matching text. You will get both the contextual results with extreme precision and the contextual, dynamic navigation for further investigation of related information. Advanced linguistics and relevancy management features further improve and simplify your users’ search experience, enabling truly user-centric search.
A search on the term “FAST insight” yields this 160-character summary with the default settings:
By checking “Limit Characters In Summary” and setting the character count to 400, we get this output of 370 characters:
This is certainly a much better read than the first one. Still, the sentences are quite short, and in my opinion fewer but more complete sentences would be better.
The first step is to change how FAST generates summaries. Open up
in a text editor and add the following lines:
# Length of the generated summary in bytes. This is a hint to Juniper.
# The result may be slightly longer or shorter depending on the structure
# of the available document text and the submitted query.
juniper.dynsum.length 2048

# The number of (possibly partial) sets of keywords matching the query
# to try to include in the summary. The larger this value is set
# relative to the length parameter, the more densely the keywords
# may appear in the summary.
juniper.dynsum.max_matches 3

# The maximal number of bytes of context to prepend and append to each
# of the selected query keyword hits. This parameter defines the max
# size a summary would become if there are few keyword hits (max_matches
# set low or the document contained few matches of the keywords).
juniper.dynsum.surround_max 512

# The size of the sliding window used to determine if
# multiple query terms occur together. The larger the value, the more
# likely the system will find (and present in dynamic summary) complete
# matches containing all the search terms. The downside is a potential
# performance overhead of keeping candidates for matches longer during
# matching, and consequently updating more candidates that eventually
# get thrown away.
juniper.matcher.winsize 600
This changes the default behavior of how summaries are generated. These parameters were available in the old FAST ESP but have for some reason been left out of FS4SP. All of these values are hints as to how the summary should be generated.
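Because the same block has to go into more than one file, it can help to check each copy before restarting anything. The following is a minimal Python sketch, not part of FS4SP, that assumes the files use the simple format shown above: one key and value per line separated by whitespace, with # starting a comment line. The names WANTED, parse_settings and settings_mismatches are hypothetical.

```python
from typing import Dict, Mapping, Optional, Tuple

# The four values from the snippet above.
WANTED: Dict[str, str] = {
    "juniper.dynsum.length": "2048",
    "juniper.dynsum.max_matches": "3",
    "juniper.dynsum.surround_max": "512",
    "juniper.matcher.winsize": "600",
}


def parse_settings(text: str) -> Dict[str, str]:
    """Parse 'key value' lines, skipping blank lines and # comment lines.

    If a key appears twice, this parser keeps the last value; whether
    Juniper itself behaves the same way is an assumption, not verified.
    """
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # key, then the rest of the line
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def settings_mismatches(
    text: str, wanted: Mapping[str, str] = WANTED
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {key: (wanted value, found value or None)} for every wanted
    setting that is missing from the text or has a different value."""
    found = parse_settings(text)
    return {
        key: (value, found.get(key))
        for key, value in wanted.items()
        if found.get(key) != value
    }
```

Running settings_mismatches on the contents of each edited file should return an empty dict for every copy; anything else points at a missing or mistyped line.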
Apply the same settings to
Next, execute the following commands from the FAST PowerShell prompt:
nctrl stop configserver search-1
nctrl start configserver search-1
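If the restart is part of a scripted deployment, the two commands can be wrapped so that the start is only attempted when the stop succeeded. This is a minimal Python sketch, assuming nctrl is on the PATH of the shell that runs it (for example, when launched from the FAST PowerShell prompt) and reusing the process names configserver and search-1 exactly as given above. run_restart is a hypothetical helper, and its runner parameter exists only so the sequence can be tested without FS4SP installed.

```python
import subprocess
from typing import Callable, List, Sequence

# The two commands from the post, in the order given. The process names are
# copied verbatim and may differ on other installations (assumption).
RESTART_COMMANDS: List[List[str]] = [
    ["nctrl", "stop", "configserver", "search-1"],
    ["nctrl", "start", "configserver", "search-1"],
]


def run_restart(
    commands: Sequence[Sequence[str]] = RESTART_COMMANDS,
    runner: Callable[..., object] = subprocess.run,
) -> None:
    """Run each command in turn and stop at the first failure.

    With check=True, subprocess.run raises CalledProcessError on a non-zero
    exit code, so "start" is never attempted after a failed "stop".
    """
    for command in commands:
        runner(list(command), check=True)
```

Calling run_restart() with no arguments runs the two real nctrl commands.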
Then we raise the character limit in the web part to 2000 and get this summary of 1500 characters:
If we apply our own summary logic and pick out the sentence with the most hits, we could end up with a summary of just the highlighted sentence, which gives more context than the original summaries. This logic could either be embedded in the XSLT (preferably via a callback to keep the code cleaner), or you could override the web part and modify the summary before it is output to the XSLT.
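The sentence-picking idea can be sketched as follows. This is a minimal Python illustration, not the web part's or any XSLT's actual code: it assumes sentences end in ., ! or ? followed by whitespace, counts whole-word, case-insensitive matches of each query term, and keeps the first sentence on a tie. count_hits and best_sentence are hypothetical names; in practice the same logic would live in the XSLT callback or the overridden web part.

```python
import re
from typing import Iterable

# Naive sentence splitter (assumption): break after ., ! or ? plus whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def count_hits(sentence: str, terms: Iterable[str]) -> int:
    """Count whole-word, case-insensitive occurrences of every query term."""
    return sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE))
        for term in terms
    )


def best_sentence(text: str, query: str) -> str:
    """Return the sentence with the most query-term hits (first one on ties)."""
    terms = query.split()
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    if not sentences:
        return ""
    # max() returns the first maximal element, which gives the tie rule.
    return max(sentences, key=lambda s: count_hits(s, terms))
```

For the example paragraph and the query “FAST insight”, this picks the second sentence (“With FAST and its Contextual Insight capability, ...”), the only one containing both terms.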
(This post is cross-posted at http://nuggets.comperiosearch.com/2010/11/increasing-the-summary-length-in-fs4sp/)
