Thursday, February 5, 2015

Entity extraction in SharePoint based on the path managed property...

...is not possible, so a workaround is needed to accomplish this.
As background, I’m indexing a file server with top-level folders such as \\server\share\HR, \\server\share\F and \\server\share\MKT.
Below each of these end points there can be any number of subfolders, for example \\server\share\HR\john\sickleaves\test.docx. The point is that I want a refiner for HR/F/MKT which should read:
  • Human Resources
  • Finance
  • Marketing
By creating a CSV dictionary you can use the managed property word extraction option in SharePoint to look for a term and replace it with another. The documentation for how to create and upload a dictionary can be found at https://technet.microsoft.com/en-us/library/jj219480(v=office.15).aspx
Note that this feature is not available for SharePoint Online even though the check boxes are available to you.

So back to the original issue. If you check off the option to use a dictionary for the path managed property (or originalpath or sitepath), it simply won’t work. Why, I don’t know, but there is a workaround.
You can create a new managed property and map to it the same crawled properties that you find on the path managed property. I have created one called NoRecall, which has no specific features set since I only want to use it as an extraction point, and mapped the following crawled properties to it:

  • Basic 11 - b725f130-47ef-101a-a5f1-02608c9eebac
  • Basic 9 - 49691c90-7e17-101a-a91c-08002b2ecda9
  • Web 2 - 70eb7a10-55d9-11cf-b75b-00aa0051fe20
The GUIDs listed are the property set IDs of the crawled properties. They are needed because there are multiple crawled properties with the same name in the Basic category under different GUIDs. If you do the mapping using the UI, include all of them to ensure you get the right one.

After mapping the crawled properties to the managed property, uploading the CSV file, enabling Word Extraction on the NoRecall managed property, and running a re-crawl, you will see the expanded values appear in the managed property WordCustomRefiner1. You might want to add an alias to this property as well, to make it easier to reference in your search solution.

My CSV file for the case above looks like this:

Key,Display form
server\share\HR\,Human Resources
server\share\F\,Finance
server\share\MKT\,Marketing
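
If you have more than a handful of folders, the dictionary can be generated instead of typed by hand. The following is a minimal Python sketch that produces the same file as above. The `FOLDER_LABELS` mapping and the `build_dictionary_csv` helper are names made up for this illustration, and the output file name in the usage note is only an example.

```python
import csv
import io

# Hypothetical mapping from path fragment (dictionary key) to refiner label,
# copied from the example dictionary in this post.
FOLDER_LABELS = {
    "server\\share\\HR\\": "Human Resources",
    "server\\share\\F\\": "Finance",
    "server\\share\\MKT\\": "Marketing",
}


def build_dictionary_csv(labels):
    """Return the dictionary as CSV text with the 'Key,Display form' header."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Key", "Display form"])
    for key, display_form in labels.items():
        writer.writerow([key, display_form])
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the generated dictionary so it can be inspected before uploading.
    print(build_dictionary_csv(FOLDER_LABELS), end="")
```

To save the result, write the returned text to a file, for example `open("path-dictionary.csv", "w", encoding="utf-8", newline="")`. Check the dictionary documentation linked earlier for the exact encoding requirements before uploading.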

To avoid false positives, include as much of the path as possible in each key.
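
To see why longer keys help, here is a toy Python simulation. It assumes the extractor behaves like a case-insensitive substring match, which is a simplification and not SharePoint's actual matching logic. The `matching_labels` helper and the sample path are hypothetical.

```python
def matching_labels(path, dictionary):
    """Return the labels whose key occurs anywhere in the path (case-insensitive).

    This is only a stand-in for the real extractor, used to illustrate the idea.
    """
    lowered = path.lower()
    return [label for key, label in dictionary.items() if key.lower() in lowered]


# Short keys: only the folder name.
SHORT_KEYS = {"HR\\": "Human Resources", "MKT\\": "Marketing"}

# Long keys: as much of the path as possible, as in the dictionary above.
LONG_KEYS = {
    "server\\share\\HR\\": "Human Resources",
    "server\\share\\MKT\\": "Marketing",
}

# Hypothetical document: a Marketing file stored in a subfolder named HR.
sample_path = "\\\\server\\share\\MKT\\HR\\plan.docx"

if __name__ == "__main__":
    print(matching_labels(sample_path, SHORT_KEYS))
    print(matching_labels(sample_path, LONG_KEYS))
```

With the short keys the Marketing document is also tagged as Human Resources, because a subfolder happens to be named HR. With the long keys only Marketing matches.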

To sum it up:
  • Create a new managed property (or use a reusable one)
  • Replicate the cp mappings of the path mp
  • Upload a dictionary to associate with the word extraction
  • Check the setting to use word extraction on the managed property
  • Re-crawl

13 comments:

  1. Hi Mikael, great post! This would be the solution to my problem (or at least I thought so): mapping a custom entity extraction to one of the managed properties for path.

    But unfortunately it doesn't work.

    I created a new managed property CustomPath exactly the way you described, and a dictionary that looks like this:

    Key,Display form
    file://com1924915/Fileshare for Search/Case Studies/,Case Studies
    file://com1924915/Fileshare for Search/SharePoint Tutorials,SharePoint Tutorials
    file://com1924915/Fileshare for Search/Technische Diagramme für SharePoint 2013,Technische Diagramme für SharePoint 2013
    file://com1924915/Fileshare for Search/Unterhaltsames,Unterhaltsames

    (This is what the SharePoint 2013 Search Query Tool from Codeplex shows for managed property path.)

    I tried using both WordPart and Word, with the same result: no errors, but no refiners show up.

    Do you have an idea what is wrong there? Any hints would be highly appreciated!

    Best regards, Dorrit

  2. Magic at its best! Just after sending the comment I made one very last attempt (using WordPart and WordPartCustomRefiner4, which I hadn't used before) and this time it worked! :-)

    1. Sorry for the late reply...but great :) Seems I stopped getting alerts about comments for moderation.

  3. Hi Mikael
    I have tried the above procedure but it's not working for me...

    1. Did you recrawl, and did you try both the word and wordpart custom refiner properties?

    2. Yes, I have tried with both word and wordpart. Is there any issue with importing more than one dictionary?
      In my case I have created a new managed property and the values are coming into it, but the values are not coming into CustomWordPartRefiner...

    3. One dictionary per custom extractor works just fine. Not sure what is going on for you. You could try a simpler dictionary with single words to see if that matches.

  4. For a single word it's working fine, but for path-related mappings it's not working.
    My dictionary consists of mappings like:
    http://isvpoc98/sites/SearchTeamSite/,CustomEntityContentSource
    http://isvpoc98:8080/sites/ZammerTeamSite/,ZammerContentSource
    Do I need to put the URL in double quotes?

    1. Try removing the http:// as part of the dictionary keys.

    2. No, it's still not working.
      I have tried all permutations and combinations...
      I don't know why this custom entity extraction is not working for path.

    3. Once done with my vacation I might try this with an http URL. I only tested with file server ones.

  5. Key,Display form
    netappcifshq.simon.com\credit files test\A,A
    netappcifshq.simon.com\credit files test\B,B
    netappcifshq.simon.com\credit files test\C,C

    I tried it like this
    and used word.2 and wordpart.2,
    but it does not work.

    1. Could you be more specific as to what content you are seeing in WordCustomRefiner1?
