Thursday, September 4, 2014

Debugging crawled properties and creating a refiner for Content Source without modifying the existing ContentSource managed property

image

That was a really long post title, but it explains what this post is about, so just look past the dryness of it.

The first part of the post explains how you can see what crawled properties are available for an item, and the second part shows how you can use one of the RefinableStringXX managed properties as a refiner for content source. (Content sources are what you set up in your Search Service Application to point at the sources you want to index.)

Say you have set up multiple content sources in SharePoint. One for SharePoint, a couple for file servers and maybe a web crawl or three of some web sites. Let’s assume all these sources have a mental footprint in the minds of your information workers, and as such you want to show a refiner on the search page based on these content sources.

Out of the box, SharePoint 2013 has a managed property named ContentSource which stores the name of the content source, but this property is not refinable, only queryable.
image
The easiest approach would of course be to change ContentSource to be refinable, and then I could have skipped this post, couldn’t I? But if you have read my post Mikael’s best practice for managing Managed Properties in SharePoint 2013/SPO–and how to deal with dates, you know that I like to leave the existing properties untouched.

So, back to the first task at hand: debugging of crawled properties. I assume you have worked a bit with search and know the relationship between crawled and managed properties, so I’ll skip that part.

Normally, if I wanted to map a crawled property, I would search for it in the UI or PowerShell and look for a name which looked like what I was hunting for. In the case of ContentSource I planned to just check the UI for the mapping it uses, but as you see from the image above, ContentSource doesn’t have any mappings. At least none which we’re aware of, meaning it all happens automagically in the indexing pipeline (CTS). If that was a big leap to make, read up on how indexing works at your local TechNet pages :-)

How do you then go about finding out which crawled property is used for the managed property ContentSource? I remembered reading a good tip from Brent Groom at Microsoft about how to debug crawled properties in 2013, which I’ll share with you.

First, using PowerShell, set the ULS log level for search to VerboseEx.
Set-SPLogLevel -TraceSeverity VerboseEx -Identity *Search:*

Next up, make a change to the item you are interested in and trigger an incremental crawl for this item to be picked up. This will now generate a lot of extra information during crawling into the ULS logs.
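If you prefer to trigger the crawl from PowerShell rather than Central Administration, something like the following sketch should do. It assumes the SharePoint snap-in is loaded and that the item lives in a content source named Best Bets (the name used later in this post); substitute your own content source name.

```powershell
# Assumption: the changed item belongs to the content source "Best Bets".
$ssa = Get-SPEnterpriseSearchServiceApplication
$cs  = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Best Bets"

# Only start a new crawl when the content source is not already crawling
if ($cs.CrawlStatus -eq "Idle") {
    $cs.StartIncrementalCrawl()
}
```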

Once the crawl has finished, it’s time to filter for EventIDs starting with af7, and you will find what you are looking for. You can do this using ULSViewer or via PowerShell. If using ULSViewer, you could set up filtering before the incremental crawl to capture the events at runtime. If you’re using PowerShell, use some variation of the command below; I’ve included some of the output.

Get-SPLogEvent | Where-Object {$_.EventID -like 'af7*' -and $_.Level -eq 'VerboseEx'} | select Message | fl

Message : CTSDocument: FeedingDocument: properties : strDocID = ssic://19824 key = 
          00020329-0000-0000-C000-000000000046:DAV:iscollection values = True

Message : CTSDocument: FeedingDocument: properties : strDocID = ssic://19824 key = 
          012357BD-1113-171D-1F25-292BB0B0B0B0:#303 values = 3

Message : CTSDocument: FeedingDocument: properties : strDocID = ssic://19824 key = 00020329-0000-00
          00-C000-000000000046:urn:schemas.microsoft.com:sharepoint:portal:isdocument values = 0

Message : CTSDocument: FeedingDocument: properties : strDocID = ssic://19824 key = 0B63E350-9CCC-11
          D0-BCDB-00805FCCCE04:urn:schemas.microsoft.com:fulltextqueryinfo:displaytitle values = 
          Best Bets Management

Message : CTSDocument: FeedingDocument: properties : strDocID = ssic://19824 key = 
          012357BD-1113-171D-1F25-292BB0B0B0B0:#315 values = Best Bets

Message : CTSDocument: FeedingDocument: properties : strDocID = ssic://19824 key = 
          012357BD-1113-171D-1F25-292BB0B0B0B0:#662 values = 19

Message : CTSDocument: FeedingDocument: properties : strDocID = ssic://19824 key = 
          012357BD-1113-171D-1F25-292BB0B0B0B0:#664 values = b97f2443-b783-4427-acce-12c540938c71

Each of the FeedingDocument lines shows a crawled property key and value. In my case I knew the content source I had set up was named Best Bets, so that is what I searched for.
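Searching the output by eye works, but the lookup can also be sketched in PowerShell. The example below runs against three sample messages copied from the output above (each joined onto one line). The regular expression is my own guess at the message layout, based only on those samples, so treat it as an assumption and adjust it if your log lines look different.

```powershell
# Sample messages copied from the log output above, one line each.
$messages = @(
    'CTSDocument: FeedingDocument: properties : strDocID = ssic://19824 key = 012357BD-1113-171D-1F25-292BB0B0B0B0:#303 values = 3',
    'CTSDocument: FeedingDocument: properties : strDocID = ssic://19824 key = 012357BD-1113-171D-1F25-292BB0B0B0B0:#315 values = Best Bets',
    'CTSDocument: FeedingDocument: properties : strDocID = ssic://19824 key = 012357BD-1113-171D-1F25-292BB0B0B0B0:#662 values = 19'
)

# Assumed layout: "strDocID = <id> key = <key> values = <value>"
$pattern = 'strDocID = (?<DocId>\S+) key = (?<Key>.+?) values = (?<Value>.*)$'

# Turn each matching message into an object with DocId, Key and Value
$parsed = foreach ($m in $messages) {
    if ($m -match $pattern) {
        [pscustomobject]@{
            DocId = $Matches['DocId']
            Key   = $Matches['Key'].Trim()
            Value = $Matches['Value'].Trim()
        }
    }
}

# Keep only the crawled properties whose value is the content source name
$hits = @($parsed | Where-Object { $_.Value -eq 'Best Bets' })
$hits | Format-Table Key, Value -AutoSize
```

On a real farm you would fill $messages from the Get-SPLogEvent command shown earlier instead of the hard-coded samples.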

A match for my query was 012357BD-1113-171D-1F25-292BB0B0B0B0:#315. The first part is a property set GUID, and the last part, #315, means the name is 315 and that the crawled property has isNameEnum set to true (the hash). So, how do I know this? Because back in 2011 I answered a forum post about how to get hold of the crawled property for content source using FS4SP, and that somehow rang a bell when I noticed the 315 part.
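To make the key format concrete, here is a small helper that splits a key into the three pieces described above. Split-CrawledPropertyKey is a hypothetical function name of my own, not a SharePoint cmdlet, and it assumes the key always starts with a 36 character GUID followed by a colon.

```powershell
# Hypothetical helper: splits "<propset guid>:<name>" into its parts.
function Split-CrawledPropertyKey {
    param([Parameter(Mandatory = $true)][string]$Key)

    # The name part may itself contain colons, so only the first 36 characters are the GUID
    if ($Key -match '^(?<Propset>[0-9A-Fa-f-]{36}):(?<Name>.+)$') {
        $name = $Matches['Name']
        [pscustomobject]@{
            Propset    = [Guid]$Matches['Propset']
            Name       = $name.TrimStart('#')   # name without the leading hash
            IsNameEnum = $name.StartsWith('#')  # a leading hash marks isNameEnum = true
        }
    }
    else {
        throw "Unexpected key format: $Key"
    }
}

$cpKey = Split-CrawledPropertyKey -Key '012357BD-1113-171D-1F25-292BB0B0B0B0:#315'
$cpKey
```

The three values line up with the arguments passed to CreateCrawledProperty further down: name, isNameEnum and property set GUID.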

The property set GUID above is tied to the crawled property category named Internal, which is where we should find the crawled property we are looking for. Let’s take a look at the crawled properties available in the Internal category using PowerShell.

$ssa = Get-SPEnterpriseSearchServiceApplication
$cat = Get-SPEnterpriseSearchMetadataCategory -SearchApplication $ssa -Identity Internal
$cat.GetAllCrawledProperties()


Name               : 105
CategoryName       : Internal
Propset            : 012357bd-1113-171d-1f25-292bb0b0b0b0
IsMappedToContents : False
VariantType        : 0

Name               : 107
CategoryName       : Internal
Propset            : 012357bd-1113-171d-1f25-292bb0b0b0b0
IsMappedToContents : False
VariantType        : 0

Name               : 3
CategoryName       : Internal
Propset            : 012357bd-1113-171d-1f25-292bb0b0b0b0
IsMappedToContents : False
VariantType        : 0

I cannot see a 315 property there, so clearly it’s been hidden. And it’s hard to map a crawled property to a managed property if it’s not there. Actually impossible. It turns out you can create a crawled property named 315 yourself, which then becomes available to use in your mappings. In effect, this makes a hidden crawled property visible to the rest of the search ecosystem outside the content processing pipeline.

# Create a cp named 315 which has isNameEnum set to true
$cp = $cat.CreateCrawledProperty("315", $true, [Guid]"012357bd-1113-171d-1f25-292bb0b0b0b0")

image
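The mapping itself can be done in the UI, but as a rough sketch it might look like this in PowerShell. RefinableString00 is only an assumption on my part; use whichever RefinableStringXX property is still free in your farm.

```powershell
# Assumption: RefinableString00 is unused; pick any free RefinableStringXX.
$ssa = Get-SPEnterpriseSearchServiceApplication
$cat = Get-SPEnterpriseSearchMetadataCategory -SearchApplication $ssa -Identity Internal

# Fetch the crawled property named 315 from the Internal category
$cp = Get-SPEnterpriseSearchMetadataCrawledProperty -SearchApplication $ssa -Category $cat -Name "315"
$mp = Get-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa -Identity "RefinableString00"

# Map the crawled property to the refinable managed property
New-SPEnterpriseSearchMetadataMapping -SearchApplication $ssa -ManagedProperty $mp -CrawledProperty $cp
```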

Once the new crawled property is mapped to one of the RefinableStringXX managed properties, it’s time to kick off a full crawl of all my sources, and I have a workable refiner for content sources without changing the default one. See the first image in this post for a small preview where two sources are shown, Best Bets and NPS Deviation.
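For completeness, a sketch of that last step in PowerShell, assuming you want to crawl every content source in the Search Service Application. Since VerboseEx logging is very chatty, it also resets the log levels to their defaults afterwards.

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication

# Start a full crawl on every content source which is currently idle
Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa |
    Where-Object { $_.CrawlStatus -eq "Idle" } |
    ForEach-Object { $_.StartFullCrawl() }

# Reset ULS trace levels to their defaults now that debugging is done
Clear-SPLogLevel
```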

Summary

Not all crawled properties are visible in SharePoint 2013, but by cranking up the log level for search we get pretty good debug output which can help to understand the data available for an item, and then map it to managed properties when needed.