Thursday, June 28, 2012

How-to: Get individual items from a delimited field using FS4SP

Thought I’d share a small, useful program I often use when indexing data into FAST Search for SharePoint. The issue is that some source systems return several values concatenated into a single field. For example:

user1,user2,user3

But you want to treat each value as a separate one. In FS4SP you have to delimit the values with the Unicode character U+2029 (paragraph separator). The task is simple: replace the existing delimiter character with this special one.
If you get the data via BCS, you might be able to change the SQL for a view to do this directly (I haven’t tested this), or you can employ a custom pipeline extensibility program. If you go with the latter, here’s the code:
using System;
using System.Xml;

namespace mAdcOW.MultiValue
{
  class Program
  {
    static int Main(string[] args)
    {
      try
      {
        XmlDocument doc = new XmlDocument();
        doc.Load(args[0]);
        foreach (XmlNode node in doc.SelectNodes("Document/CrawledProperty[@varType='31']"))
        {
          // here you can add replacement for other characters
          // as well besides comma
          node.InnerText = node.InnerText.Replace(',', '\u2029');
        }
        doc.Save(args[1]);

      }
      catch (Exception e)
      {
        Console.WriteLine("Failed: " + e.Message + "/" + e.StackTrace);
        return 1;
      }
      return 0;
    }
  }
}


This code snippet reads every crawled property of type text (varType 31) from the input file, replaces any commas with U+2029, and writes the value back to the same crawled property. To register the pipeline module you can use the following XML, where the input and output crawled properties are the same. The sample configuration works on <meta> tags in HTML pages named administrators/owners/members.

<PipelineExtensibility>
  <Run command="C:\FASTSearch\pipelinemodules\mAdcOW.MultiValue.exe %(input)s %(output)s">
    <Input>
      <CrawledProperty propertySet="d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1" varType="31" propertyName="ADMINISTRATORS"/>
      <CrawledProperty propertySet="d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1" varType="31" propertyName="OWNERS"/>
      <CrawledProperty propertySet="d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1" varType="31" propertyName="MEMBERS"/>
    </Input>
    <Output>
      <CrawledProperty propertySet="d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1" varType="31" propertyName="ADMINISTRATORS"/>
      <CrawledProperty propertySet="d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1" varType="31" propertyName="OWNERS"/>
      <CrawledProperty propertySet="d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1" varType="31" propertyName="MEMBERS"/>
    </Output>
  </Run>
</PipelineExtensibility>

And remember to enable your managed property for multi-value support. You can do this with the following PowerShell commands:

$emp = Get-FASTSearchMetadataManagedProperty -Name YourManagedProperty
Set-FASTSearchMetadataManagedProperty -ManagedProperty $emp -MergeCrawledProperties 1
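
As the comment in the program hints, delimiters other than comma can be handled too. Below is a minimal sketch of one way to do that, assuming you also want to trim whitespace around each value and drop empty entries. The namespace MultiValueSketch, the class DelimiterNormalizer and its two methods are hypothetical names made up for this illustration; they are not part of the program above or of any FS4SP API.

```csharp
using System;
using System.Linq;
using System.Xml;

namespace MultiValueSketch
{
  // Hypothetical helper for illustration only.
  public static class DelimiterNormalizer
  {
    // U+2029 (paragraph separator) is the multi-value delimiter FS4SP expects.
    public const char MultiValueSeparator = '\u2029';

    // Splits the value on any of the given delimiters, trims each part,
    // drops empty parts and joins the rest with U+2029.
    // Note: with no delimiters passed, string.Split falls back to whitespace.
    public static string Normalize(string value, params char[] delimiters)
    {
      string[] parts = value
        .Split(delimiters, StringSplitOptions.RemoveEmptyEntries)
        .Select(p => p.Trim())
        .Where(p => p.Length > 0)
        .ToArray();
      return string.Join(MultiValueSeparator.ToString(), parts);
    }

    // Applies Normalize to every text (varType 31) crawled property in an
    // XML string shaped like the file the pipeline hands to the program.
    public static string NormalizeDocument(string xml, params char[] delimiters)
    {
      XmlDocument doc = new XmlDocument();
      doc.LoadXml(xml);
      foreach (XmlNode node in doc.SelectNodes("Document/CrawledProperty[@varType='31']"))
      {
        node.InnerText = Normalize(node.InnerText, delimiters);
      }
      return doc.OuterXml;
    }
  }
}
```

To use it in the program above, the Replace line could become node.InnerText = DelimiterNormalizer.Normalize(node.InnerText, ',', ';'); which would turn a value such as "user1, user2;user3" into three separate values. Splitting and rejoining, rather than a plain character replace, avoids ending up with values that carry leading spaces or with empty values from doubled delimiters.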