Wednesday, March 2, 2011

Prototyping pipeline stages in PowerShell

The de facto way of creating a custom pipeline stage in FAST for SharePoint is to create an executable which reads one XML file and writes another. This usually implies having Visual Studio available, and compiling and deploying a new file each time you make a change in order to test it in a proper pipeline.

PowerShell comes to the rescue. Since all FAST servers have PowerShell installed, you can write the stage as a PowerShell script and use that instead. All you need is Notepad. This gives you the flexibility to try things out without recompiling, at the cost of execution speed, so you might want to port the code over to e.g. C# once you are done testing.
See my post “How To: Debug and log FAST Search pipeline extensibility stages in Visual Studio” on how to do this in C#, and also how to create your own property set with PowerShell commands.
To register the script in pipelineextensibility.xml, put the path to the PowerShell executable in front of the ps1 file in the command. Make sure to keep the <PipelineExtensibility> element around your <Run> section(s).
<Run command="C:\Windows\System32\WindowsPowerShell\v1.0\PowerShell.exe C:\FASTSearch\pipelinemodules\concept.ps1 %(input)s %(output)s">
  <Input><CrawledProperty propertySet="48385C54-CDFC-4E84-8117-C95B3CF8911C" varType="31" propertyName="docvector"/></Input>
  <Output><CrawledProperty propertySet="fa585f53-2679-48d9-976d-9ce62e7e19b7" varType="31" propertyName="concepts"/></Output>
</Run>

In PowerShell you can instantiate any .NET object, and for this script I use the XmlDocument class to create the XML for the result file. The script is pretty straightforward: it reads in an XML file, selects a property called “docvector” and writes its value out to a new field called “concepts”. Typically you would do more work than just copying a value, but it shows the concept.

I’m not a PowerShell expert, so there might be better ways of doing some parts of this script. Note that I read and write the XML files with UTF-8 encoding. This is crucial, as this is what the pipeline works with.
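As a quick way to convince yourself that the explicit encoding does its job, the round trip can be tried outside the pipeline. This is only a sketch: the sample text, the variable names and the temp file name are made up for illustration, and the non-ASCII characters are built from character codes so the check does not depend on how the script file itself is saved.

```powershell
# Invented sample value with non-ASCII characters (a-ring and ae), built from
# character codes so the encoding of this script file does not matter.
$value = "bl$([char]0xE5)b$([char]0xE6)r"
$sample = "<Document><CrawledProperty propertyName=`"docvector`">$value</CrawledProperty></Document>"

# Temporary file used only for this check (hypothetical name).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-roundtrip.xml'

# Write and read back with an explicit UTF-8 encoding, like the stage script does.
$sample | Out-File $path -Encoding UTF8
$xml = [xml](Get-Content $path -Encoding UTF8)

# Compare case-sensitively; True means the characters survived the round trip.
$roundTripped = $xml.Document.CrawledProperty.InnerText
$roundTripped -ceq $value

Remove-Item $path
```

If the comparison prints False after you drop one of the -Encoding UTF8 arguments, the encoding is the place to look.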

# Builds the output document: a <Document> root holding one <CrawledProperty> element.
function CreateXml
{
    param ([string]$set, [string]$name, [int]$type, $value)

    $resultXml = New-Object xml
    $doc = $resultXml.CreateElement("Document")

    $crawledProperty = $resultXml.CreateElement("CrawledProperty")
    $propSet = $resultXml.CreateAttribute("propertySet")
    $propSet.innerText = $set
    $propName = $resultXml.CreateAttribute("propertyName")
    $propName.innerText = $name
    $varType = $resultXml.CreateAttribute("varType")
    $varType.innerText = $type

    # Append() returns the attribute; redirect to $null to keep it out of the function output
    $crawledProperty.Attributes.Append($propSet) > $null
    $crawledProperty.Attributes.Append($propName) > $null
    $crawledProperty.Attributes.Append($varType) > $null

    $crawledProperty.innerText = $value

    $doc.AppendChild($crawledProperty) > $null
    $resultXml.AppendChild($doc) > $null

    # Put the XML declaration in front of the root element
    $xmlDecl = $resultXml.CreateXmlDeclaration("1.0", "UTF-8", "")
    $el = $resultXml.psbase.DocumentElement
    $resultXml.InsertBefore($xmlDecl, $el) > $null

    return $resultXml
}

# Reads the input file, picks out the docvector property and writes it back as concepts.
function DoWork
{
    param ([string]$inputFile, [string]$outputFile)

    $propertyGroupIn = "48385c54-cdfc-4e84-8117-c95b3cf8911c" # FAST internal group
    $propertyNameIn = "docvector" # property name
    $dataTypeIn = 31 # varType 31 = string

    $propertyGroupOut = "fa585f53-2679-48d9-976d-9ce62e7e19b7" # Custom group
    $propertyNameOut = "concepts" # property name
    $dataTypeOut = 31 # varType 31 = string

    $xmldata = [xml](Get-Content $inputFile -Encoding UTF8)
    $node = $xmldata.Document.CrawledProperty | Where-Object { $_.propertySet -eq $propertyGroupIn -and $_.propertyName -eq $propertyNameIn -and $_.varType -eq $dataTypeIn }
    $data = $node.innerText

    # do your custom modification on $data here

    $resultXml = CreateXml $propertyGroupOut $propertyNameOut $dataTypeOut $data
    $resultXml.OuterXml | Out-File $outputFile -Encoding UTF8
}

# pass input and output file paths as arguments
DoWork $args[0] $args[1]
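
Judging from how the script navigates the input ($xmldata.Document.CrawledProperty), the input file has a Document root with one CrawledProperty element per property. That makes it possible to try the selection logic against a hand-made file before involving a crawl. The sketch below assumes that layout; the property values, the second property name ("other") and the temp file name are invented for illustration and are not what FAST actually produces.

```powershell
# Hand-made input modelled on what the script expects; all values are invented.
$sampleInput = @'
<?xml version="1.0" encoding="utf-8"?>
<Document>
  <CrawledProperty propertySet="48385c54-cdfc-4e84-8117-c95b3cf8911c" propertyName="docvector" varType="31">sample docvector value</CrawledProperty>
  <CrawledProperty propertySet="48385c54-cdfc-4e84-8117-c95b3cf8911c" propertyName="other" varType="31">should be ignored</CrawledProperty>
</Document>
'@

# Temporary input file (hypothetical name), written the same way the stage writes its output.
$inputFile = Join-Path ([System.IO.Path]::GetTempPath()) 'stage-input.xml'
$sampleInput | Out-File $inputFile -Encoding UTF8

# Same read and filter as in DoWork: match on property set, name and varType.
$xmldata = [xml](Get-Content $inputFile -Encoding UTF8)
$node = $xmldata.Document.CrawledProperty | Where-Object {
    $_.propertySet -eq '48385c54-cdfc-4e84-8117-c95b3cf8911c' -and
    $_.propertyName -eq 'docvector' -and
    $_.varType -eq 31
}

# Only the docvector element should be left.
$selected = $node.InnerText
$selected

Remove-Item $inputFile
```

To test the whole stage the same way, save a file like this by hand and call the script with that file and an output path as its two arguments, then inspect the output file.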