Wednesday, July 9, 2014

July 2014 CU for SharePoint 2013 released–with a search feature long lost from 2010

You may now install custom PDF iFilters again, like the one from Foxit or Adobe! In SharePoint 2010 you could install any iFilter you wanted for text extraction and the most common to install was for PDF documents. With SharePoint 2013 this was replaced by an internal file handler instead, which could not be overridden.

With the July 2014 CU for SharePoint 2013, you can yet again install your custom PDF iFilter, or override any built-in handler to use the iFilter of your choice.
If you previously ran the command Get-SPEnterpriseSearchFileFormat -SearchApplication $ssa -Identity pdf, you would see the following output:
Identity   : pdf
Name       : PDF
MimeType   : application/pdf
Extension  : .pdf
BuiltIn    : True
Enabled    : True

After applying the July 2014 CU you get this instead:

Identity   : pdf
Name       : PDF
MimeType   : application/pdf
Extension  : .pdf
BuiltIn    : True
Enabled    : True
UseIFilter : False

Notice the last line, which says UseIFilter. To turn on iFilter support for PDF processing, use the cmdlet Set-SPEnterpriseSearchFileFormatState, which has gained a -UseIFilter parameter for this purpose.

The full command to switch to iFilter for PDF is listed below. You can use a similar command for any other file format whose built-in handler you want to replace with an iFilter, but remember to install the iFilter properly first.

Set-SPEnterpriseSearchFileFormatState -SearchApplication $ssa -Identity pdf -UseIFilter $true -Enable $true
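
For context, here is a rough sketch of how the whole sequence might look in the SharePoint 2013 Management Shell. It assumes a farm with a single Search service application (so $ssa can be fetched without a name) and that the PDF iFilter is already installed on every server running a content processing component. The restart of the Search Host Controller service at the end is my assumption about how to make the change take effect; treat it as a starting point and not a verified procedure.

```powershell
# Assumption: run as a farm administrator on a SharePoint 2013 server.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumption: only one Search service application exists in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Show the current PDF entry; the UseIFilter line is only present after the July 2014 CU.
Get-SPEnterpriseSearchFileFormat -SearchApplication $ssa -Identity pdf

# Switch PDF parsing from the built-in handler to the installed iFilter.
Set-SPEnterpriseSearchFileFormatState -SearchApplication $ssa -Identity pdf -UseIFilter $true -Enable $true

# Assumption: content processing must be restarted to pick up the new state.
# Repeat on each server that hosts a content processing component.
Restart-Service SPSearchHostController

# Confirm that UseIFilter now reports True.
Get-SPEnterpriseSearchFileFormat -SearchApplication $ssa -Identity pdf
```

PDFs that were already indexed with the built-in handler will most likely need a full crawl before the iFilter output shows up in the index.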


References:

SharePoint Server 2013 July 2014 CU Download
SharePoint Foundation 2013 July 2014 CU Download
Implement a custom iFilter in SharePoint 2013


Thanks to Neil Hodgkinson for informing me of this file format change :)

10 comments:

  1. Mikael,

    Have you confirmed this actually works? I have followed the outlined steps but PDF documents are still using the OOB iFilter not the installed one that Windows Search is using. I am specifically trying to utilize the ABBYY Recognition Server iFilter to OCR PDFs as part of the indexing process. Any assistance to test and/or confirm that this is actually utilizing the settings is appreciated. I am leery that this may just be a precursor to the functionality coming in upcoming release. Thank you.

    Ed

    1. Hi,
      I have done the changes, then I crippled c:\windows\system32\glcndFilter.dll which is the default PDF ifilter in 2012 server, and this caused an error in the crawl log - indicating it is indeed using the ifilter and not the built in handler (Processing this item failed because of a IFilter parser error)

  2. Hi Mikael,

    In SharePoint 2013 Search, we have created a content source (type: Line of Business Data) which gets data from a selected external data source. While crawling this content source, a peculiar warning shows up in the crawl log whenever the crawler processes a document (be it .docx/.xlsx/.pptx) that contains another document as a nested document. As a result, none of the content within this document is being crawled.

    i.e. document B is embedded as an object inside document A, when crawler crawls document A, none of its content is getting indexed. I came to this conclusion as I was not able to get this document as result for any text search done in search center.

    Error :

    This item comprises multiple parts and/or may have attachments. Not all of these parts were indexed. They may either be invalid or deliberately skipped (e.g. images). The remote server may also have been unresponsive while indexing these parts. ( Error parsing document 'file://birdey001/gthrsvc_fa976e24-b67f-443f-a7e5-1adcb913fdbd-crawl-1//42/0x223342_10000.docx'. Error loading IFilter for extension '.zip' (Error code is 0x80CB4204). The function encountered an unknown error.; ; SearchID = A7F9EA34-6B5E-4BF4-AE5B-66746E80CA94 )

    I have tried a bit to identify what could have gone wrong but was not successful. Kindly let me know if you have ever faced this type of warning and any guess what might be the root cause for this.

    Thanks & Regards,
    Raju

    1. Not sure, but you could try to turn off the default processor and use iFilter instead and see if that helps. If you are on a 2012 server, it's merely setting builtin=false for the formats you want and see how a re-crawl goes. I would also try to upload the same document to a SharePoint library and see if you get the same error or not.

    2. Hi Mikael,

      Thanks for the reply.

      We were reading the stream from the database using a StreamAccessor method, and we made a mistake in the way we read it. We identified and fixed it. We had mistakenly assumed the problem was caused by embedded documents.

    3. Raju,

      I have a custom connector reading a database for the files just like you described. I'm also getting the "zip" error on the OpenXML Office documents. Can you shed some light on what your read issue was? I have been banging my head on this.

  3. Hi Mikael,
    We want to perform OCR search on PDFs in SharePoint 2013. Which iFilters are available?

  4. Now that SP1 is installed we are able to install iFilters again. They do not seem to perform any OCR as they used to in 2010. Do you know if there's a way to have SharePoint 2013 do OCR on image PDFs?

    1. The ifilters should do the same job as before I believe...which filter are you looking at?
