Wednesday, January 30, 2013

Continuous crawl - What is it, and what is it not

There's a lot of confusion around the new "Continuous crawl" mode in SharePoint 2013. It took me a while to decipher what it was myself, and reading the documentation on TechNet is not too helpful.

Let's break it down!

Continuous crawl is:

  • only for SharePoint content
  • running non-blocking incremental crawls at 15-minute intervals (the interval can be changed using PowerShell)

Continuous crawl is not:

  • event-based push indexing
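The 15-minute interval is stored as a property on the Search service application. A minimal sketch of changing it with PowerShell (run from the SharePoint Management Shell on a farm server; the value of 5 minutes is just an example):

```powershell
# Load the SharePoint snap-in if running from a plain PowerShell console
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Get the Search service application and change the continuous crawl
# interval from the default 15 minutes to e.g. 5 minutes
$ssa = Get-SPEnterpriseSearchServiceApplication
$ssa.SetProperty("ContinuousCrawlInterval", 5)
```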
If you run scheduled incremental/full crawls as in SharePoint 2010, each crawl is blocking. This means that if a crawl takes longer than the scheduled interval, the next crawl has to wait until the running one finishes.

When you enable continuous crawl, a new incremental crawl will start regardless of any crawls already running (it will still obey crawler impact rules).

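Continuous crawl is enabled per content source, and only for SharePoint content sources. A sketch using the standard cmdlet, assuming the default content source name "Local SharePoint sites" (adjust to match your farm):

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Turn on continuous crawls for a SharePoint content source
$ssa = Get-SPEnterpriseSearchServiceApplication
Set-SPEnterpriseSearchCrawlContentSource -Identity "Local SharePoint sites" `
    -SearchApplication $ssa -EnableContinuousCrawls $true
```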
The best example to illustrate the advantage of continuous crawls is starting a full crawl over a very large corpus that takes weeks to complete. With scheduled crawls, every content change made during those weeks queues up until the running crawl completes. With continuous crawl, the initial content still takes weeks to process, but any change happening during indexing is picked up by the overlapping incremental crawls.

Result: New content is made searchable very fast, regardless of other long-running crawls!

An excellent in-depth writeup on the topic can be found at the SharePoint IT Pro Blog, and is worth the read.

4 comments:

  1. What happens if we call CrawlLog.RecrawlDocument or mark a document as `Recrawl this document in the next crawl`? Will continuous crawl pick up and crawl this item or not?

  2. Is it possible to add an item for recrawl in the next continuous crawl execution?

    Replies
    1. Not individual items, like via the crawl log option. They won't get picked up until the next incremental crawl (by default every 4 hours when running continuous). But you can mark the library/list to be recrawled, which is picked up by continuous crawl.

  3. We have a custom connector pointing to a custom service. Is there a better way to handle crawl errors when the service is down, and/or to be notified when the crawl fails because the service appears to be down?
