Wednesday, January 30, 2013

Continuous crawl - What is it, and what is it not

There's a lot of confusion around the new "Continuous crawl" mode in SharePoint 2013. It took me a while to figure out what it was myself, and the documentation on TechNet is not much help.

Let's break it down!

Continuous crawl is

  • only for SharePoint content sources
  • running non-blocking incremental crawls at 15-minute intervals (the interval can be changed using PowerShell)

Continuous crawl is not

  • event-based push indexing

If you run scheduled incremental or full crawls as in SharePoint 2010, each crawl is blocking. This means that if a crawl takes longer than the scheduled interval, the next crawl has to wait until the running one finishes.

When you enable continuous crawl, a new incremental crawl starts at each interval regardless of any crawls that are still running (it still obeys crawler impact rules).
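To make the blocking vs. non-blocking difference concrete, here is a toy Python model. It is an illustration under simplifying assumptions, not how the SharePoint crawler is implemented: the hypothetical function `crawl_start_times` schedules crawl i at i × interval minutes and returns when each crawl actually starts, given how long each crawl runs. Crawler impact rules and resource limits are ignored.

```python
from typing import List


def crawl_start_times(durations: List[float], interval: float, blocking: bool) -> List[float]:
    """Return the start time (in minutes) of each crawl in a toy schedule.

    Assumed model (not SharePoint's actual scheduler): crawl i is scheduled
    at i * interval. In blocking mode a crawl cannot start until the previous
    crawl has finished; in non-blocking mode it starts on schedule even if
    earlier crawls are still running.
    """
    starts: List[float] = []
    previous_end = 0
    for i, duration in enumerate(durations):
        scheduled = i * interval
        # Blocking: wait for the previous crawl. Non-blocking: start on time.
        start = max(scheduled, previous_end) if blocking else scheduled
        starts.append(start)
        previous_end = start + duration
    return starts


if __name__ == "__main__":
    durations = [60, 5, 5]  # one long crawl followed by two short ones
    print(crawl_start_times(durations, 15, blocking=True))   # [0, 60, 65]
    print(crawl_start_times(durations, 15, blocking=False))  # [0, 15, 30]
```

With a 60-minute first crawl and a 15-minute interval, the blocking schedule pushes the next two crawls back to minutes 60 and 65, while the non-blocking schedule starts them on time at minutes 15 and 30.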

The best illustration of the advantage of continuous crawls is a full crawl of lots and lots of content that takes weeks to complete. With scheduled crawls, every content change made during those weeks is backed up until the running crawl completes. With continuous crawl, it still takes weeks to process all the initial content, but any change made during indexing is picked up by the new incremental crawls.

Result: New content becomes searchable very quickly, regardless of other long-running crawls!
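The weeks-long full crawl example can be put into rough numbers with another toy Python model. The function `pickup_time` is hypothetical and assumes a full crawl running from minute 0, incremental crawls scheduled on a fixed interval, and that a change is picked up by the first incremental crawl that starts at or after it. It returns when that crawl starts, which is a lower bound on when the change is actually searchable.

```python
import math


def pickup_time(change_at: int, full_crawl_minutes: int, interval: int, continuous: bool) -> int:
    """Return the minute at which the first incremental crawl that sees a change starts.

    Assumed toy model: a full crawl runs from minute 0 to full_crawl_minutes,
    and incremental crawls are scheduled every `interval` minutes. With
    scheduled (blocking) crawls, no incremental crawl can start before the
    full crawl ends. With continuous (non-blocking) crawls, the next scheduled
    incremental crawl starts on time. Indexing time itself is not modeled.
    """
    # First scheduled slot at or after the change.
    next_slot = math.ceil(change_at / interval) * interval
    if continuous:
        return next_slot
    # Blocking: the queued incremental crawl waits for the full crawl to finish.
    return max(next_slot, full_crawl_minutes)


if __name__ == "__main__":
    two_weeks = 14 * 24 * 60  # 20160 minutes
    change = 1000  # a document edited about 17 hours into the full crawl
    print(pickup_time(change, two_weeks, 15, continuous=False) - change)  # 19160
    print(pickup_time(change, two_weeks, 15, continuous=True) - change)   # 5
```

In this model, a change made early in a two-week full crawl waits about 13 days before any crawl looks at it with blocking crawls, versus at most one interval (here 5 minutes) with continuous crawl.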

An excellent in-depth write-up on the topic can be found at the SharePoint IT Pro Blog, and it is well worth the read.