Friday, November 20, 2015

Why Hybrid Crawl in SharePoint is a cold hot potato


[This rant is based on the preview of the Cloud Search Service Application]

Let me start by saying that I DO think hybrid crawl (or the Cloud Search Service Application) is a cool thing. It’s just the implementation of it which makes me go: Meh…..

When the Cloud SSA was first mentioned and further put into preview everyone was all YEAH!!!! WOW!!! AWESOMEPANTS!!!…..while I was quickly going… meh. What’s wrong with me?!?

To clue everyone in, hybrid crawl or the Cloud Search Service Application is a function where your on-premises SharePoint 2013 or 2016 farm can index local content (any source) and store it in the SharePoint Online search index instead of storing it in the local search index.

From a functional perspective, hybrid crawl gives you the same indexing capabilities as SharePoint Online, which turns your on-premises crawl box into a dumb content gatherer. Unless you write a custom connector or purchase one from a third-party vendor, you cannot do any content enrichment or entity extraction on the indexed content. In my opinion, both are minimum requirements for an enterprise search engine.

Technically, hybrid crawl is of course all very sound: one index to rule them all. You get the ability to tap into the Office Graph for your content – which certainly has potential – and you don’t have to manage heaps of on-premises search servers, something I know several companies look forward to.

That said, the existing hybrid search solution with federated result sources between SPO and on-premises still works for many hybrid scenarios. You might have chocolate covered pineapples on-premises and fried fish in SPO, so mixing the results might not taste all that good anyway. Having one index might not be all that important to you.

Take a stand

Based on the current implementation of hybrid crawl, let’s do a quick checklist. If you can answer NO to all the questions below, Cloud SSA is the way to go for you once it’s in general availability:

  • Are you prohibited by local law from storing data outside your country?
  • Are you currently using entity extraction (dictionary mapping)?
  • Are you currently using the content enrichment web service?
  • Are you planning to use either of the two previous features in the future?
  • Do you need custom managed properties beyond the pre-provisioned RefinableXXYY ones (other than text and yes/no properties)?
  • Are any of your on-premises users missing a client license in SPO?

If you managed to get through those six points with a NO, or you can turn them into a NO based on the benefits below, then you’re all set!

Hybrid crawl benefits

As I said at the start, hybrid crawl is cool, and it certainly has some benefits:

  • You only have one search index to maintain (or actually Microsoft maintains it for you)
  • Content is available in Delve via the Office Graph
  • Your IT guy has fewer SharePoint servers to maintain and patch – and he will probably thank you!
  • Your boss has fewer SharePoint server licenses to pay for – and he may thank you if TCO goes down


If you’re a simple person who doesn’t bother to tweak search and content all that much to make it really shine, then hybrid crawl and the Cloud SSA in its current edition is something for you.

If/when Microsoft decides to add a hook for at least entity extraction dictionaries, then I’m all YEAH!!!! WOW!!! AWESOMEPANTS!!! with the rest of you.


  1. Show stopper - index in the cloud is unencrypted.

    - Wise words Sahil

    1. Might very well be true and a valid point, but do we know this for a fact? And on-prem is stored the same way, except you can encrypt the file system, which protects against raw access. Then again, raw access in SPO seems kind of hard.

  2. Hi

    I read somewhere that if you use cross site publishing, then you "won't" be able to use hybrid search.

    1. I actually have no idea, but indeed something to check if that's a scenario.

  3. Sorry Mikael, I have a couple more queries - well, it is an interesting subject:

    1) Can users who are assigned E1 licenses use hybrid search? I'm assuming I have misread this, as otherwise I would need to implement both hybrid and federated search at the same time!
    2) The WAP is only really needed when you have remote workers who use VPN to access the company Office 365 tenant and the search results include links to content held in the on-premises farm.

    1. 1. E1 should cover on-prem rights, so I assume hybrid should be ok. Or check the doc I reference and see if hybrid is specified.
      2. WAP is needed if you federate results from on-prem index. If all items are in the cloud index you don't need WAP.

      And I'm by no means a hybrid setup expert :)

    2. Mikael, thanks for answering both questions. I will follow this up.

  4. Hi Mikael, I have a Cloud SSA configured and I am crawling a non-SharePoint site, a wiki web site. I am unable to get metadata like Author and other stuff. How can I map the metadata of a non-SharePoint site to my managed properties? Thanks!

    1. Make sure they are in meta tags, and they should be exposed as crawled properties.

    2. Thanks! Do you have any articles that explain this in detail? I am looking for something more like how to extract more metadata from the non-SP site and then make use of it on my search page.

    3. Custom connector is what you need, if the connector does not give you the metadata you want.
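
To follow up on the meta tag suggestion in the last thread: before crawling a non-SharePoint site, it can help to check which meta tags a page actually exposes. Below is a minimal Python sketch using only the standard library. The sample page, the MetaTagCollector class and the list_meta_tags function are made up for illustration and are not part of any SharePoint API, and whether a given tag ends up as a crawled property still depends on your crawl and search schema.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta name="..." content="..."> tags are of interest here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name] = content


def list_meta_tags(html):
    """Return a dict of meta tag names and their content values."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.meta


# Hypothetical wiki page, used only to demonstrate the helper.
sample_page = """
<html><head>
<title>Wiki page</title>
<meta name="author" content="Jane Doe">
<meta name="department" content="Engineering">
<meta charset="utf-8">
</head><body>Hello</body></html>
"""

if __name__ == "__main__":
    for name, content in list_meta_tags(sample_page).items():
        print(f"{name}: {content}")
```

If the metadata you need does not show up in a listing like this, the page is not exposing it as meta tags, and you are back to the custom connector option mentioned above.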