Friday, November 20, 2015

Why Hybrid Crawl in SharePoint is a cold hot potato


[This rant is based on the preview of the Cloud Search Service Application]

Let me start by saying that I DO think hybrid crawl (or the Cloud Search Service Application) is a cool thing. It’s just the implementation of it that makes me go: Meh…

When the Cloud SSA was first mentioned and further put into preview everyone was all YEAH!!!! WOW!!! AWESOMEPANTS!!!…..while I was quickly going… meh. What’s wrong with me?!?

To clue everyone in, hybrid crawl or the Cloud Search Service Application is a feature where your on-premises SharePoint 2013 or 2016 farm can index local content (any source) and store it in the SharePoint Online search index instead of in the local search index.

From a functional perspective hybrid crawl gives you the same indexing capabilities as SharePoint Online, which turns your on-premises crawl box into a dumb content gatherer. Without writing a custom connector or purchasing one from a 3rd party vendor, you cannot do any content enrichment or entity extraction on the indexed content. In my opinion, those are a minimum requirement for an enterprise search engine.

Technically hybrid crawl is of course all very sound, one index to rule them all. You get the ability to tap into the Office Graph for your content – which certainly has potential, and you don’t have to manage heaps of on-premises search servers – something I know several companies look forward to.

That said, the existing hybrid search solution with federated result sources between SPO and on-premises still works for many hybrid scenarios. You might have chocolate covered pineapples on-premises and fried fish in SPO, so mixing the results might not taste all that good anyway. Having one index might not be all that important to you.

Take a stand

Based on the current implementation of hybrid crawl, let’s do a quick checklist. If you can answer NO to the questions below, Cloud SSA is the way to go for you once it’s in general availability:

  • Are you prohibited by local law from storing data outside your country?
  • Are you currently using entity extraction (dictionary mapping)?
  • Are you currently using the content enrichment web service?
  • Are you planning to use either of the two previous features in the future?
  • Do you need to use custom managed properties outside the ReusableXXYY ones? (except text and yes/no)
  • Are any of your on-premises users missing client licenses in SPO?

If you managed to get through those six points with a NO, or you can turn them into a NO based on the benefits below, then you’re all set!
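If you like your checklists executable, here is a minimal Python sketch of the same test. Everything in it is made up for illustration (the key names, the `CHECKLIST` dictionary, the `blockers` function), and it is not part of any SharePoint tooling. The licensing question is phrased so that NO means every on-premises user is covered, and an unanswered question is assumed to be a blocker.

```python
# Hypothetical sketch of the checklist above. All names are invented for
# illustration; the question texts paraphrase the bullets in this post.
CHECKLIST = {
    "data_must_stay_in_country": "Prohibited by law from storing data abroad?",
    "uses_entity_extraction": "Using entity extraction (dictionary mapping)?",
    "uses_content_enrichment": "Using the content enrichment web service?",
    "plans_extraction_or_enrichment": "Planning to use either one later?",
    "needs_custom_managed_properties": "Need custom managed properties beyond the reusable ones?",
    "users_missing_spo_license": "Any on-premises users without an SPO client license?",
}


def blockers(answers):
    """Return the checklist keys answered YES (True).

    An empty list means every answer was NO. A question with no answer
    is treated as YES, so it shows up as a blocker.
    """
    return [key for key in CHECKLIST if answers.get(key, True)]


if __name__ == "__main__":
    my_answers = {key: False for key in CHECKLIST}
    my_answers["uses_entity_extraction"] = True
    remaining = blockers(my_answers)
    if remaining:
        for key in remaining:
            print("Blocker:", CHECKLIST[key])
    else:
        print("All NO: Cloud SSA looks like a fit.")
```

Anything still on the blocker list is a point you either live with or weigh against the benefits below.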

Hybrid crawl benefits

As I started with, hybrid crawl is cool, and it certainly has some benefits:

  • You only have one search index to maintain (or actually Microsoft maintains it for you)
  • Content is available in Delve via the Office Graph
  • Your IT guy has fewer SharePoint servers to maintain and patch – and he will probably thank you!
  • Your boss has fewer SharePoint server licenses to pay for – and he may thank you if TCO goes down

Summary

If you’re a simple person who doesn’t bother to tweak search and content all that much to make them really shine, then hybrid crawl and the Cloud SSA in the current edition is something for you.

If/when Microsoft decides to add a hook for at least entity extraction dictionaries, then I’m all YEAH!!!! WOW!!! AWESOMEPANTS!!! with the rest of you.

2 comments:

  1. Show stopper - index in the cloud is unencrypted.

    - Wise words Sahil

    1. Might very well be true and a valid point, but do we know this for a fact? And onprem is stored the same way, except you can encrypt the file system which protects raw access. Then again raw access in SPO seems kind of hard.
