Showing posts with label crawl. Show all posts

Monday, December 15, 2014

Irregular gathering of crawled properties with SharePoint web crawls

To pick up data from <meta> tags when crawling web pages:

  • Make sure the crawled markup has a line break after each tag
  • Look in both the Web and Document Parser crawled property categories for your crawled properties
  • Register the crawled file extension with the correct MIME type
  • Add the crawled file extension to the supported File Types
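
If you generate the pages yourself, the first point can be handled with a small post-processing step. The following is only a rough sketch in Python: the function name add_breaks_after_meta is made up for illustration, and it assumes it is enough to put a line break after each <meta> tag rather than after every tag.

```python
import re

# Matches a complete <meta ...> tag that is NOT already followed by a line break.
# Assumption: only <meta> tags need the break; widen the pattern if other tags matter.
META_WITHOUT_BREAK = re.compile(r"(<meta\b[^>]*>)(?!\r?\n)", re.IGNORECASE)


def add_breaks_after_meta(html: str) -> str:
    """Return the markup with a newline inserted after each <meta> tag that lacks one."""
    return META_WITHOUT_BREAK.sub(r"\1\n", html)


page = '<head><meta name="author" content="me"><meta name="topic" content="crawl"></head>'
print(add_breaks_after_meta(page))
```

Tags that already end with a line break are left untouched, so running it twice gives the same result.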

Monday, December 8, 2014

The right robots.txt settings for allowing SharePoint to crawl your site

If you want to allow SharePoint 2010 or 2013 to crawl your web site, add the following to your robots.txt file:

User-agent: MS Search 6.0 Robot
Disallow:

Even though the crawler sends Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot) as the user agent string, that full string is not what the User-agent line in robots.txt should check against; only the MS Search 6.0 Robot part is. Logical? Nah, but it is what it is.
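
To sanity-check a robots.txt before waiting on a real crawl, you can run it through a generic parser. The sketch below uses Python's standard urllib.robotparser, which is not SharePoint's crawler and may not match it in every detail. The catch-all block, the example.com URL and the helper name allowed are assumptions added for illustration; only the MS Search 6.0 Robot block comes from this post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the "User-agent: *" block is an assumption added for
# contrast; the "MS Search 6.0 Robot" block is the one from the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: MS Search 6.0 Robot
Disallow:
"""


def allowed(agent: str, url: str = "http://example.com/page.html") -> bool:
    """Return whether Python's robots.txt parser lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


print(allowed("MS Search 6.0 Robot"))  # True: the empty Disallow allows everything
print(allowed("SomeOtherBot"))         # False: falls through to the catch-all block
```

An empty Disallow: line means "nothing is disallowed", which is why the SharePoint block opens the whole site while other robots stay blocked.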

Total cost to figure this out: 6h :(

Reference: The SharePoint Server crawler ignored directives in Robots.txt