Tech and me: crawl


Monday, December 15, 2014

Irregular gathering of crawled properties with SharePoint web crawls

To pick up data from <meta> tags when crawling web pages:

Make sure the crawled markup has line breaks after tags
Look in both the Web and the Document Parser crawled property categories for your crawled properties
Register the crawled file extension with the right MIME type
Add the crawled file extension to the supported File Types
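
For the first point, a quick check can save some guessing. The following is only a rough Python sketch, assuming the advice is about <meta> tags specifically; the function name and the regular expression are my own and are not part of SharePoint. It lists the <meta> tags in a piece of markup that are not directly followed by a line break.

```python
import re
from typing import List

# Assumption: only <meta ...> tags matter here; other tags are ignored.
META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
# Optional trailing spaces or tabs, then a line break or the end of the text.
LINE_END = re.compile(r"[ \t]*(?:\r?\n|\Z)")


def meta_tags_without_line_break(html: str) -> List[str]:
    """Return the <meta> tags that are not immediately followed by a line break."""
    offenders = []
    for match in META_TAG.finditer(html):
        # Look at whatever comes right after the closing '>' of the tag.
        if LINE_END.match(html, match.end()) is None:
            offenders.append(match.group(0))
    return offenders


if __name__ == "__main__":
    sample = '<head><meta name="a" content="1"><meta name="b" content="2">\n</head>'
    for tag in meta_tags_without_line_break(sample):
        print("No line break after:", tag)
```

Running it against the markup that SharePoint actually crawls should show which tags to fix before starting another crawl.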

The right robots.txt settings for allowing SharePoint to crawl your site

If you want to allow SharePoint 2010 or 2013 to crawl your web site, add the following to your robots.txt file.

User-agent: MS Search 6.0 Robot
Disallow:

Even though the crawler sends Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot) as the user agent string, that full string is not what you should check against in robots.txt; use only MS Search 6.0 Robot. Logical? Nah, but it is what it is.
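
To sanity check a robots.txt before waiting for a full crawl, something like the Python sketch below can help. It uses the standard library's urllib.robotparser, which is Python's own parser and not the SharePoint crawler, so it only approximates what SharePoint does. The catch-all User-agent: * block, the example.com URL and the SomeOtherBot name are assumptions added for illustration; they are not part of the settings above.

```python
from urllib.robotparser import RobotFileParser

# The first block is the setting from this post. The second block is an
# assumption: a catch-all rule that keeps every other crawler out, added
# only so that the difference between agents is visible.
ROBOTS_TXT = """\
User-agent: MS Search 6.0 Robot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Ask Python's robots.txt parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://example.com/page.html"  # hypothetical URL
    for agent in ("MS Search 6.0 Robot", "SomeOtherBot"):
        print(agent, "->", is_allowed(ROBOTS_TXT, agent, url))
```

The empty Disallow: line means nothing is disallowed for that agent, which is why the SharePoint token gets through while the catch-all block stops everyone else.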

Total cost to figure this out: 6h :(

Reference: The SharePoint Server crawler ignored directives in Robots.txt

Monday, December 8, 2014

The right robots.txt settings for allowing SharePoint to crawl your site
