Tuesday, March 16, 2010

SharePoint Search Crawl Rules: What Are They, and What Do They Do for Us?

Search crawl rules are a mechanism for influencing the behavior of the crawler when it crawls specific sites. A single crawl rule is created by specifying a URL wildcard pattern that matches sites, plus a set of options that control how the crawler behaves on those sites.

When performing a search in SharePoint, you often get noisy results: not only is the document you searched for returned, but so are its View Properties and Edit Properties pages, the AllItems.aspx view form, and so on.

To prevent these pages from being returned, you need to update the crawl rules. For example, if you would like all document views and properties pages to be excluded, you can achieve this with the crawl rule: *://*/Forms/*.aspx

To add the rules, follow the steps below:

1. Go to the Crawl Rules section of the Search Settings page in the SSP (Shared Services Provider).

2. Add crawl rules to exclude the following paths:
*://*webfldr.aspx*
*://*my-sub.aspx*
*://*mod-view.aspx*
*://*allitems.aspx*
*://*all forms.aspx*

You can also test a specific URL against the crawl rules to determine whether the rules will include or exclude the URL during a crawl.
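The idea of testing a URL against the rules can be sketched in plain PowerShell, with no SharePoint installation needed. This is a simulation under stated assumptions, not SharePoint's own implementation: rules are checked in list order and the first matching rule applies (SharePoint documents this ordering behavior), "*" matches any run of characters including "/", the pattern must cover the whole URL, and matching is case-insensitive. The function names ConvertTo-CrawlRuleRegex and Resolve-CrawlRuleType, the http://intranet URLs, and the "Keep" inclusion rule are all hypothetical.

```powershell
# Hypothetical helpers that SIMULATE crawl-rule matching; they are not SharePoint cmdlets.
# Assumptions: "*" is the only operator and matches any run of characters
# (including "/"), the whole URL must match, and matching is case-insensitive.

function ConvertTo-CrawlRuleRegex {
    param([Parameter(Mandatory=$true)][string]$Pattern)
    # Escape every regex metacharacter, then turn each escaped "*" into ".*"
    $escaped = [regex]::Escape($Pattern) -replace '\\\*', '.*'
    # Anchor so that the pattern has to cover the whole URL
    return '^' + $escaped + '$'
}

function Resolve-CrawlRuleType {
    param(
        [Parameter(Mandatory=$true)][string]$Url,
        [Parameter(Mandatory=$true)][object[]]$Rules
    )
    # Rules are checked in list order; the first matching rule decides
    foreach ($rule in $Rules) {
        # -match is case-insensitive by default
        if ($Url -match (ConvertTo-CrawlRuleRegex -Pattern $rule.Path)) {
            return $rule.Type
        }
    }
    # No rule matched: what happens next depends on the content source settings
    return 'NoMatch'
}

# Rule list in priority order; the first entry is a hypothetical inclusion rule
$rules = @(
    @{ Path = 'http://intranet/Keep/Forms/*'; Type = 'InclusionRule' },
    @{ Path = '*://*/Forms/*.aspx';           Type = 'ExclusionRule' },
    @{ Path = '*://*webfldr.aspx*';           Type = 'ExclusionRule' },
    @{ Path = '*://*my-sub.aspx*';            Type = 'ExclusionRule' },
    @{ Path = '*://*mod-view.aspx*';          Type = 'ExclusionRule' },
    @{ Path = '*://*allitems.aspx*';          Type = 'ExclusionRule' }
)
```

For example, Resolve-CrawlRuleType -Url 'http://intranet/Lists/Tasks/AllItems.aspx' -Rules $rules returns ExclusionRule through the allitems rule despite the different letter case, while 'http://intranet/Keep/Forms/AllItems.aspx' returns InclusionRule because the inclusion rule is listed before the Forms exclusion. This is why rule order matters when inclusion and exclusion patterns overlap.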

In SharePoint 2007, the wildcard operator “*”, which matches everything, is the only operator supported in crawl rules. Because it matches any sequence of characters, it lacks the flexibility to, for example, recognize and omit URLs that contain a mobile phone number.

TechNet article: http://technet.microsoft.com/en-us/library/cc262934%28office.12%29.aspx

SharePoint 2007 Search Crawler Not Crawling with Basic Authentication:
Crawling won’t occur on site collections that use Basic Authentication. You will receive the error message: "Access is denied. Check that the Default Content Access Account has access to this content, or add a crawl rule to crawl this content."

The solution: extend the web application hosting your site collections with Integrated Windows Authentication, and set the extended site as the Default zone in Alternate Access Mappings.

Search Crawl Rules in SharePoint 2010:
In SharePoint 2010, search crawl rules are set from the Search Service Application. Go to Central Administration >> Manage Service Applications >> Search Service Application >> Crawl Rules.
Setting Search Crawl Rules with PowerShell:
Besides the Search Service Application web interface, search crawl rules can also be set with PowerShell.
# Load the SharePoint cmdlets (not needed inside the SharePoint 2010 Management Shell)
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Get the Search Service Application
$SearchServiceApp = Get-SPEnterpriseSearchServiceApplication "Search Service Application"

# Create exclusion crawl rules
New-SPEnterpriseSearchCrawlRule -SearchApplication $SearchServiceApp -Path "*://*allitems.aspx*" -CrawlAsHttp $true -Type ExclusionRule
New-SPEnterpriseSearchCrawlRule -SearchApplication $SearchServiceApp -Path "*://*mod-view.aspx*" -CrawlAsHttp $true -Type ExclusionRule
New-SPEnterpriseSearchCrawlRule -SearchApplication $SearchServiceApp -Path "*://*webfldr.aspx*" -CrawlAsHttp $true -Type ExclusionRule
New-SPEnterpriseSearchCrawlRule -SearchApplication $SearchServiceApp -Path "*://*my-sub.aspx*" -CrawlAsHttp $true -Type ExclusionRule

TechNet reference: http://technet.microsoft.com/ko-kr/library/ff608119.aspx

SharePoint 2010 adds a new capability in this area: crawl rule paths can use regular expressions!
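To make the mobile phone number limitation of SharePoint 2007 concrete, here is a small pure-PowerShell sketch (no SharePoint required) contrasting a "*"-only pattern with a regular expression. The profile URLs are hypothetical, and treating "exactly ten digits after phone=" as a phone number is an assumption for illustration. SharePoint 2010 supports only a subset of regular expression syntax in crawl rules, so check that subset before reusing this pattern.

```powershell
# Hypothetical profile URLs; only the second one carries a phone number.
$plainUrl = 'http://intranet/people/profile.aspx?id=jsmith'
$phoneUrl = 'http://intranet/people/profile.aspx?phone=5551234567'

# The regex equivalent of the wildcard rule *://*/people/* : each "*" can
# match anything, so the rule cannot tell the two URLs apart.
$wildcardRegex = '^.*://.*/people/.*$'

# A regular expression can demand exactly ten digits after "phone=".
$phoneRegex = '^.*://.*/people/.*phone=[0-9]{10}.*$'

foreach ($url in @($plainUrl, $phoneUrl)) {
    # Show which of the two patterns matches each URL
    '{0}  wildcard={1}  regex={2}' -f $url, ($url -match $wildcardRegex), ($url -match $phoneRegex)
}
```

The wildcard pattern matches both URLs, so excluding it would drop every profile page; the regular expression matches only the URL with the phone number. To register such a pattern as an actual rule, New-SPEnterpriseSearchCrawlRule in SharePoint 2010 has an -IsAdvancedRegularExpression parameter; verify its behavior against the cmdlet reference for your build before relying on it.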


