Tuesday, October 05, 2010

SharePoint 4.0: How Search Indexing Works

The crawl process is essential to the indexing side of SharePoint 2010 Search. Exploring the Search Service Application screens gives some insight into the configurable components that make up Search in SharePoint 2010. To understand the steps that occur within the subsystems when SharePoint is instructed to crawl content, however, you may have to dig into the system documentation and other resources.

The crawl process goes something like this:
1. Full crawl started
2. Start address moved to queue
3. Protocol determined
4. Connector selected
5. IFilter opens files
6. Content index created on crawl server
7. Index moved in batches to query server
8. Data written to Crawl and Property databases
(Microsoft, 2010)
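To make the eight steps easier to follow, here is a toy Python model of them. It is a minimal sketch, not SharePoint code: the connector table, the IFilter stand-ins, the `full_crawl` function, the `fetch` callback and the batch size are all hypothetical names chosen for illustration. It only shows the order in which the steps hand work to each other.

```python
import posixpath
from collections import deque
from urllib.parse import urlparse

# Hypothetical protocol -> connector table (illustrative names only).
CONNECTORS = {
    "http": "WebConnector",
    "https": "WebConnector",
    "file": "FileShareConnector",
}

# Hypothetical stand-ins for IFilters: file extension -> bytes-to-text function.
IFILTERS = {
    ".txt": lambda data: data.decode("utf-8"),
    ".htm": lambda data: data.decode("utf-8"),
}


def full_crawl(start_addresses, fetch, batch_size=2):
    """Step 1: a full crawl is started. `fetch(url) -> bytes` is caller-supplied."""
    queue = deque(start_addresses)            # Step 2: start addresses queued
    local_index = {}                          # Step 6: index lives on the crawl server
    pending = []                              # records awaiting step 8
    while queue:
        url = queue.popleft()
        parsed = urlparse(url)
        connector = CONNECTORS.get(parsed.scheme)  # Steps 3-4: protocol -> connector
        if connector is None:
            pending.append((url, "no connector", None))
            continue
        ext = posixpath.splitext(parsed.path)[1].lower()
        ifilter = IFILTERS.get(ext)           # Step 5: IFilter extracts the text
        if ifilter is None:
            pending.append((url, "no ifilter", None))
            continue
        text = ifilter(fetch(url))
        for word in text.lower().split():
            local_index.setdefault(word, set()).add(url)
        pending.append((url, "crawled", {"connector": connector, "chars": len(text)}))

    # Step 7: move the index to the query server in fixed-size batches.
    query_index, batches = {}, 0
    terms = sorted(local_index)
    for i in range(0, len(terms), batch_size):
        for term in terms[i:i + batch_size]:
            query_index[term] = local_index.pop(term)
        batches += 1

    # Step 8: write crawl history and properties to the (modelled) databases.
    crawl_db, property_db = [], {}
    for url, status, props in pending:
        crawl_db.append((url, status))
        if props is not None:
            property_db[url] = props

    return {
        "query_index": query_index,
        "crawl_db": crawl_db,
        "property_db": property_db,
        "batches": batches,
        "left_on_crawl_server": len(local_index),
    }
```

Items with no matching connector or IFilter are still logged to the modelled crawl database. This loosely mirrors how a crawl log reports errors, but it is a simplification.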

One myth about the SharePoint 2010 crawl process is that crawled content never touches the crawl server's file system and goes directly to the query server. This isn't true. In environments where the crawl and query roles are on separate servers, a content index is first built on the crawl server and then moved in batches to the query server(s). Once all batches have propagated to the query server, it may appear that the crawl left no footprint on the crawl server's file system, but that is only because every batch has already been moved.
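The disappearing footprint can be illustrated with a small simulation. This is a sketch under stated assumptions: `propagate` is a hypothetical helper, index "files" are plain strings, and every query server is assumed to receive a full copy of each batch (a single mirrored index, which is not the only layout SharePoint 2010 supports). It records how many index files remain on the crawl server after each batch.

```python
def propagate(index_files, query_server_count, batch_size):
    """Move index files from the crawl server to query servers in batches.

    Returns (footprint, query_servers), where footprint[i] is the number of
    files left on the crawl server after i batches have been moved.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    crawl_server = list(index_files)          # index is first written here
    query_servers = [[] for _ in range(query_server_count)]
    footprint = [len(crawl_server)]
    while crawl_server:
        batch, crawl_server = crawl_server[:batch_size], crawl_server[batch_size:]
        for server in query_servers:
            server.extend(batch)              # assumption: each query server mirrors the index
        footprint.append(len(crawl_server))
    return footprint, query_servers
```

For five index files and a batch size of two, the crawl server's footprint goes 5, 3, 1, 0. Looking only at the final state would suggest the index never touched the crawl server at all.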

From a server topology perspective, Search requires three server roles: Crawl Server, Query Server, and Database Server. Each of these plays a part in the crawl process. For scalability and availability, the architecture supports configurations with one or more servers in each role. Choosing the best topology means balancing capacity requirements against available resources.
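One way to reason about a candidate topology is to count the servers in each role. The Python sketch below is hypothetical: the `Topology` class and its method names are invented for illustration. It only checks two things, that every role is present and which roles have a single server and therefore no redundancy. It does not model capacity, and it treats roles rather than physical machines, since one machine can host more than one role.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    crawl_servers: int
    query_servers: int
    database_servers: int

    def _roles(self):
        # Role name -> number of servers assigned to it.
        return {
            "crawl": self.crawl_servers,
            "query": self.query_servers,
            "database": self.database_servers,
        }

    def missing_roles(self):
        """Roles with no server at all; Search cannot run without each of them."""
        return [name for name, count in self._roles().items() if count < 1]

    def single_points_of_failure(self):
        """Roles served by exactly one server, so losing it takes the role down."""
        return [name for name, count in self._roles().items() if count == 1]
```

A one-of-each layout passes the first check but reports all three roles as single points of failure. Adding servers per role addresses availability, at the cost of the extra resources the trade-off above refers to.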


Microsoft (2010). Microsoft SharePoint 2010 Product Information Capabilities Search. Retrieved September 22, 2010 from
