The crawl process goes something like this (a rough sketch in code follows the list):
1. Full crawl started
2. Start address moved to queue
3. Protocol determined
4. Connector selected
5. IFilter opens files
6. Content index created on crawl server
7. Index moved in batches to query server
8. Data written to Crawl and Property databases
(Microsoft, 2010)
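To make the flow concrete, here is a minimal sketch in Python of the pipeline the steps above describe. Every name in it (full_crawl, run_ifilter, propagate_to_query_server, and so on) is a hypothetical stand-in, not a real SharePoint API; it only models the order of operations.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical, simplified model of the SharePoint 2010 crawl pipeline.
# None of these names are real SharePoint APIs; they illustrate the flow only.

BATCH_SIZE = 2
CONNECTORS = {"http": "web", "https": "web", "file": "file-share"}

def fetch(connector, address):
    """Stand-in for the connector retrieving the raw document."""
    return f"<raw bytes of {address} via {connector} connector>"

def run_ifilter(raw):
    """Stand-in for an IFilter extracting indexable text from a file."""
    return raw.upper()

def propagate_to_query_server(batch):
    """Stand-in for shipping a finished index batch to the query server."""
    print(f"propagating batch of {len(batch)} items to the query server")

def write_to_databases(address):
    """Stand-in for updates to the Crawl and Property databases."""
    print(f"recording crawl history and properties for {address}")

def full_crawl(start_addresses):                # 1. full crawl started
    queue = deque(start_addresses)              # 2. start addresses moved to the queue
    batch = []
    while queue:
        address = queue.popleft()
        protocol = urlparse(address).scheme     # 3. protocol determined
        connector = CONNECTORS[protocol]        # 4. connector selected
        raw = fetch(connector, address)
        text = run_ifilter(raw)                 # 5. IFilter opens the file
        batch.append((address, text))           # 6. index content built on the crawl server
        if len(batch) >= BATCH_SIZE:
            propagate_to_query_server(batch)    # 7. index moved in batches to the query server
            batch = []
        write_to_databases(address)             # 8. Crawl and Property databases written
    if batch:
        propagate_to_query_server(batch)        # flush the final partial batch

full_crawl(["http://intranet/pages/home.aspx", "file://share/docs/report.docx"])
```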
One myth about the SharePoint 2010 crawl process is that crawled content never touches the file system of the crawl server and goes directly to the query server. This isn't true. In environments that host the crawler and query roles on separate servers, a content index is built on the crawl server first and then moved in batches to the query server(s). Once all batches have propagated to the query server, it may appear as if there is no footprint on the crawl server's file system, but that is only because all the batches have already been moved up.
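To see why the footprint seems to vanish, here is a small, hypothetical demonstration in Python. The directories are stand-ins for the index locations on each server; the point is simply that batches are moved, not copied, so the crawl side ends up empty.

```python
import os
import shutil
import tempfile

# Hypothetical demonstration: index batches exist on the crawl server's
# file system only until they are propagated to the query server.

crawl_dir = tempfile.mkdtemp(prefix="crawl_")  # stand-in for the crawl server's index location
query_dir = tempfile.mkdtemp(prefix="query_")  # stand-in for the query server's index location

# The crawl writes index batches to the local file system first...
for n in range(3):
    with open(os.path.join(crawl_dir, f"batch_{n}.idx"), "w") as f:
        f.write("index fragment")
print("crawl server before propagation:", os.listdir(crawl_dir))

# ...then each batch is moved (not copied) up to the query server.
for name in os.listdir(crawl_dir):
    shutil.move(os.path.join(crawl_dir, name), os.path.join(query_dir, name))

print("crawl server after propagation: ", os.listdir(crawl_dir))  # empty: the source of the myth
print("query server after propagation: ", os.listdir(query_dir))
```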
From a server topology perspective, Search requires three server roles: crawl server, query server, and database server. Each participates in the crawl process. For scalability and availability, the architecture supports configurations with one or many servers in each role. Determining the best topology is a balancing act between capacity requirements and available resources.
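One way to picture that balancing act is as a sizing function over the three roles. The sketch below is purely illustrative: the thresholds are invented numbers, not Microsoft sizing guidance, and only show the idea that each role scales against a different pressure (content volume versus query load).

```python
from dataclasses import dataclass

# Hypothetical sketch of weighing capacity against resources when sizing
# the three search roles. The thresholds are invented for illustration only.

@dataclass
class Topology:
    crawl_servers: int
    query_servers: int
    database_servers: int

def plan_topology(items_millions: float, queries_per_second: float) -> Topology:
    # Scale each role independently: more content drives crawl servers,
    # more query load drives query servers (illustrative thresholds).
    crawl = max(1, round(items_millions / 10))
    query = max(1, round(queries_per_second / 25))
    database = 1 if items_millions < 40 else 2
    return Topology(crawl, query, database)

print(plan_topology(items_millions=25, queries_per_second=60))
# Topology(crawl_servers=2, query_servers=2, database_servers=1)
```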
Reference
Microsoft (2010). Microsoft SharePoint 2010 product information: Capabilities, Search. Retrieved September 22, 2010, from http://sharepoint.microsoft.com/en-us/product/capabilities/search/Pages/default.aspx