Indexing is the process by which search engines collect information from web pages on the internet and add the pages' keywords to the database from which search results are drawn. It involves extracting the machine-readable text from web pages and storing it in a format that can be searched quickly and efficiently. The pages to be indexed are often gathered by search engine spiders (also called crawlers).
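To illustrate the extraction-and-storage step, the following minimal sketch builds an inverted index, a common searchable format that maps each keyword to the pages containing it. It assumes the pages have already been fetched (for example by a spider) into a dictionary of URL to HTML, and that a keyword is simply a lower-cased run of letters and digits. The names `extract_text`, `tokenize`, and `build_index` are hypothetical. A production indexer would typically also handle stemming, stop words, term positions, and compact on-disk storage.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def extract_text(html):
    """Return the machine-readable text of an HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    # Assumption: keywords are lower-cased runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each keyword to the set of page URLs containing it (an inverted index)."""
    index = defaultdict(set)
    for url, html in pages.items():
        for term in tokenize(extract_text(html)):
            index[term].add(url)
    return index
```

For example, indexing `{"a": "<h1>Web search</h1><script>var x;</script>", "b": "<p>Search engines index the web</p>"}` maps `search` to both pages, while the script text `var` is never indexed. Answering a one-word query is then a single dictionary lookup rather than a scan of every page.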
Indexing the Web
Search services rely on indexes of the information on the WWW, so the two are inextricably connected.
Terms related to indexing include keywords, catalog, directory, table of contents, taxonomy, hierarchy, classify, and organize. There are many ways of building an index.
There are many kinds of indices.
Given an index, there are many ways to search through it.
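As an example of searching the same index in different ways, the sketch below assumes the index is a Python dictionary mapping each keyword to a set of page identifiers. It shows a conjunctive (AND) search that returns only pages containing every query term, and a disjunctive (OR) search that ranks pages by how many query terms they match. Both function names are hypothetical; real engines typically use richer ranking functions such as TF-IDF or BM25.

```python
from collections import Counter


def search_all(index, query_terms):
    """Conjunctive (AND) search: pages that contain every query term."""
    postings = [index.get(t, set()) for t in query_terms]
    if not postings:
        return set()
    # Start from the smallest posting set to keep intermediate results small.
    postings.sort(key=len)
    result = set(postings[0])  # copy, so the index itself is never modified
    for p in postings[1:]:
        result &= p
    return result


def search_ranked(index, query_terms):
    """Disjunctive (OR) search, ranked by how many query terms each page matches."""
    hits = Counter()
    for t in set(query_terms):
        for page in index.get(t, set()):
            hits[page] += 1
    # Higher match counts first; ties broken alphabetically for a stable order.
    return sorted(hits, key=lambda page: (-hits[page], page))
```

With `index = {"web": {"a", "b"}, "search": {"a"}, "engine": {"b", "c"}}`, an AND search for `web search` returns only `a`, while an OR search for `web search engine` also returns `b` and `c`, ordered by how many terms each page matches.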
Centralized Indexing
In a centralized index, everything that is searchable is indexed in one place. This makes searching simpler, but the approach does not scale to the whole web. Gathering the information for a centralized index is also a major challenge.
Distributed Indexing
These systems distribute the index, and the searching, across several servers. The relationship between the servers is usually hierarchical, but not always. Hierarchical structures run into difficulty near the top: the top server either has to hold all the index information provided at the lower levels, or it lacks perfect information to guide the forwarding of searches to the appropriate lower-level servers.
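The dilemma at the top of a hierarchy can be made concrete with a two-level sketch. The hypothetical `TopServer` below builds a routing table from its children's vocabularies. With `summary_size=None` it copies every term known below it, which is complete but as large as the whole vocabulary; with a small `summary_size` it keeps only each child's most widespread terms, so some queries are misrouted or must be sent to every child. All class names and parameters are illustrative assumptions, not a description of any particular system.

```python
from collections import defaultdict


class LeafServer:
    """A lower-level server holding its own inverted index (term -> page URLs)."""

    def __init__(self, name, index):
        self.name = name
        self.index = index

    def search(self, term):
        return set(self.index.get(term, set()))


class TopServer:
    """Root of a two-level hierarchy that forwards each query term to its children.

    summary_size=None: the root copies every term known below it (complete routing,
    but the table grows with the whole vocabulary).
    summary_size=k: the root keeps only each child's k most widespread terms, so it
    may miss children or have to ask all of them.
    """

    def __init__(self, children, summary_size=None):
        self.children = children
        self.complete = summary_size is None
        self.routes = defaultdict(set)  # term -> names of children known to hold it
        for child in children:
            # Most widespread terms first; ties broken alphabetically.
            terms = sorted(child.index, key=lambda t: (-len(child.index[t]), t))
            if not self.complete:
                terms = terms[:summary_size]
            for t in terms:
                self.routes[t].add(child.name)

    def search(self, term):
        """Return (matching URLs, names of the children the query was sent to)."""
        if term in self.routes:
            targets = [c for c in self.children if c.name in self.routes[term]]
        elif self.complete:
            targets = []  # a complete table shows that no child holds the term
        else:
            targets = list(self.children)  # summary is silent: ask every child
        results = set()
        for child in targets:
            results |= child.search(term)
        return results, [c.name for c in targets]
```

With leaf `a` holding `web` and `index`, and leaf `b` holding `web` and `spider`, a root built with `summary_size=1` knows only `web` for `a` and `spider` for `b`. A query for `web` therefore reaches only `a` and misses `b`'s page, while a query for `index` has to be sent to both leaves.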
A non-hierarchical relationship between indexes is challenging because no single index has all the information. Forwarding a query may rely on knowing how the indexes are related, and on knowing enough about the query to follow the best linked index.
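One way to forward queries without a hierarchy is greedy routing, sketched below under the assumption that each index stores, for every linked neighbor, a summary consisting of that neighbor's set of terms, captured when the link is made. The query moves to the unvisited neighbor whose summary shares the most terms with it, and stops when no neighbor looks relevant or a hop limit is reached. `Node`, `link_nodes`, and `greedy_search` are hypothetical names.

```python
class Node:
    """A peer index that keeps a term summary for each linked neighbor."""

    def __init__(self, name, index):
        self.name = name
        self.index = index    # term -> set of page URLs held locally
        self.neighbors = {}   # neighbor name -> (Node, set of that neighbor's terms)

    def local_hits(self, query):
        # Pages held here that contain every query term.
        sets = [self.index.get(t, set()) for t in query]
        return set.intersection(*sets) if sets else set()


def link_nodes(a, b):
    """Link two indexes; each records a snapshot of the other's vocabulary."""
    a.neighbors[b.name] = (b, set(b.index))
    b.neighbors[a.name] = (a, set(a.index))


def greedy_search(start, terms, max_hops=5):
    """Follow, hop by hop, the unvisited neighbor whose summary best matches the query."""
    query = set(terms)
    node, visited, results, path = start, set(), set(), []
    for _ in range(max_hops):
        visited.add(node.name)
        path.append(node.name)
        results |= node.local_hits(query)
        # Sorting by name first makes max() break ties deterministically.
        candidates = sorted(
            (pair for name, pair in node.neighbors.items() if name not in visited),
            key=lambda pair: pair[0].name,
        )
        if not candidates:
            break
        best, summary = max(candidates, key=lambda pair: len(query & pair[1]))
        if not query & summary:
            break  # no linked index looks relevant, so stop forwarding
        node = best
    return results, path
```

Because it follows a single path, greedy forwarding is cheap, but it can miss a relevant index that is reachable only through a neighbor whose summary does not match the query.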
Alternatively, the query may be forwarded to all linked indices, and it continues along the path that provides the best match so far. This still requires each index to know at least some of the information in its neighboring indices so that a partial match can succeed, but there need not be any formal relationship between the indices.
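The forward-to-all alternative can be read as a best-first search, which is the interpretation sketched here. Each index that receives the query reports how many query terms it can answer, the query is forwarded to all of its links, and the search always continues from the best partial match found so far, with a budget on how many indexes are expanded. The peer structure, the term-count scoring, and the `budget` parameter are assumptions made for illustration.

```python
import heapq


class Peer:
    """A peer index with its own inverted index and links to arbitrary other peers."""

    def __init__(self, name, index):
        self.name = name
        self.index = index  # term -> set of page URLs
        self.links = []     # linked peers; no formal relationship is assumed

    def partial_match(self, query):
        # Number of query terms this peer can answer locally.
        return sum(1 for t in query if t in self.index)


def link_peers(a, b):
    a.links.append(b)
    b.links.append(a)


def best_first_search(start, terms, budget=10):
    """Forward the query to every linked peer, always continuing from the
    peer with the best partial match seen so far; expand at most `budget` peers."""
    query = set(terms)
    seen = {start.name}
    counter = 0  # unique tie-breaker so the heap never compares Peer objects
    frontier = [(-start.partial_match(query), counter, start)]
    results, order = set(), []
    while frontier and len(order) < budget:
        _, _, peer = heapq.heappop(frontier)  # best partial match so far
        order.append(peer.name)
        for t in query:
            results |= peer.index.get(t, set())
        for nbr in peer.links:  # forward to all linked indices
            if nbr.name not in seen:
                seen.add(nbr.name)
                counter += 1
                heapq.heappush(frontier, (-nbr.partial_match(query), counter, nbr))
    return results, order
```

If peer `a` links to `b` (no matching terms) and to `c` (matching both terms of `web spider`), `c` is expanded before `b` even though both were reached in the same step, so the best partial match is followed first.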