Search Engine Structure
The Internet is a vast and overwhelming collection of information on any subject that can be imagined. To provide structure to this huge amount of information, search engines allow users to search for specific pieces of information. Search engines such as Google and Yahoo are technically known as information retrieval (IR) systems (Liddy, 2001). These search engines work on the basis of indexes, which are matched with the queries entered by users. Indexes are created according to the words in documents and the pointers within documents. The IR system creating this index is structured according to four elements: a document processor, a query processor, a search and matching function, and a ranking ability (Liddy, 2001).
The document processor performs preparing, processing and inputting functions on the documents to be searched (Liddy, 2001). Several functions are inherent in this process, including normalizing the document stream, breaking it into retrievable units, metatagging subdocument pieces, and identifying indexable elements, among others. The first three functions are known as pre-processing, and their main aim is the standardization of multiple formats.
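To make the idea of matching queries against an index more concrete, the following is a minimal sketch of a word index in Python. It is not taken from Liddy (2001); the function names (`tokenize`, `build_index`, `search`) and the sample documents are assumptions invented for illustration, and a real IR system would add ranking and far more careful document processing.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the documents matching the first word, then narrow down.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents, invented for illustration only.
documents = {
    "doc1": "Search engines build an index of words",
    "doc2": "Users enter queries to search the index",
    "doc3": "Documents contain words and pointers",
}
index = build_index(documents)
print(search(index, "search index"))
```

The query is answered from the index alone, without rereading the documents, which is the reason an index is built in the first place.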
Stop words include words of little meaning to the content of the query, such as "and", "but", "of", etc. Closely related is term stemming, according to which suffixes are removed. There is, however, the option of a strong or weak stemming algorithm in order to regulate precision.
The sheer size of the Internet makes searching the Internet itself very time consuming. The function of search engines is thus not to search the vast extent of the Internet itself. Metasearch engines search several indexes provided by other search engines for a more comprehensive result.
Sherman (2003) introduces a new visualization technology for Google, the TouchGraph GoogleBrowser. The network of the Internet, and thus the structure of the index created by Google, is displayed as a series of dots and lines. The dots represent the different pages, while the lines graphically represent the links between them. The Google Set Vista, in turn, shows relationships between the terms used by Google to create word sets (Sherman, 2003).
While Google's organization is more or less random, according to a popularity related to general links, Teoma organizes its results in three different ways. This allows the user a more informed choice when searching for specific information. The number of links from pages relating to the subject matter on the page itself plays a bigger role than in Google's general number-of-links system. Furthermore, WiseNut groups pages from the same site according to search result relevance to a greater extent than Google does.
The various structures of search engines thus help to structure the vast amount of unstructured information available on the Internet.
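The removal of low-content words and the weak or strong stemming mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the word list, the suffix lists and the names `stem` and `process_query` are hypothetical, and the crude suffix stripping shown here is not any particular published stemming algorithm.

```python
# Illustrative list of low-content words; real systems use longer lists.
STOP_WORDS = {"and", "but", "of", "the", "a", "to"}

# Hypothetical suffix lists: the weak stemmer only strips plurals,
# while the strong stemmer strips more endings (longest suffix first).
WEAK_SUFFIXES = ("s",)
STRONG_SUFFIXES = ("ers", "ing", "ed", "er", "s")


def stem(word, strong=False):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    suffixes = STRONG_SUFFIXES if strong else WEAK_SUFFIXES
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def process_query(query, strong=False):
    """Drop low-content words, then stem the remaining query terms."""
    words = query.lower().split()
    return [stem(word, strong) for word in words if word not in STOP_WORDS]


query = "ranked pages and links of searching"
print(process_query(query))               # weak stemming
print(process_query(query, strong=True))  # strong stemming
```

With the weak setting, "searching" and "ranked" are left untouched, so they only match those exact forms. The strong setting reduces them to "search" and "rank", which matches more documents at the cost of precision.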