
Google :: Search Engine Google

As a search engine, Google is a complete architecture for gathering web pages (crawling), indexing them and performing search queries (searching) on those pages. Google Inc. is the company that was formed to offer the Google Search Engine to the web searching community after Sergey Brin and Larry Page developed it at Stanford University. The Google Search Engine is a scalable web search engine that efficiently crawls and indexes web content, producing text and hyperlink databases that are then queried to return relevant, contextual results for user search queries.

The difference between the Google Search Engine and other search engines was that it additionally utilised hypertext structures to determine quality rankings for each web content unit. This strategy allowed the Google Search Engine to formulate and present better search results than had previously been available from contemporary rival search engines. There were problems, not least of which was the need to deal effectively with uncontrolled hypertext collections, where the adage 'anyone can publish anything' was nowhere more true. Throughout the existence of the web, both the amount of information and the number of users have grown rapidly. This inevitably means that at any time there is a plethora of users inexperienced in the art of web research. The Google Search Engine was developed with the philosophy that all users, whatever their experience level, should be able to retrieve relevant results for the query terms they use.

The search engine (Google) extracts distinct terms from web content, such as words, phrases and non-contiguous word sequences, and indexes them. This allows future queries to access the Google databases and determine those websites whose content most closely matches the search sequence entered. Once a list of relevant documents (docList) that contain all the search terms has been created, various algorithms are employed to determine the order in which the top 1000 results will be presented back to the searcher. Thus the Google Search Engine produces improved search quality through the application of relevance and quality filtering.

The Google Search Engine is not just the 'face' you see when you enter a search term on a Google homepage. A great deal of effort has gone into creating the databases that are accessed to return relevant results. The web has to be crawled (web content read and analysed) and indexed before relevant results can be returned for the searcher to choose from and read.

Google :: Search Engine Google Wordle

Google Search Engine Wordle by Humagaia

Google Search Engine Presentation by Humagaia

Google Search Engine Design Requirements

In order for the search engine from Google to match and surpass its rivals there were certain design criteria that had to be taken into account:

  • Fast crawling technology was required in order to gather web documents and to keep them up-to-date.

  • The storage space for storing indexes and documents needed to be used efficiently.

  • The indexing of terabytes of data needed to be efficient.

  • Any query of the databases needed to be handled quickly (the most important aspect for the designers).

  • The data structures needed to be optimised for fast and efficient access.

As these design requirements were achieved, the Google Search Engine became the dominant player in the search engine market, surpassing its rivals very quickly.

How Search Works by Google's Matt Cutts

Google Search Engine Web Content Crawling

All search engines that maintain their own databases of pointers to web content need a crawler or spider. These programs trawl through the internet to find web content for indexing. With the Google web crawler, a single URLServer passes lists of URLs (from those already crawled or from newly submitted URLs) to a number of Google crawlers or Googlebots. In order to keep the access time for each web content site to a minimum, each crawler maintains its own DNS cache and keeps numerous connections open at once. As each 'fetch' is performed, a number of queues move the information fetched from state to state.
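The arrangement described above can be sketched in Python. This is only an illustration and not Google's code: the class names `URLServer` and `Crawler`, and the injected `fake_resolve` and `fake_fetch` functions, are hypothetical stand-ins chosen so that the example runs without any network access.

```python
from collections import deque
from urllib.parse import urlparse


class URLServer:
    """Hands out batches of URLs to crawlers (hypothetical name)."""

    def __init__(self, urls):
        self.pending = deque(urls)

    def next_batch(self, size):
        batch = []
        while self.pending and len(batch) < size:
            batch.append(self.pending.popleft())
        return batch


class Crawler:
    """One crawler that keeps its own DNS cache."""

    def __init__(self, resolve, fetch):
        self.resolve = resolve      # hostname -> IP address (injected)
        self.fetch = fetch          # (ip, url) -> page text (injected)
        self.dns_cache = {}
        self.dns_lookups = 0

    def lookup(self, host):
        # Only resolve a hostname the first time it is seen.
        if host not in self.dns_cache:
            self.dns_cache[host] = self.resolve(host)
            self.dns_lookups += 1
        return self.dns_cache[host]

    def crawl(self, urls):
        pages = {}
        for url in urls:
            ip = self.lookup(urlparse(url).hostname)
            pages[url] = self.fetch(ip, url)
        return pages


# Stand-in resolver and fetcher so the sketch needs no network.
def fake_resolve(host):
    return "10.0.0." + str(len(host))


def fake_fetch(ip, url):
    return "content of " + url


server = URLServer([
    "http://example.com/a",
    "http://example.com/b",
    "http://example.org/c",
])
crawler = Crawler(fake_resolve, fake_fetch)
pages = crawler.crawl(server.next_batch(10))
```

Three URLs on two hosts need only two DNS lookups here, which is the saving a per-crawler cache is meant to give.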

Overview Of How Search Engines Work

Search Engine And Web Crawler - Part 1

Search Engine And Web Crawlers Part 2

What Is Google PageRank?

Google Search Engine Web Content Indexing

Parsing (syntactic analysis of the grammatical structure of text to determine relationships between words and to infer meaning) - each fetched page passes to a storeserver, where it is compressed and stored. The indexer then parses every document and assigns each word a wordID using a regularly updated lexicon, writing the results into databases ('barrels'). Each word encountered in a document is converted to a set of word occurrences or hits (limited to a maximum total). The hits record the word position in the document, the font size and the capitalisation. The hits are categorised into:

  • Fancy hits – those that occur in the URL, title, anchor text and / or meta tag. The information recorded for them is capitalisation, font size (set to 7) and position. Anchor hits have positional and docID information recorded for them.

  • Plain hits – everything else. These have capitalisation, relative font size and positional information recorded.
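The two hit categories can be pictured as a small record type. The sketch below is an assumption-laden illustration: the `Hit` class, the `make_hit` helper and the source labels ("url", "title", "anchor", "meta", "body") are invented for this example, and only the font size of 7 for fancy hits comes from the description above.

```python
from dataclasses import dataclass

FANCY_FONT_SIZE = 7  # the text says fancy hits have font size set to 7
FANCY_SOURCES = {"url", "title", "anchor", "meta"}


@dataclass(frozen=True)
class Hit:
    kind: str          # "fancy" or "plain"
    capitalised: bool
    font_size: int
    position: int


def make_hit(word, source, position, relative_font_size=0):
    """Build a hit record for one word occurrence.

    `source` says where the word was found; the labels are
    hypothetical and exist only for this sketch.
    """
    capitalised = word[:1].isupper()
    if source in FANCY_SOURCES:
        return Hit("fancy", capitalised, FANCY_FONT_SIZE, position)
    return Hit("plain", capitalised, relative_font_size, position)


title_hit = make_hit("Google", "title", position=0)
body_hit = make_hit("search", "body", position=12, relative_font_size=2)
```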

The hits are translated into a 'hit list' and distributed into forward barrels, creating a partially sorted forward index. All links in every web page are parsed, and the important information that determines where each link points to and from is stored in the anchor file.

It is therefore worth ensuring that the URL and title contain the targeted keywords and that they are capitalised. The same goes for any anchor text that you create to point to one of your documents (in RSS feeds, for instance). Additionally, emboldening, capitalising and increasing the font size of keywords and anchor text (where possible) may assist in raising your document up the rankings.

URL Resolver - this reads the anchors and converts relative URLs to absolute URLs and, in turn, to docIDs. The anchor text is passed to the forward index, associated with the docID that the anchor points to, and the Google link database is populated with pairs of docIDs, which are used to compute the PageRank for all documents.
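A rough sketch of this resolver step, using the standard library's `urljoin` to turn a relative link into an absolute URL. The `URLResolver` class and its simple counter-based docID scheme are assumptions for illustration; how Google actually assigns docIDs is not described here.

```python
from urllib.parse import urljoin


class URLResolver:
    """Turns anchors into absolute URLs, docIDs and link pairs."""

    def __init__(self):
        self.doc_ids = {}   # absolute URL -> docID
        self.links = []     # (from_docID, to_docID) pairs

    def doc_id(self, url):
        # Assign the next integer the first time a URL is seen.
        return self.doc_ids.setdefault(url, len(self.doc_ids))

    def add_anchor(self, page_url, href):
        absolute = urljoin(page_url, href)
        pair = (self.doc_id(page_url), self.doc_id(absolute))
        self.links.append(pair)
        return absolute


resolver = URLResolver()
target = resolver.add_anchor("http://example.com/dir/page.html", "../other.html")
```

The list of docID pairs in `resolver.links` is the kind of link database that a PageRank calculation would later consume.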

Sorter – Each 'forward barrel' (index) is sorted by wordID to produce an inverted index for title, anchor hits and full document text (which is cached). An interim DumpLexicon is used to update the Lexicon.

It is the combination of lexicon, inverted index and PageRank that is used to answer a Google query.
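The sorter step, turning a forward index (document to words) into an inverted index (word to documents), can be illustrated as follows. The data layout is a simplification assumed for this sketch: a hit is reduced to a (wordID, position) pair and a barrel is an ordinary dictionary.

```python
from collections import defaultdict

# Forward index: docID -> list of (wordID, position) hits.
forward_barrel = {
    1: [(10, 0), (20, 1), (10, 5)],
    2: [(20, 0), (30, 3)],
}


def invert(forward):
    """Regroup hits by wordID to produce an inverted index."""
    inverted = defaultdict(dict)
    for doc_id, hits in forward.items():
        for word_id, position in hits:
            inverted[word_id].setdefault(doc_id, []).append(position)
    # Return the entries in wordID order, as a sorter would.
    return {word_id: inverted[word_id] for word_id in sorted(inverted)}


inverted_index = invert(forward_barrel)
```

Looking up wordID 20 now gives every document that contains it, together with the positions, without scanning the documents themselves.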

Search, Google, and Life: Sergey Brin Lecture

Google Search Engine Web Content Searching

The Google Search Engine is focused on providing quality search results, efficiently. The order of events that take place for a Google search results list to be presented back to the searcher is as follows:

  1. Parse the query.

  2. Convert the search words into wordIDs.

  3. For every word, seek to the start of its docList in the title and anchor text index.

  4. Scan the docLists for documents that contain all search terms or, if there are not enough results, a subset of the search terms.

  5. Compute the rank of each document retrieved for the query (Google PageRank is one input to this rank and is computed in advance, not at query time).

  6. Repeat steps 4 and 5 until sufficient results are obtained or no more docLists are available.

  7. Sort the documents in rank order and show only the top 1000 results.
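The steps above can be condensed into a short sketch. It is deliberately simplified: the `search` function, the toy `lexicon`, `inverted_index` and `scores` data are all invented for the example, the fallback to a subset of terms is left out, and the scoring function is a stand-in passed in by the caller.

```python
def search(query, lexicon, inverted_index, rank, limit=1000):
    """Return the docIDs that match every query word, best first."""
    # Steps 1 and 2: parse the query and convert words to wordIDs.
    words = query.lower().split()
    if not words or any(word not in lexicon for word in words):
        return []
    word_ids = [lexicon[word] for word in words]
    # Steps 3 and 4: keep documents whose docLists contain every term.
    doc_lists = [set(inverted_index.get(w, ())) for w in word_ids]
    matches = set.intersection(*doc_lists)
    # Steps 5 to 7: score each match, sort, and cut off at the limit.
    ranked = sorted(matches, key=lambda d: rank(d, word_ids), reverse=True)
    return ranked[:limit]


lexicon = {"google": 1, "search": 2, "engine": 3}
inverted_index = {1: {10, 11, 12}, 2: {10, 12}, 3: {12, 13}}
scores = {10: 0.2, 11: 0.9, 12: 0.5, 13: 0.1}

results = search("Google search", lexicon, inverted_index,
                 lambda doc_id, word_ids: scores[doc_id])
```

Document 11 has the highest score but is missing one of the terms, so it does not appear in `results`.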

How Do Search Engines Decide How Web Sites Rank?

Google Search Engine System Features

The Google Search Engine makes use of a link graph of web hyperlinks, in which each link is treated as a citation (a pointer from one web content site to another). From this graph it rapidly calculates an approximation of page importance and quality, the Google quality ranking (PageRank), which allows keyword search results to be prioritised. Links (backlinks) are not counted equally: they are normalised, and an academic citation or a link from another highly ranked web authority is given greater importance when applied to a given web content page.

Google PageRank – is a model of user behaviour and was defined as the probability that a random surfer would visit a certain page (the more links that point to the page, the higher the probability that the surfer will find it). A damping factor, which can be applied to all pages or to a single page or group of pages, quantifies the probability that the random surfer will become bored and jump to another random page rather than follow a link. High PageRank is therefore obtained if a large number of pages link to the page or if those that point to it have a high PageRank themselves.
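As a worked illustration, the formula published by Brin and Page is PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1 to Tn are the pages linking to A, C(T) is the number of links leaving page T, and d is the damping factor (0.85 is the value usually quoted). The sketch below simply iterates that formula on a made-up three-page graph; the function name, the iteration count and the graph are assumptions, and a production system would use a far more efficient method.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)).

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    ranks = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # Each linking page shares its rank equally among its links.
            incoming = sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - d) + d * incoming
        ranks = new_ranks
    return ranks


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
```

In this toy graph page C is linked to by both A and B, so it ends up with the highest rank, while B, which receives only half of A's vote, ends up lowest.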

Anchor Text – is the text used to describe the web content on another page to which there is a link. The Google Search Engine associates the link text with both the sender and receiver web pages. Often anchors provide better descriptions of the web content to which they are pointing than the web content pages themselves. An additional advantage is that anchor text can exist for non-text documents such as images, programs, databases and videos, and can return results even where the content pointed to has not been crawled, thus giving even better quality search engine results for Google.

Location – the Google Search Engine records location and proximity information for all hits. This means that exact matches to search queries can be located as well as close approximations to the exact keyword search phrase. These are given a weighted importance so that those words that match the search query that are closest together in the text have a higher probability of causing a positive hit for the search phrase.
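One way to picture proximity weighting is to take the recorded positions of two query words in a document and reward small gaps between them. The functions below are a hypothetical sketch: the 1/distance curve and the cut-off of 10 positions are arbitrary choices for illustration, not values used by Google.

```python
def min_span(positions_a, positions_b):
    """Smallest distance between any occurrence of two words."""
    return min(abs(a - b) for a in positions_a for b in positions_b)


def proximity_weight(positions_a, positions_b, max_distance=10):
    """Adjacent words score 1.0; words further apart than
    max_distance score 0.0. The curve is an illustrative choice."""
    distance = max(min_span(positions_a, positions_b), 1)
    if distance > max_distance:
        return 0.0
    return 1.0 / distance


adjacent = proximity_weight([3, 40], [4, 90])   # closest pair is 3 and 4
nearby = proximity_weight([3], [8])             # five positions apart
distant = proximity_weight([0], [50])           # beyond the cut-off
```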

Characteristics of words – the citation (pointer from one web content site to another) link graph map of hyperlinks also records certain characteristics against the wordID. These include the font size and emboldening of the text, where the larger and / or bolder font receives a higher weighting in comparison to the remainder of the web content text of the page. This means that (HubPage) headings will rank higher than general content for a particular search phrase, and that emboldened text will rank higher than normal text but (usually) lower than title text.

External Meta Information – the Google Search Engine also takes into account information that can be inferred about a document but that is not contained within it, such as:

  • The reputation of the source – on HubPages, for instance, the reputation of both the HubPages site as well as the author.

  • The update frequency of the content of the document – if you use RSS feeds in your document, for instance, and create another hub in a series or a new hub for an author, then the page will be updated and increase the update frequency as far as the Google Search Engine is concerned.

  • The quality of the content – this could be determined by whether the content is bookmarked (through bookmark sites or by using the bookmark tab) and how long a surfer stays on the document: both of which can be recorded in the Google databases.

  • The popularity of the document – measured by the number of reads and the duration of those reads.

  • The usage of the document.

  • As above, the citations.

The Google Search Engine maintains much more information about web documents than typical search engines did. The type-weights and count-weights are incorporated into an IR score which, together with Google PageRank and proximity scores, allows the Google Search Engine to determine the order in which the most relevant query results will be presented back on the query results screen.
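How these pieces might fit together can be sketched as below. Everything numeric here is an assumption: the type-weight values, the logarithmic tapering of count-weights, and the linear blend controlled by `alpha` are invented for illustration, since the real weights and the real combining function are not public.

```python
import math

# Illustrative type-weights only; the real values are not public.
TYPE_WEIGHTS = {"title": 5.0, "anchor": 4.0, "url": 3.0, "plain": 1.0}


def count_weight(count, cap=8):
    """Grow with the hit count but taper off, so that repeating a
    word beyond `cap` times stops helping (an assumed curve)."""
    return math.log1p(min(count, cap))


def ir_score(hit_counts):
    """Dot product of count-weights and type-weights for one document."""
    return sum(
        count_weight(count) * TYPE_WEIGHTS[hit_type]
        for hit_type, count in hit_counts.items()
    )


def final_rank(hit_counts, pagerank, proximity=1.0, alpha=0.7):
    """Blend the IR score (scaled by proximity) with PageRank.

    The linear blend and `alpha` are assumptions made for this sketch.
    """
    return alpha * ir_score(hit_counts) * proximity + (1 - alpha) * pagerank


title_match = final_rank({"title": 1}, pagerank=1.0)
body_match = final_rank({"plain": 1}, pagerank=1.0)
```

With equal PageRank, a single hit in the title outranks a single hit in the body text, which matches the weighting of headings over general content described earlier.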

Google :: Search Engine Google Conclusion

This is a brief, simplified overview of the Google Search Engine and how it works. No-one knows all of the tweaks that occur or the nuances of its workings as these are Google trade secrets. If you were told these you would have to be shot!

See also:

How To Google - homepage of "Google How To".

How To Google in English - for the English version index to "Google How To" subjects.

Comments On This Google Search Engine Article Are Welcomed. If you liked it, please rate it above.

Pro Design Source 20 months ago

Wow, the amount of content here about how Google works is amazing. Really informative for anyone who wants to publish a website and have it rank in the Google search engine. I've bookmarked it to come back to read again. My brain doesn't work like the Google Search Engine so I will have to re-read this a few more times.

Dobson 21 months ago

Such a lot of information here. Will need more than one review to make sure all the information is retained.
