How Search Engines Work – A Primer

This post was written by Internet Marketing John on September 26, 2009
Posted Under: Internet Marketing Information

We all know how to use search engines, but how many of us know how they actually work?

First of all, search engines are simply specialized websites designed to help people retrieve information stored on other websites.

Search engines differ in how they function, but they all perform the same basic tasks:

  • Internet Search

All search engines scour the Internet for information, noting the important words (“keywords”) found on each page so they can later be matched against the queries users type in.

  • Index Archive

All search engines store an index of the keywords they find on the Internet, along with the locations where each keyword was found.

  • Keyword Retrieval

All search engines let users search that index for the keywords or keyword phrases they are looking for.
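The three tasks above can be sketched as a tiny “inverted index”: a map from each keyword to the pages where it was found. This is a minimal illustration under stated assumptions, not how any real engine is built: the page texts and URLs are made up, and `tokenize`, `build_index`, and `search` are hypothetical helper names.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Index archive: map every keyword to the set of URLs where it was found."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Keyword retrieval: return the URLs that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep only pages that have this word too
    return results

# Hypothetical pages standing in for what a spider has collected.
pages = {
    "http://example.com/a": "Search engines build an index of keywords",
    "http://example.com/b": "Spiders crawl the web and follow links",
    "http://example.com/c": "An index maps keywords to web pages",
}
index = build_index(pages)
print(sorted(search(index, "index keywords")))
# → ['http://example.com/a', 'http://example.com/c']
```

Requiring every query word to match is only one possible choice; it keeps the sketch short, and ranking the matches is a separate step.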

Early search engines received only a few thousand queries a day, and their indexes held only a few hundred thousand pages.

Today, top engines such as Google index hundreds of millions to billions of pages, and Google alone responds to hundreds of millions of search queries every day.

Before any of this information can be indexed and used to answer queries, the search engines must first find it.

They do this with specialized software robots, called “search engine spiders,” which compile lists of the keywords and keyword phrases found on websites.

The term “web crawling” refers to the process search engine spiders use to build these lists of keywords and keyword phrases.

As you can imagine, search engine spiders must look at billions of pages as they build their lists of keywords.

Obviously, there has to be some order to the process.

Spiders usually begin their searches in the most heavily used parts of the web. This means that the most popular websites and the busiest servers are first in line for indexing.

As a spider crawls a popular website, it indexes the pages and follows every link it finds there.

This is how the spider “wanders,” spreading through the most used areas of the Internet and then quickly moving out to other parts of the web.
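The wandering described above behaves much like a breadth-first traversal of the web’s link graph. Here is a minimal sketch, assuming a hypothetical in-memory link graph (`LINKS`) in place of real page downloads: starting from a popular seed site, the pages closest to it are visited first and the crawl then spreads outward. Real spiders prioritize in far more elaborate ways; `crawl` and its `limit` parameter are illustrative names only.

```python
from collections import deque

# Hypothetical link graph: each URL maps to the URLs it links to.
# A real spider would download each page and extract its links instead.
LINKS = {
    "popular.example": ["a.example", "b.example"],
    "a.example": ["c.example", "popular.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": [],
    "d.example": ["e.example"],
    "e.example": [],
}

def crawl(seeds: list[str], links: dict[str, list[str]], limit: int = 100) -> list[str]:
    """Breadth-first crawl: visit the seeds first, then follow links outward."""
    queue = deque(seeds)
    seen = set(seeds)
    order: list[str] = []
    while queue and len(order) < limit:
        url = queue.popleft()
        order.append(url)  # a real spider would index the page here
        for link in links.get(url, []):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return order

print(crawl(["popular.example"], LINKS))
# → ['popular.example', 'a.example', 'b.example', 'c.example', 'd.example', 'e.example']
```

The `seen` set matters: without it, the link from `a.example` back to `popular.example` would send the spider around in circles.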

Each search engine must then store the new information its spiders have gathered in a format that makes the data useful and readily accessible to people making a search query.

Two things determine how accessible that data is to users: the information stored along with each keyword, and how that information is indexed.

If a search engine only stored the URL and the keyword, there would be no way of determining the relevance of the search result.

The keyword might have been used in an irrelevant or trivial way, only once, or only in a link on another page pointing to that URL.

In other words, there is no way to rank the relevance of the search result.

To produce more useful search results, Google and the other search engines store more than just the keyword and the URL.
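To make this concrete, here is a minimal sketch of an index entry that stores more than the keyword and the URL. The fields chosen (whether the keyword is in the title, how often it appears in the body, and at which word positions) are illustrative assumptions rather than any particular engine’s format, and `Posting` and `build_postings` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

@dataclass
class Posting:
    """What this sketch records for one keyword on one page."""
    url: str
    in_title: bool = False                              # keyword appears in the page title
    count: int = 0                                      # occurrences in the body text
    positions: list[int] = field(default_factory=list)  # word offsets in the body

def build_postings(pages: dict[str, tuple[str, str]]) -> dict[str, list[Posting]]:
    """Map each keyword to one Posting per page; `pages` maps URL -> (title, body)."""
    index: dict[str, list[Posting]] = {}
    for url, (title, body) in pages.items():
        title_words = set(tokenize(title))
        per_page: dict[str, Posting] = {}
        for pos, word in enumerate(tokenize(body)):
            posting = per_page.setdefault(word, Posting(url, in_title=word in title_words))
            posting.count += 1
            posting.positions.append(pos)
        # Words that appear only in the title still get a posting.
        for word in title_words.difference(per_page):
            per_page[word] = Posting(url, in_title=True)
        for word, posting in per_page.items():
            index.setdefault(word, []).append(posting)
    return index

pages = {
    "http://example.com/spiders": ("Search Engine Spiders", "Spiders crawl pages. Spiders follow links."),
    "http://example.com/misc": ("Odds and Ends", "A page that mentions spiders once."),
}
for p in build_postings(pages)["spiders"]:
    print(p.url, p.in_title, p.count, p.positions)
# → http://example.com/spiders True 2 [0, 3]
# → http://example.com/misc False 1 [4]
```

With only the URL, both pages would look equally relevant for “spiders”; the extra fields are what let a ranking formula tell them apart.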

Each search engine has its own formula for assigning “weight” to the keywords in its index.

This is why the same search query can return different results on different search engines, and why each engine lists the pages it returns in a different order.
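The effect of different weighting formulas is easy to see with a toy example. The two formulas below (raw keyword count versus keyword density) are simplified assumptions chosen for illustration, not the actual formula of Google or any other engine; applied to the same made-up statistics, they put the same three pages in different orders.

```python
from typing import Callable

# Made-up statistics for the keyword "spiders" on three hypothetical pages:
# how many times the keyword appears and how many words each page has.
stats = {
    "long-guide.example": {"count": 12, "length": 3000},
    "short-note.example": {"count": 3, "length": 150},
    "faq.example": {"count": 5, "length": 800},
}

def engine_a(s: dict[str, int]) -> float:
    """Formula A (assumed): score by raw keyword count."""
    return float(s["count"])

def engine_b(s: dict[str, int]) -> float:
    """Formula B (assumed): score by keyword density, count / page length."""
    return s["count"] / s["length"]

def rank(formula: Callable[[dict[str, int]], float]) -> list[str]:
    """Order pages from highest to lowest score under the given formula."""
    return sorted(stats, key=lambda url: formula(stats[url]), reverse=True)

print(rank(engine_a))  # → ['long-guide.example', 'faq.example', 'short-note.example']
print(rank(engine_b))  # → ['short-note.example', 'faq.example', 'long-guide.example']
```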

Remember that an index has only one purpose: to find information as quickly as possible.

Indexes can be built in many ways, and building a hash table is one of the most effective.

A hash table uses a formula (a hash function) that assigns a numerical value to each keyword, and it is designed to spread entries evenly across a predetermined number of slots, or “buckets.”

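This even spread differs from the way words are distributed across the alphabet, and that is why a hash table is so effective. In English, far more words begin with “s” or “m” than with “x,” so an index divided by first letter would have a few huge sections and many tiny ones; a hash keeps the buckets roughly the same size, so each lookup has only a small bucket to check.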
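Here is a minimal sketch of the idea. It assumes a fixed table of 8 buckets and uses CRC-32 (from Python’s standard `zlib` module) as the hash function; real engines choose their own hash functions and table sizes, and `KeywordTable` and `bucket_for` are hypothetical names. A lookup hashes the keyword and scans only the one bucket it lands in, and the two printed counts let you compare grouping by first letter with grouping by hash for the same words.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 8  # the "predetermined number" of slots; arbitrary for this sketch

def bucket_for(keyword: str) -> int:
    """Turn a keyword into a number, then fold it into one of NUM_BUCKETS slots."""
    # crc32 is used because, unlike Python's built-in hash(), it gives the
    # same value on every run.
    return zlib.crc32(keyword.encode("utf-8")) % NUM_BUCKETS

class KeywordTable:
    """A tiny hash table mapping keywords to lists of URLs."""

    def __init__(self) -> None:
        self.buckets: list[list[tuple[str, list[str]]]] = [[] for _ in range(NUM_BUCKETS)]

    def add(self, keyword: str, url: str) -> None:
        bucket = self.buckets[bucket_for(keyword)]
        for key, urls in bucket:
            if key == keyword:
                urls.append(url)
                return
        bucket.append((keyword, [url]))

    def lookup(self, keyword: str) -> list[str]:
        # Only one small bucket is scanned, not the whole index.
        for key, urls in self.buckets[bucket_for(keyword)]:
            if key == keyword:
                return urls
        return []

words = ["search", "spider", "site", "server", "seo", "snippet",
         "index", "keyword", "link", "xml"]
by_letter = Counter(w[0] for w in words)         # alphabetical grouping: six words under "s"
by_hash = Counter(bucket_for(w) for w in words)  # hashed grouping
print("by first letter:", dict(by_letter))
print("by hash bucket: ", dict(by_hash))

table = KeywordTable()
table.add("spider", "http://example.com/spiders")
print(table.lookup("spider"))  # → ['http://example.com/spiders']
```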

With a hash table, the search engine can jump straight to the entry for a keyword or keyword phrase instead of scanning the whole index, so results come back quickly.

The search engine software then looks for relevant information in its index, based on the number of times the keyword appears on a page.

It also gives extra weight to keywords that appear in the page title, near the top of the document, in links, in subheadings, and in META tags.
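A minimal sketch of this kind of weighting, assuming a page has already been split into fields. The field names, the weights, and the “first 100 words” cut-off for “near the top” are all invented for illustration; every engine uses its own formula, and `score` is a hypothetical name.

```python
import re

# Invented weights: a keyword counts for more in prominent places.
FIELD_WEIGHTS = {"title": 5.0, "heading": 3.0, "link": 2.0, "meta": 2.0}
BODY_WEIGHT = 1.0
TOP_BONUS = 1.5   # multiplier for body hits "near the top"
TOP_WORDS = 100   # how many leading body words count as "near the top"

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def score(page: dict[str, str], keyword: str) -> float:
    """Score one page for one keyword: weighted field counts plus a top-of-page bonus."""
    keyword = keyword.lower()
    total = 0.0
    for name, weight in FIELD_WEIGHTS.items():
        total += weight * tokenize(page.get(name, "")).count(keyword)
    for pos, word in enumerate(tokenize(page.get("body", ""))):
        if word == keyword:
            total += BODY_WEIGHT * (TOP_BONUS if pos < TOP_WORDS else 1.0)
    return total

page = {
    "title": "How Spiders Work",
    "heading": "What spiders do",
    "meta": "spiders crawling",
    "body": "Spiders crawl the web",
}
page2 = {"title": "Odds and Ends", "body": "Nothing here about spiders"}
print(score(page, "spiders"))   # → 11.5
print(score(page2, "spiders"))  # → 1.5
```

Both pages mention the keyword, but the one that uses it in its title, heading, and META tags scores far higher, which is exactly the kind of signal SEO work tries to influence.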

These weighting factors, rather than the hash table itself, are what most Search Engine Optimization strategies are built around.

That search engines work as effectively and efficiently as they do is mind-boggling when you consider the millions to billions of searches made, and websites published online, every day.

