Are Search Engine Robots Useful?

Posted by | Posted in SEO Tutorials | Posted on 24-01-2011

Sometimes referred to as “spiders” or “crawlers,” search engine robots are automated programs that seek out web pages on behalf of search engine users. How do they do this, and why does it matter? What is the real purpose of these robots?

Search engine robots have the same basic functionality that early web browsers had, and like those early browsers, there are certain things they cannot do. Robots cannot get into password-protected areas. They do not understand frames, Flash movies, images, or JavaScript. They cannot click the buttons on your website. They can get stuck on JavaScript navigation or when indexing a dynamically generated URL. What a search engine robot can do is retrieve pages from the web and collect the information and links they contain.

Spiders determine the content of your page by looking at its visible text, its HTML code, and its links. Based on the words it finds, the spider uses a complex algorithm to work out what the site is about and what is and isn’t important. Spiders also collect links from each site to follow later, which lets them hop from site to site. Because the web is built from links between pages, robots follow those links to make their way across the internet.
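To make this concrete, here is a minimal sketch of how a text-only spider might read a page, written with Python’s standard html.parser module. It assumes a spider that, like the early robots described above, does not run JavaScript: it keeps the visible words, skips the contents of script and style tags, and turns each link into an absolute URL so it can be queued for a later visit. The class name SpiderParser and the sample page and URLs are hypothetical; real search engine spiders are far more sophisticated.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SpiderParser(HTMLParser):
    """Collects the visible words and outgoing links of one HTML page (hypothetical helper)."""

    # A text-only spider does not treat script or style contents as page text.
    SKIP_TAGS = {"script", "style"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.words = []       # visible text, split into words
        self.links = []       # absolute URLs to queue for a later crawl
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


# Hypothetical page: the text written by JavaScript is never seen by this spider.
page = """<html><head><title>Blue Widgets</title>
<script>document.write("Cheap widgets here");</script></head>
<body><h1>Blue widgets for sale</h1>
<a href="/prices.html">Prices</a>
<a href="http://example.org/reviews">Reviews</a></body></html>"""

parser = SpiderParser("http://example.com/")
parser.feed(page)
parser.close()
print(parser.words)
print(parser.links)
```

In a full crawler, parser.links would go onto the crawl queue and parser.words would be handed to the indexer.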

Submitting a new URL to a search engine adds it to the queue of pages the spiders are due to “crawl,” or visit. Even if a URL isn’t submitted directly, though, the spiders usually find it through links from other websites, so building link popularity helps the spiders find you faster. When the robots arrive, they check your site for a file called “robots.txt,” which tells them which areas of the website they are not allowed to visit. Off-limits files may include things like binaries or other content that the spiders have no need to report back.
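The crawl queue and the robots.txt check can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not how any particular search engine works: a small in-memory link graph (WEB) stands in for real page downloads, the robots.txt rules are written inline rather than fetched from the site, and the site, paths, and the “BadBot” user agent are all hypothetical. The rule check uses the standard library’s urllib.robotparser. Keep in mind that robots.txt is advisory: well-behaved spiders honor it, but it does not technically stop anyone from requesting a page.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com: every spider must stay out of
# /downloads/, and a spider calling itself "BadBot" may not crawl anything.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads/

User-agent: BadBot
Disallow: /
"""

# Hypothetical link graph standing in for real page downloads: page -> links found on it.
WEB = {
    "http://example.com/": [
        "http://example.com/about.html",
        "http://example.com/downloads/setup.exe",
    ],
    "http://example.com/about.html": ["http://example.com/"],
}


def crawl(submitted_urls, agent):
    """Breadth-first crawl from the submitted URLs, obeying robots.txt for `agent`."""
    rules = RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())

    queue = deque(submitted_urls)  # pages waiting to be crawled
    seen = set(submitted_urls)     # never queue the same URL twice
    visited = []
    while queue:
        url = queue.popleft()
        if not rules.can_fetch(agent, url):
            continue               # off-limits according to robots.txt
        visited.append(url)
        for link in WEB.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl(["http://example.com/"], "Googlebot"))
print(crawl(["http://example.com/"], "BadBot"))
```

With these rules, Googlebot visits the home page and the about page but skips the file under /downloads/, while BadBot is shut out entirely. A real spider would download each host’s /robots.txt before crawling it, for example with RobotFileParser.set_url() and read().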

Once the spider has gathered the information it needs, it indexes the site’s data, according to how that search engine has set it up, and sends it to the search engine’s database.

Once in the database, the information becomes part of the search engine’s index and ranking process. How it is indexed depends on how the search engine’s engineers have decided to evaluate the information the spiders return. When you enter a query, the search engine runs several calculations behind the scenes to work out which of the pages its spiders have returned you are most likely looking for, then selects the best matches and displays them. Spiders crawl websites over and over again to keep the database up to date.
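As a rough illustration of the indexing and matching steps, here is a toy inverted index in Python: it maps each word to the pages that contain it, then answers a query by scoring pages on how often the query words appear on them. The simple word-count scoring and the sample pages are assumptions for illustration only; real search engines combine many more signals, and the details differ from engine to engine.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Inverted index: word -> Counter of {url: times the word appears on that page}."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] += 1
    return index


def search(index, query):
    """Rank pages by the total number of query-word occurrences, best match first."""
    scores = Counter()
    for word in tokenize(query):
        scores.update(index.get(word, Counter()))
    return [url for url, _ in scores.most_common()]


# Hypothetical pages standing in for the text the spiders sent back.
pages = {
    "http://example.com/widgets": "Blue widgets and red widgets for sale",
    "http://example.com/about": "About our widget shop",
    "http://example.org/gadgets": "Blue gadgets",
}

index = build_index(pages)
print(search(index, "blue widgets"))
print(search(index, "widget shop"))
```

An inverted index is used here because it lets a query look only at the pages that contain its words, rather than scanning every stored page.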

If you’re interested in seeing which pages the spiders have visited on your website, you can check your server logs or the reports from your log statistics software. From this information you’ll know which spiders have visited, where they went, when they came, and which pages they crawl most often. Some are easy to identify by name, such as Google’s “Googlebot”; others are less obvious, such as Inktomi’s “Slurp.” Besides identifying which spiders visit, you can also find out whether any of them are draining your bandwidth, so that you can block them from your site. There is plenty of information online on identifying these bad bots.

There are also certain things that can prevent good spiders from crawling your site, such as your server being down or overloaded with traffic. This can keep your site from being re-indexed, though most spiders will eventually come back and try the page again.
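If your server writes its access log in the “combined” format used by Apache and nginx, a short Python script can tally which spiders visited, which pages they requested, and how much bandwidth they used. This is a minimal sketch: it assumes the combined log format, it recognizes spiders only by a substring of the User-Agent field (which a bad bot can fake), and the log lines and IP addresses below are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Combined log format (assumed):
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*" \d{3} (\d+|-) "[^"]*" "([^"]*)"'
)

# Substrings that identify the spiders we want to track in the User-Agent field.
BOT_NAMES = ("Googlebot", "Slurp")


def bot_report(lines):
    hits = Counter()              # requests per spider
    bytes_sent = Counter()        # bandwidth used per spider
    pages = defaultdict(Counter)  # spider -> path -> number of requests
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue              # not in the combined format; skip it
        path, size, agent = match.groups()
        bot = next((name for name in BOT_NAMES if name in agent), None)
        if bot is None:
            continue              # a regular visitor or an untracked bot
        hits[bot] += 1
        bytes_sent[bot] += 0 if size == "-" else int(size)
        pages[bot][path] += 1
    return hits, bytes_sent, pages


# Hypothetical log lines (documentation-range IP addresses).
sample_log = [
    '192.0.2.1 - - [24/Jan/2011:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.1 - - [24/Jan/2011:10:00:05 +0000] "GET /about.html HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.1 - - [24/Jan/2011:11:00:00 +0000] "GET / HTTP/1.1" 304 - "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [24/Jan/2011:10:30:00 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '198.51.100.7 - - [24/Jan/2011:10:45:00 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0"',
]

hits, bytes_sent, pages = bot_report(sample_log)
for bot, count in hits.most_common():
    top_path, _ = pages[bot].most_common(1)[0]
    print(f"{bot}: {count} requests, {bytes_sent[bot]} bytes, most crawled: {top_path}")
```

A spider whose byte total stands far above the others is a candidate for blocking, either in robots.txt or, for bots that ignore it, at the server.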
