Definition:
Googlebot is Google’s web crawling robot. It finds, scans, and indexes web pages. Crawling is the process by which Googlebot discovers new and updated pages so that they can be added to Google’s index.
In addition to locating and indexing web pages, Googlebot also indexes files in formats such as .doc, .zip, and .xls.
How Googlebot works
Googlebot works through an algorithm that tells it which web pages to crawl. It downloads those pages to a database so that the search engine can interpret them later. As the crawl progresses, it follows the paths marked by links, reaching greater and greater depth. All of this runs on Google’s powerful servers, which are spread across the planet.
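The link-following behavior described above can be illustrated with a minimal sketch. This is not Google's algorithm, only a breadth-first crawl over a hypothetical in-memory site; the names `LinkExtractor`, `crawl`, and `SITE` are assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(pages, start, max_depth):
    """Breadth-first crawl of `pages` (URL -> HTML); returns {url: depth}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        # Stop expanding at the depth limit or when the page is unknown.
        if depth >= max_depth or url not in pages:
            continue
        parser = LinkExtractor()
        parser.feed(pages[url])
        for link in parser.links:
            if link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return seen


# Hypothetical three-page site, used only for illustration.
SITE = {
    "/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/">Home</a>',
    "/blog": '<a href="/blog/post-1">Post</a>',
}
```

Calling `crawl(SITE, "/", 2)` discovers the home page at depth 0, the two pages it links to at depth 1, and the blog post at depth 2. A real crawler would fetch pages over the network and respect robots.txt, which this sketch leaves out.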
How often Googlebot visits a page depends on the importance that Google assigns to it. The more authority a page has, the more frequently and deeply Googlebot crawls it.
Versions of Googlebot
There are two versions of Google’s bot:
- Deepbot: crawls the web in depth so that pages can be included in Google’s cache.
- Freshbot: searches for new content. It visits sites that are updated regularly (such as news media) more frequently, and sites with few updates less frequently (every few days or weeks).
How to know if Googlebot visits a website
Being crawled by Googlebot is essential to ranking in Google. That is why, when a website is not being indexed, the first step should be to check whether Googlebot has managed to access it.
When Googlebot visits a website, it leaves a record in the server log of the type crawl1.googlebot.com, with its corresponding IP address. If the log contains no Googlebot entries, it is advisable to review the robots.txt file.
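Both checks can be sketched in a few lines. The example below assumes the server writes logs in the common "combined" format, where the user agent is the last quoted field; the function names `googlebot_hits` and `googlebot_allowed` are hypothetical. The robots.txt check uses Python's standard `urllib.robotparser` module.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumed log layout: client IP first, user agent as the last quoted field.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .*"(?P<agent>[^"]*)"$')


def googlebot_hits(lines):
    """Return the client IPs of log lines whose user agent mentions Googlebot."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and "googlebot" in match.group("agent").lower():
            hits.append(match.group("ip"))
    return hits


def googlebot_allowed(robots_txt, url):
    """Check whether the given robots.txt text lets Googlebot fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)
```

An empty result from `googlebot_hits` suggests Googlebot has not reached the site, and `googlebot_allowed` then shows whether a robots.txt rule is the cause. A user-agent match alone does not prove that a request came from Google, since the header can be set by any client.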
Google Crawlers
We can identify which Google crawler has visited a page by looking at the user-agent in the log:
- APIs-Google: APIs-Google
- AdSense: Mediapartners-Google
- AdsBot for Mobile Web: AdsBot-Google-Mobile
- AdsBot: AdsBot-Google
- Googlebot: Googlebot
- Googlebot for images: Googlebot-Image
- Googlebot for news: Googlebot-News
- Googlebot for video: Googlebot-Video
- Feedfetcher: FeedFetcher-Google
- Google Read Aloud: Google-Read-Aloud
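As a rough illustration, the user-agent tokens listed above can be matched against a logged user-agent string. The helper name `identify_crawler` is hypothetical, and the matching is a simple case-insensitive substring test. Longer tokens are checked first so that, for example, Googlebot-Image is not reported as plain Googlebot.

```python
# User-agent tokens from the list above, mapped to the crawler names.
CRAWLER_TOKENS = {
    "APIs-Google": "APIs-Google",
    "Mediapartners-Google": "AdSense",
    "AdsBot-Google-Mobile": "AdsBot for Mobile Web",
    "AdsBot-Google": "AdsBot",
    "Googlebot": "Googlebot",
    "Googlebot-Image": "Googlebot for images",
    "Googlebot-News": "Googlebot for news",
    "Googlebot-Video": "Googlebot for video",
    "FeedFetcher-Google": "Feedfetcher",
    "Google-Read-Aloud": "Google Read Aloud",
}


def identify_crawler(user_agent):
    """Return the crawler name for a user-agent string, or None if no token matches."""
    ua = user_agent.lower()
    # Check the most specific (longest) tokens before the shorter ones.
    for token in sorted(CRAWLER_TOKENS, key=len, reverse=True):
        if token.lower() in ua:
            return CRAWLER_TOKENS[token]
    return None
```

For instance, `identify_crawler("Googlebot-Image/1.0")` returns `"Googlebot for images"`, while an ordinary browser user agent returns `None`.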
How to block Googlebot
There are times when, for privacy reasons, you do not want certain content to appear on Google. In that case, you can take one of the following actions:
- Password-protect directories: Googlebot cannot crawl content that sits behind a password.
- Add a “noindex” meta tag to HTML pages that you do not want indexed.
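To confirm that a page actually carries the directive, its HTML can be checked for a robots meta tag such as `<meta name="robots" content="noindex">`. The sketch below uses Python's standard `html.parser`; the names `RobotsMetaParser` and `has_noindex` are assumptions made for the example, and it only inspects the HTML, not HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Sets `noindex` when a robots or googlebot meta tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            content = attributes.get("content", "")
            directives = [part.strip().lower() for part in content.split(",")]
            if "noindex" in directives:
                self.noindex = True


def has_noindex(html):
    """Return True if the HTML contains a noindex meta directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex
```

A page whose head contains `<meta name="robots" content="noindex, nofollow">` makes `has_noindex` return `True`. Googlebot must still be able to crawl the page in order to see the tag.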