How does the crawler work?
Imagine the internet as a giant spider web across which Google sends an army of robotic spiders (hence the names “web” and “spider”) to discover every site on it.
After analyzing and recording the content of each page they discover, the crawlers report back to Google, which records the information in a large book called the index.
When you search on Google, you are not searching the internet itself; you are searching Google’s index. If you search for the keyword “apple pie recipe”, Google will tell you that it has found 164,000,000 pages in its index that mention “apple pie recipe”, and all of that takes only 0.84 seconds. Impressive.
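To make the crawl-then-lookup idea concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions: the "web" is a small hypothetical dictionary (TOY_WEB) rather than real sites, and the crawl and search functions are invented for this example. Google's actual crawler and index are far more sophisticated.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example/": ("welcome to our bakery", ["a.example/pie", "a.example/cake"]),
    "a.example/pie": ("apple pie recipe with cinnamon", ["a.example/"]),
    "a.example/cake": ("banana cake recipe", ["a.example/pie"]),
}

def crawl(start_url, web):
    """Follow links breadth-first from start_url and build an inverted index."""
    index = {}                      # word -> set of URLs containing that word
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        for word in text.split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (index lookup only)."""
    results = None
    for word in query.lower().split():
        urls = index.get(word, set())
        results = urls if results is None else results & urls
    return sorted(results or [])

index = crawl("a.example/", TOY_WEB)
print(search(index, "apple pie recipe"))
```

Note that search never touches TOY_WEB: it only reads the index built during the crawl, which is the point the paragraph above makes about searching the index rather than the internet.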
What are the limitations of Google’s index?
Google’s index only contains the pages that site owners allow it to show. If a site’s programmer does not want Google to index certain pages, they only need to mark those pages with the noindex directive in their code. Also, if you mistakenly create a web page that no other page on your site links to, known in the jargon as an orphan page, the crawlers will have no “thread” of the web to follow to that page, so it risks being ignored by Google. A well-built site is one where all the pages are connected to one another.
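Both situations can be checked with a few lines of Python. The sketch below is illustrative only: it looks for the standard robots meta tag with a noindex value, and it finds orphan pages in a hypothetical link map (SITE_LINKS) by walking the links from the home page. The function and class names are assumptions made for this example.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Sets self.noindex when a <meta name="robots"> tag contains "noindex"."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True

def has_noindex(html):
    """Return True if the HTML asks search engines not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex

def find_orphans(links, home):
    """Return pages that cannot be reached by following links from home."""
    reachable, stack = {home}, [home]
    while stack:
        for target in links.get(stack.pop(), []):
            if target not in reachable:
                reachable.add(target)
                stack.append(target)
    return sorted(set(links) - reachable)

# Hypothetical site: "/old-promo" exists, but no other page links to it.
SITE_LINKS = {
    "/": ["/blog", "/contact"],
    "/blog": ["/"],
    "/contact": [],
    "/old-promo": ["/"],
}

print(find_orphans(SITE_LINKS, "/"))
print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))
```

The orphan check works the same way a crawler does: a page that cannot be reached from the home page by following links has no thread leading to it.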
Finally, the most important limitation of Google’s index is the delay between two crawls of a site that is already indexed. Let’s say I change this page and, instead of apple pie, I decide to use the keyword “banana cake” in my example. Until Googlebot visits the page again, it will continue to appear in the index when someone searches for apple pie. Even worse, if I deleted this page and replaced it with one at a different URL, users could still find the old link in the results, but clicking it would lead to a 404 error: page not found.
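The delay can be pictured as a snapshot that goes stale. The following Python sketch is a toy model under stated assumptions: the index is a plain copy of the site taken at crawl time, and the names live_site, index_snapshot, recrawl, search_snapshot and fetch are all hypothetical.

```python
# Hypothetical snapshot model: the index stores what each page said at crawl time.
live_site = {"/example": "apple pie recipe"}
index_snapshot = {}

def recrawl(site, snapshot):
    """Replace the snapshot with the site's current content."""
    snapshot.clear()
    snapshot.update(site)

def search_snapshot(snapshot, keyword):
    """Return the URLs whose stored text contains the keyword."""
    return [url for url, text in snapshot.items() if keyword in text]

def fetch(site, url):
    """Return an HTTP-like status code for a URL on the live site."""
    return 200 if url in site else 404

recrawl(live_site, index_snapshot)

# The page is moved to a new URL and rewritten, but no recrawl has happened yet.
del live_site["/example"]
live_site["/banana"] = "banana cake recipe"

stale_hits = search_snapshot(index_snapshot, "apple pie")  # still lists /example
status = fetch(live_site, stale_hits[0])                   # the visitor gets a 404

recrawl(live_site, index_snapshot)
fresh_hits = search_snapshot(index_snapshot, "apple pie")  # now empty
print(stale_hits, status, fresh_hits)
```

Until the second recrawl runs, the snapshot and the live site disagree, which is exactly the window in which a searcher can land on a missing page.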
Google’s algorithm examines a few hundred ranking factors, each with a different weight. Together they determine the rank assigned to a page (its relevance); examples include its content, the number of other sites that link to it, and the overall quality of the website.
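One simple way to picture weighted factors is a weighted sum. The Python sketch below is purely illustrative: the three signals, their weights, and the page scores are hypothetical values chosen for the example, and Google does not publish its real factors or how it combines them.

```python
# Hypothetical signals and weights, chosen only for illustration.
WEIGHTS = {"content_match": 0.5, "inbound_links": 0.3, "site_quality": 0.2}

def relevance(signals, weights=WEIGHTS):
    """Weighted sum of signal values, each expected in the range 0..1."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)

# Two made-up pages competing for the same query.
pages = {
    "pie-blog.example": {"content_match": 0.9, "inbound_links": 0.4, "site_quality": 0.7},
    "recipes.example": {"content_match": 0.7, "inbound_links": 0.9, "site_quality": 0.8},
}

# Highest score first, as on a results page.
ranked = sorted(pages, key=lambda url: relevance(pages[url]), reverse=True)
print(ranked)
```

In this toy example the second page ranks first even though its content match is weaker, because its stronger links and site quality outweigh the difference.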
When we do SEO, we try to influence these relevance scores. We know that if we optimize the right signals, Google’s algorithm will decide that the page is highly relevant and that it offers one of the best answers to the question, so Google will display this page higher on the search engine results page (SERP).
Learn how to influence the position of a website’s pages in the Google index on our SEO Académy page.
For more information, here’s a video from our friends at Google!