{"id":20279,"date":"2020-01-30T10:28:10","date_gmt":"2020-01-30T10:28:10","guid":{"rendered":"https:\/\/www.arimetrics.com\/glosario-digital\/crawler"},"modified":"2026-06-26T13:44:25","modified_gmt":"2026-06-26T13:44:25","slug":"crawler","status":"publish","type":"encyclopedia","link":"https:\/\/www.arimetrics.com\/en\/digital-glossary\/crawler","title":{"rendered":"Crawler"},"content":{"rendered":"<p><img decoding=\"async\" class=\"boxpad alignright wp-image-14132 size-full\" src=\"https:\/\/www.arimetrics.com\/wp-content\/uploads\/2020\/01\/crawler.png\" alt=\"Crawler\" width=\"300\" height=\"300\" srcset=\"https:\/\/www.arimetrics.com\/wp-content\/uploads\/2020\/01\/crawler.png 300w, https:\/\/www.arimetrics.com\/wp-content\/uploads\/2020\/01\/crawler-150x150.png 150w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/p>\n<p><strong>Meaning: <\/strong><\/p>\n<p>A <em><strong>crawler <\/strong><\/em>or web crawler, also known as a spider, is a bot that helps index the web. It browses a website one page at a time until all of its pages have been <a href=\"https:\/\/www.arimetrics.com\/en\/digital-glossary\/indexing\">indexed<\/a>. Web <strong>crawlers<\/strong> collect information about a website and the links on it, and they also help validate HTML code and hyperlinks.<\/p>\n\n<h2 class=\"indexed\"><span class=\"heading_text\">How crawlers work<\/span><\/h2>\n<p>Web crawlers collect information such as the URL of the web page, the meta tag information, the page content, the links on the page and their destinations, the page title and any other relevant information. They keep track of URLs that have already been downloaded so that the same page is not downloaded again.<\/p>\n<p>A combination of policies, such as the re-visit policy, selection policy, parallelization policy and politeness policy, determines the behavior of the web crawler. 
Web crawlers face many challenges, including the continuous evolution of the web, trade-offs in content selection, social obligations and competition.<\/p>\n<h2 class=\"indexed\"><span id=\"crawlers-y-buscadores\" class=\"anchor\"><\/span><span class=\"heading_text\">Crawlers and search engines<\/span><\/h2>\n<p>Web crawlers are key components of web search engines and of the search systems found on websites. They help index web pages so that users can submit queries against the index and receive the pages that match them. Another use of web crawlers is web archiving, in which large sets of web pages are collected and archived periodically. Web crawlers are also used in data mining, where pages are analyzed for properties such as statistics, and for <a href=\"https:\/\/www.arimetrics.com\/agencia-analitica-web\">data analysis<\/a>.<\/p>\n<h2 class=\"indexed\"><span id=\"uso-de-los-crawlers\" class=\"anchor\"><\/span><span class=\"heading_text\">Using Crawlers<\/span><\/h2>\n<p>Crawlers are mostly used to collect data from other websites in order to build a much larger database than would otherwise be possible. 
Search engines, for example, use them to extract data, analyze sites and assign them a position in the SERPs, among other things.<\/p>\n<p>These crawlers analyze ecommerce prices, external links, internal links, addresses, emails and more across all the pages they find, and then organize that information.<\/p>\n<h2 class=\"indexed\"><span id=\"tipos-de-crawlers\" class=\"anchor\"><\/span><span class=\"heading_text\">Types of Crawlers<\/span><\/h2>\n<p>RBSE (Eichmann, 1994): the first crawler to be published. It is based on two programs: the first, spider, maintains a relational database, and the second, mite, downloads the web pages.<\/p>\n<p>World Wide Web Worm (McBryan, 1994): this crawler collects data and builds an index of page titles and URLs.<\/p>\n<p>Google Crawl (Brin and Page, 1998): this crawler, based on C++ and Python, travels the Internet extracting information from domains and checking whether that data is new or was already there on its previous visit. If the data was not already there, the document is added to the database.<\/p>\n<p>There are many more crawlers, used for many purposes, some of them unethical or even illegal. It is worth looking for more information about how these content indexers work.<\/p>\n<h2 class=\"indexed\"><span id=\"como-bloquear-a-los-crawlers\" class=\"anchor\"><\/span><span class=\"heading_text\">How to block Crawlers<\/span><\/h2>\n<p>If you do not want certain crawlers to enter your website and collect information, you can block them through the <a href=\"https:\/\/www.arimetrics.com\/glosario-digital\/robots-txt\">robots.txt file<\/a>. To do this, use the User-agent: directive followed by the name of the bot you want to block, and then Disallow: \/. 
In the case of Google, the user agent would be Googlebot, and in the case of the Semrush tool, SemrushBot. For example, the following rules block the SemrushBot-SA user agent:<\/p>\n<pre>User-agent: SemrushBot-SA\r\nDisallow: \/<\/pre>\n<h2>Frequently asked questions about Crawler<\/h2>\n<div class=\"geo-faq-block\">\n<details class=\"geo-faq-item\">\n<summary>What is a Crawler?<\/summary>\n<p>A Crawler is a program that goes through web pages by following links to discover, read and collect information. Search engines use it to crawl sites, update indexes and understand the structure of the web.<\/p>\n<\/details>\n<details class=\"geo-faq-item\">\n<summary>What is a Crawler used for in SEO?<\/summary>\n<p>It is used to analyze how a site can be crawled, detect technical errors, review links, titles, metadata, canonicals, status codes, depth and indexability problems. SEO tools use crawlers to audit websites.<\/p>\n<\/details>\n<details class=\"geo-faq-item\">\n<summary>What is the difference between a Crawler and an indexer?<\/summary>\n<p>The Crawler discovers and downloads pages or resources. The indexer processes that information to decide what content is stored, how it is interpreted and whether it can appear in search results.<\/p>\n<\/details>\n<details class=\"geo-faq-item\">\n<summary>What should a website make easier for Crawlers?<\/summary>\n<p>It should make clear architecture, accessible internal links, fast responses, correct status codes, useful sitemaps, coherent canonicals and well-configured robots rules easier. 
It should also avoid accidental blocking of important content.<\/p>\n<\/details>\n<\/div>\n<p><script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@graph\": [\n    {\n      \"@type\": \"DefinedTerm\",\n      \"@id\": \"https:\/\/www.arimetrics.com\/en\/digital-glossary\/crawler#definedterm\",\n      \"name\": \"Crawler\",\n      \"description\": \"Definition of Crawler in the Arimetrics Digital Glossary.\",\n      \"inDefinedTermSet\": {\n        \"@type\": \"DefinedTermSet\",\n        \"name\": \"Arimetrics Digital Glossary\",\n        \"url\": \"https:\/\/www.arimetrics.com\/en\/digital-glossary\"\n      }\n    },\n    {\n      \"@type\": \"FAQPage\",\n      \"@id\": \"https:\/\/www.arimetrics.com\/en\/digital-glossary\/crawler#faq\",\n      \"mainEntity\": [\n        {\n          \"@type\": \"Question\",\n          \"name\": \"What is a Crawler?\",\n          \"acceptedAnswer\": {\n            \"@type\": \"Answer\",\n            \"text\": \"A Crawler is a program that goes through web pages by following links to discover, read and collect information. Search engines use it to crawl sites, update indexes and understand the structure of the web.\"\n          }\n        },\n        {\n          \"@type\": \"Question\",\n          \"name\": \"What is a Crawler used for in SEO?\",\n          \"acceptedAnswer\": {\n            \"@type\": \"Answer\",\n            \"text\": \"It is used to analyze how a site can be crawled, detect technical errors, review links, titles, metadata, canonicals, status codes, depth and indexability problems. SEO tools use crawlers to audit websites.\"\n          }\n        },\n        {\n          \"@type\": \"Question\",\n          \"name\": \"What is the difference between a Crawler and an indexer?\",\n          \"acceptedAnswer\": {\n            \"@type\": \"Answer\",\n            \"text\": \"The Crawler discovers and downloads pages or resources. 
The indexer processes that information to decide what content is stored, how it is interpreted and whether it can appear in search results.\"\n          }\n        },\n        {\n          \"@type\": \"Question\",\n          \"name\": \"What should a website make easier for Crawlers?\",\n          \"acceptedAnswer\": {\n            \"@type\": \"Answer\",\n            \"text\": \"It should make clear architecture, accessible internal links, fast responses, correct status codes, useful sitemaps, coherent canonicals and well-configured robots rules easier. It should also avoid accidental blocking of important content.\"\n          }\n        }\n      ]\n    }\n  ]\n}\n<\/script><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Meaning: A crawler or web crawler, also known as a spider, is a bot that helps in indexing the web. They browse one page at a time through a website until all pages have been indexed. Web crawlers or crawlers help in collecting information about a website and the links related to them and also 
[&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":0,"template":"","encyclopedia-tag":[1004],"class_list":["post-20279","encyclopedia","type-encyclopedia","status-publish","hentry","encyclopedia-tag-indexacion-seo"],"_links":{"self":[{"href":"https:\/\/www.arimetrics.com\/en\/wp-json\/wp\/v2\/encyclopedia\/20279","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.arimetrics.com\/en\/wp-json\/wp\/v2\/encyclopedia"}],"about":[{"href":"https:\/\/www.arimetrics.com\/en\/wp-json\/wp\/v2\/types\/encyclopedia"}],"author":[{"embeddable":true,"href":"https:\/\/www.arimetrics.com\/en\/wp-json\/wp\/v2\/users\/6"}],"wp:attachment":[{"href":"https:\/\/www.arimetrics.com\/en\/wp-json\/wp\/v2\/media?parent=20279"}],"wp:term":[{"taxonomy":"encyclopedia-tag","embeddable":true,"href":"https:\/\/www.arimetrics.com\/en\/wp-json\/wp\/v2\/encyclopedia-tag?post=20279"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}