What you can expect from this article

The meta robots tag instructs search engines which pages you want them to index and how. This article provides an in-depth look at some of the intricacies of this tag and, more importantly, shows you how to put it to work for you today.

What is a meta robots tag?

The meta robots tag is used to give search engines instructions on whether or not to index the page the tag is placed on. It is placed in the head of the page’s HTML source and can look like this:

<meta name="robots" content="noindex, follow" />

In short, the meta robots tag lets you fine-tune whether or not a search engine should include a specific page in its result pages.
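
To see how a crawler might read this tag, here is a minimal Python sketch that pulls the robots directives out of a page’s HTML with the standard library’s html.parser. The names RobotsMetaParser and robots_meta_contents are made up for this illustration; real crawlers use their own parsers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands us tag and attribute names in lowercase.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name == "robots":
            self.contents.append(attributes.get("content") or "")


def robots_meta_contents(html):
    """Return the raw content values of all meta robots tags in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.contents


page = '<html><head><meta name="robots" content="noindex, follow" /></head><body></body></html>'
print(robots_meta_contents(page))  # ['noindex, follow']
```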

Why you should care about the meta robots tag

Whether you’re a website owner or an SEO specialist, you need to be able to clearly signal to search engines how you want your websites indexed. The meta robots tag makes that possible.

Even though search engines have come a long way in understanding websites, when it comes to indexation you don’t want to leave it up to their algorithms to determine which pages ought to be indexed and which ones not. That alone is reason enough to make the meta robots tag an essential part of your SEO toolbox.

The meta robots tag is often used to combat duplicate content (pages that are accessible through multiple URLs). Duplicate content gives conflicting signals to search engines, essentially confusing them (never a good thing!).

The meta robots tag directives

The meta robots tag lets you give search engines many different instructions. An overview:

noindex

The noindex directive tells search engines not to index the page, so it is not shown when search engine users run a search.

nofollow

The nofollow directive tells search engine robots not to follow the links on the page and not to pass on any link authority.

none

The none directive tells search engines to ignore the page. It is sometimes used as shorthand for the noindex and nofollow directives combined.

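Pro tip: when you use the none directive or the noindex, nofollow combination, it is advisable to also exclude the page completely through the robots.txt file. Keep in mind that crawlers that obey the robots.txt block no longer fetch the page and therefore cannot read the tag, so only add the block once the page has dropped out of the index.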

noarchive

The noarchive directive prevents search engines from showing a cached version of the page.

nosnippet

The nosnippet directive makes sure search engines don’t show snippets in their result pages, and it also prevents search engines from caching the page.

noodp

The noodp directive used to prevent search engines from taking the page’s description from DMOZ (an open content directory that was maintained by volunteers) and using it as the snippet for the page. DMOZ shut down in March 2017, so this directive no longer has any value.

notranslate

The notranslate directive tells search engines not to show a translated version of the page in their result pages.

unavailable_after

The unavailable_after directive tells search engines not to show the page after a specific date. The date/time combination must follow the RFC 850 format.
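
As an illustration of the RFC 850 format (for example: Sunday, 06-Nov-94 08:49:37 GMT), the Python sketch below builds an unavailable_after tag from a date. The helper names rfc850 and unavailable_after_tag are hypothetical, and the "unavailable_after: date" layout of the content value is an assumption you should check against the documentation of the search engines you target.

```python
from datetime import datetime, timezone

# Spelled out explicitly so the output does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def rfc850(moment):
    """Format a timezone-aware datetime as an RFC 850 date in GMT."""
    utc = moment.astimezone(timezone.utc)
    return "{}, {:02d}-{}-{:02d} {:02d}:{:02d}:{:02d} GMT".format(
        WEEKDAYS[utc.weekday()], utc.day, MONTHS[utc.month - 1],
        utc.year % 100, utc.hour, utc.minute, utc.second)


def unavailable_after_tag(moment):
    """Build a meta robots tag that expires the page at the given moment."""
    return '<meta name="robots" content="unavailable_after: {}" />'.format(
        rfc850(moment))


# Hypothetical expiry date for a time-limited offer page.
expiry = datetime(2030, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
print(unavailable_after_tag(expiry))
```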

index

The index directive signals to search engines that you want the page to be indexed. Normally this directive is set automatically by most content management systems.

Situations where you should use the meta robots tag

A great example of using the meta robots tag is a staging environment, which is used to roll out new features, test them, approve them and then release them to the production environment. You want to keep the staging environment out of the index because of duplicate content issues, so you can put <meta name="robots" content="noindex" /> on every page to prevent search engines from showing the staging environment, should they gain access to it. It’s recommended to have multiple measures in place to prevent search engines from indexing your staging environment, and the meta robots tag is definitely one of them.

<!DOCTYPE html>
<html><head>
<meta name="robots" content="noindex" />
(…)
</head>
<body>(…)</body>
</html>

If competing directives are present, crawlers default to the most restrictive one.
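
The "most restrictive wins" rule can be sketched in a few lines of Python. This is an illustration under the assumption stated above, not how any particular search engine implements it; most_restrictive is a made-up name, and only the index/follow pairs are modelled.

```python
def most_restrictive(directives):
    """Resolve competing index/follow directives; the restrictive one wins."""
    tokens = {d.strip().lower() for d in directives}
    if "none" in tokens:
        # none is shorthand for noindex plus nofollow.
        tokens |= {"noindex", "nofollow"}
    return {
        "index": "noindex" not in tokens,
        "follow": "nofollow" not in tokens,
    }


# index and noindex compete, so the page is treated as noindex.
print(most_restrictive(["index", "noindex", "follow"]))
# {'index': False, 'follow': True}
```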

Combining meta robots tag directives

It’s pretty common to want to give several instructions to the bots that visit your page, and combining meta robots tag directives is by far the best way to do that. Start with a multi-directive instruction: list the directives, separated by commas, in a single content attribute.

Example:

<meta name="robots" content="noindex, nofollow">

Then there are situations that call for giving different directives to different crawlers. With the two tags below, for instance, Googlebot combines both and treats the page as noindex, nofollow, while Bingbot only applies nofollow, because the noindex directive is addressed to Googlebot alone.

<meta name="robots" content="nofollow">
<meta name="googlebot" content="noindex">
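
A small Python sketch of how a crawler could combine the generic robots tag with a tag addressed to it by name. The function effective_directives and the (name, content) pair representation are assumptions made for this illustration.

```python
def effective_directives(meta_tags, crawler):
    """Collect the directives that apply to one crawler.

    meta_tags is a list of (name, content) pairs taken from the page;
    crawler is the crawler's own token, such as "googlebot".
    """
    directives = set()
    for name, content in meta_tags:
        # A crawler obeys the generic robots tag and the tag carrying its name.
        if name.lower() in ("robots", crawler.lower()):
            for part in content.split(","):
                if part.strip():
                    directives.add(part.strip().lower())
    return directives


tags = [("robots", "nofollow"), ("googlebot", "noindex")]
print(sorted(effective_directives(tags, "googlebot")))  # ['nofollow', 'noindex']
print(sorted(effective_directives(tags, "bingbot")))    # ['nofollow']
```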

The X-Robots-Tag HTTP header

When you’re dealing with images and PDF files you don’t want to get indexed by search engines, the X-Robots-Tag is your best bet.

Within a page’s HTTP response you can communicate your indexation preferences to search engines.

For example, if you’re using the Apache web server (with mod_headers enabled) and you’d like to add a noindex, nofollow X-Robots-Tag to the HTTP response for all of your .pdf files, you’d write the following:

<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</Files>

Alternatively you can do the same for images of file types png, jpg and gif:

<Files ~ "\.(png|jpe?g|gif)$">
Header set X-Robots-Tag "noindex"
</Files>
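
If you cannot edit the Apache configuration, the same mapping can be applied in application code. The Python sketch below mirrors the two patterns above; x_robots_tag_for is a hypothetical helper, and wiring its result into your framework’s response headers is left out.

```python
import re

# Same patterns and header values as the Apache examples.
RULES = [
    (re.compile(r"\.pdf$"), "noindex, nofollow"),
    (re.compile(r"\.(png|jpe?g|gif)$"), "noindex"),
]


def x_robots_tag_for(path):
    """Return the X-Robots-Tag value for a file path, or None if no rule matches."""
    for pattern, value in RULES:
        if pattern.search(path):
            return value
    return None


print(x_robots_tag_for("/downloads/brochure.pdf"))  # noindex, nofollow
print(x_robots_tag_for("/images/logo.jpeg"))        # noindex
print(x_robots_tag_for("/index.html"))              # None
```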

Meta robots tag vs robots.txt vs X-Robots-Tag

So there are a few different ways to let search engines know your preferences around indexation, and each serves its own purpose. But when to use which one? To help with that question, here’s a quick rundown of each method (the meta robots tag, the robots.txt file, and the X-Robots-Tag header) and where it makes sense to use it.

Meta robots tag: use the meta robots tag to signal your preferences around indexation of your pages. Based on this tag, search engine bots may leave a page out of their index entirely, or decide whether or not to follow the links on it.

Robots.txt: the robots.txt file is used to signal your preferences around access to your pages for search engines. It’s important to understand that if you prevent access to your pages, search engines will never be able to index that content properly.

X-Robots-Tag: the X-Robots-Tag HTTP header is used to tell search engines that you’d like a specific page or file not to get indexed. For PDF files and images it’s the only way to signal indexation preferences, so that’s what it’s mostly used for.

FAQs

Some frequently asked questions about meta robots tags:

  1. What if there are no spaces between commands in the meta robots tag?
  2. What if there are no commas in the meta robots tag?
  3. Are commands case-sensitive?
  4. How do I see the X-Robots-Tag header?
  5. Will search engines still crawl pages that have a meta robots tag?

1. What if there are no spaces between commands in the meta robots tag?

This is a common concern for many people using meta robots tags, but don’t worry: all major search engines ignore spacing between the commands, so it makes no difference to the directive (see the example below):

<HEAD>
<meta name="ROBOTS" content="NOARCHIVE,NOODP,NOYDIR">
</HEAD>

is the same as

<meta name="ROBOTS" content="NOARCHIVE, NOODP, NOYDIR">

2. What if there are no commas in the meta robots tag?

It’s best to use commas in the meta robots tag. Bing claims they do not care one way or another, but Google does. And that right there is enough reason to use them (here’s an example of HOW NOT TO DO IT):

<meta name="ROBOTS" content="NOARCHIVE NOODP NOYDIR">

3. Are commands case-sensitive?

Nope. Google, Yahoo, and Bing can recognize what the command is within the directive, even if it’s randomly upper-cased and lower-cased. Case in point:

<meta name="ROBOTS" content="NOODP">
<meta name="robots" content="noodp">
<meta name="Robots" content="NoOdp">
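
The answers to the first three questions can be illustrated with a tiny parser. This is a sketch of one plausible way to normalize the content value (split on commas, trim spacing, lowercase), not the actual logic of any search engine; parse_robots_content is a made-up name.

```python
def parse_robots_content(content):
    """Split a content attribute into lowercase directives, ignoring spacing."""
    return [part.strip().lower() for part in content.split(",") if part.strip()]


# Spacing and letter case make no difference:
print(parse_robots_content("NOARCHIVE,NOODP,NOYDIR"))    # ['noarchive', 'noodp', 'noydir']
print(parse_robots_content("NoArchive, NoOdp, NoYdir"))  # ['noarchive', 'noodp', 'noydir']

# Without commas, a comma-based parser sees a single unknown directive:
print(parse_robots_content("NOARCHIVE NOODP NOYDIR"))    # ['noarchive noodp noydir']
```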

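4. How do I see the X-Robots-Tag header?

The X-Robots-Tag is part of the HTTP response headers of a page or file. You can view these headers in the network panel of your browser’s developer tools, or with a command-line client such as curl.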
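
You can also read the header from a script. The Python sketch below uses only the standard library; fetch_x_robots_tag and blocks_indexing are hypothetical helper names, and the URL in the comment is a placeholder. It does not handle values that are prefixed with a crawler name.

```python
from urllib.request import Request, urlopen


def fetch_x_robots_tag(url):
    """Send a HEAD request and return all X-Robots-Tag header values."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return response.headers.get_all("X-Robots-Tag") or []


def blocks_indexing(header_values):
    """Return True if any header value contains noindex or none."""
    for value in header_values:
        directives = {part.strip().lower() for part in value.split(",")}
        if directives & {"noindex", "none"}:
            return True
    return False


# Usage against a live URL (placeholder, not executed here):
#   values = fetch_x_robots_tag("https://www.example.com/brochure.pdf")
#   print(values, blocks_indexing(values))
print(blocks_indexing(["noindex, nofollow"]))  # True
```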

5. Will search engines still crawl pages that have a meta robots tag?

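Yes. The meta robots tag controls indexing, not crawling, and a crawler has to fetch a page to read the tag on it. To keep bots from crawling specific pages within your site, disallow those pages in the robots.txt file.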
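
To make the difference concrete: whether a crawler may fetch a URL at all is decided by robots.txt, which you can test with Python’s urllib.robotparser. The rules and URLs below are invented for this example.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content, invented for this illustration.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A disallowed page is never fetched, so a meta robots tag on it is never read.
print(parser.can_fetch("Googlebot", "https://www.example.com/private/report.html"))  # False
# An allowed page is fetched, and only then can its meta robots tag be obeyed.
print(parser.can_fetch("Googlebot", "https://www.example.com/blog/post.html"))       # True
```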

In Conclusion…

The meta robots tag is a surefire way to take more control over the way search engines index and present your website. So if you’re looking to gain an edge over your competitors’ content, don’t leave it up to search engine algorithms. Start using the meta robots tag and find out just how much it can do for your business!
