Crawling & Indexing Optimization 

Meta Robots Data

By Ralph Grundmann · Last updated on 02.03.2026

In this post, you will learn why meta robots directives matter in search engine optimization (SEO) and how they help you manage the indexing of your website effectively. You will learn how to influence the visibility of your pages in search results using directives such as index/noindex and follow/nofollow, ensuring that only relevant content is indexed by search engines. You will also find out which common mistakes related to robots.txt and meta directives should be avoided.

Meta Robots Data: What Is Their Importance for SEO?

The so-called meta robots data are part of the head section of a page's HTML and give instructions to search engines (especially Google). They control whether the respective page (URL) should be indexed and whether the links found on it should be followed or not.

In search engine optimization, these directives serve index management, which controls the discoverability of a page. A URL must be captured (indexed) by the search engine before searchers can find it; pages that are not in the index cannot be found at all.

Index Management Methodologies

With index management, you signal to the search engine whether a URL should be included in the index or not. In addition to the meta robots data, there are other ways to influence indexing, such as the X-Robots-Tag HTTP header or canonical tags.

The idea behind this is that not all pages of a domain need to be findable. For one thing, some pages provide no value for a search query: the shopping cart page of an online shop, for example, is a purely operational page in the checkout process without any standalone content. For another, search engines discover pages through links, and as a website operator you do not always want a linked page to end up in the index.

Meta Robots: index/noindex

The corresponding HTML tag is:

<meta name="robots" content="index" />

or

<meta name="robots" content="noindex" />

In the first case, the tag indicates that the respective page should be included in the index and be findable; the second variant states exactly the opposite. As mentioned earlier, the cart page could be set to "noindex". Which other pages make sense?

  • All pages behind a login such as “My Account”, “My Wishlist” etc.
  • The search results page of the internal search: when no search has been initiated, this page is empty and thus offers no value to the user. In the worst case, it only increases the bounce rate of the domain. Since the search engine cannot trigger searches itself, a filled results page is not to be expected either. Shop managers sometimes use filled search results pages for internal linking (e.g., when a corresponding category page is missing). That is one more reason to set this page to noindex: even filled search results provide no value and can significantly worsen crawling.
  • Pages with session IDs
  • Filter pages (although here one would not resort to “noindex”, but to parameter management)
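For page types like these, the two directive groups covered in this chapter can be combined. A common pattern (shown here as a sketch, not taken from a specific shop system) keeps a page such as the internal search results page out of the index while still allowing its internal links to be followed:

<meta name="robots" content="noindex, follow" />

This way the page itself does not appear in search results, but the link flow to the pages it links to remains intact.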

A recurring error is that de-indexing is forgotten, for example during a relaunch. When the search engine starts to index pages, the page types described above end up in the index as well. Often, the relevant pages are then excluded via robots.txt. However, since they are already in the index, this achieves nothing. At the same time, the meta data on the pages is rewritten, with the following consequences:

  • Search results (snippets) are enriched by the search engine with the note that the page is excluded.
  • Nevertheless, the pages do not disappear from the index.

The problem here is that the robots.txt prevents crawling, and therefore the search engine never picks up the new meta data, because it no longer visits the page at all. Sounds confusing? It is, but the correct handling of the robots.txt (which controls crawling, not the index) is described in its own chapter.
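To illustrate the trap (the path is a hypothetical example): suppose the relaunch added a rule like the following to robots.txt while the pages were still in the index.

User-agent: *
Disallow: /cart/

Googlebot now never requests URLs under /cart/ again and therefore never sees a newly added noindex tag on them. The correct order is the other way around: leave the path crawlable, let the search engine pick up the noindex directive and drop the pages from the index, and only then, if at all, block the path via robots.txt.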

Meta Robots: follow/nofollow

SEO legend has it that SEOs abused Wikipedia as a backlink source in the early 2000s. This is said to have prompted Google, in collaboration with Wikipedia, to set all links from Wikipedia to other pages to "nofollow" so that Google would not follow them and, above all, would not include them in its relevance assessment.

To understand this (more details in the chapter: Backlinks), it is important to know that Google includes links from external sources to a domain in its relevance assessment. This led to link purchases in the early years of SEO. Google then put a stop to this by rolling out the Penguin update. So if you do not want to pass on link juice to other pages, you can also set your own links to “nofollow.”

This is done either as an attribute of the HTML hyperlink command or in the meta data:

<meta name="robots" content="follow" />

or

<meta name="robots" content="nofollow" />
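The per-link variant mentioned above works via the rel attribute of the anchor tag; the URL here is only a placeholder:

<a href="https://example.com/" rel="nofollow">Example link</a>

While the meta tag applies to every link on the page, the rel attribute lets you mark individual links as "nofollow".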

This should only ever be a last resort, and it is actually not good style: by setting a link at all, you signal that you consider it important. Users should follow it, as it ideally represents added value. In that case, it is only fair to pass on the link juice as well.

For internal links, the "nofollow" attribute makes even less sense, as internal links are never "bad" links. If you believe that the target page should not be indexed, set it to "noindex" instead (see above). And for the search engine to "understand" this, it must follow the link in order to perceive the de-indexing information on the target page.

Do you want to improve your click-through rate and need support? Then contact us! Do you want to learn more or have your team trained on this topic? Then check out our Analytics seminars.
