Ideas for Arc XP

Add an X-Robots-Tag: noindex header to the responses of the content sources

Request: Currently, on multiple websites, a rule is added to the robots.txt file to keep the content source paths out of the index: Disallow: /pf/api/

However, there is a fundamental difference between authorizing crawling (access) and authorizing indexing (visibility). We would like to implement the following change based on these points:

1. Why indexing APIs is rare

Google seeks to index content that is useful to human users (HTML pages). URLs like /pf/api/ generally return raw JSON data.

  • Googlebot "consumes" this JSON to understand and build the web page (rendering).

  • However, Google has no interest in displaying a raw data file (JSON) in its search results because it provides no value to the end user.

2. "Crawl" vs. "Index"

By modifying the robots.txt file to allow crawling of /pf/api/, we are telling Google: "You have permission to read this data to build my pages." This does not mean: "You must display these URLs in your search results."

3. The Recommended Solution: X-Robots-Tag

The best practice for APIs is not to block them in robots.txt, but to add an instruction in the HTTP response headers.
https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag#xrobotstag

Could you please check whether this header can be added to the API responses at the server or CDN level: X-Robots-Tag: noindex
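
For illustration, a minimal sketch of what this could look like in a Node/Express-style middleware layer (hypothetical: the actual change would live in the Arc XP server or CDN configuration, and the /pf/api path prefix is taken from the robots.txt rule above):

    import express from "express";

    const app = express();

    // Hypothetical middleware: mark every content-source response as
    // non-indexable without affecting the bot's ability to fetch it.
    app.use("/pf/api", (_req, res, next) => {
      // noindex keeps the raw JSON out of search results;
      // Googlebot can still read it to render the pages.
      res.setHeader("X-Robots-Tag", "noindex");
      next();
    });

    app.listen(3000);

At the CDN level, the equivalent would be a response-header rule scoped to the /pf/api/* path pattern.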

Why is this the perfect solution?

  • The bot can read the content (Crawl: OK): It can therefore render the articles correctly and see the text.

  • The bot cannot index it (Index: NO): The API URL itself will never be visible in Google search results.
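
Once the header is deployed, it is easy to verify. A quick sketch using Node's built-in fetch (the URL below is a placeholder; substitute any real content-source endpoint):

    // Check that the noindex header is actually being served.
    const res = await fetch("https://www.example.com/pf/api/v3/content/fetch/example");
    console.log(res.status, res.headers.get("x-robots-tag")); // expected: "noindex"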

Edem Lawson-Body, Dec 18 2025