Request: Currently, on multiple websites, a rule has been added to the robots.txt file with the intention of keeping the content-source paths out of the index: Disallow: /pf/api/
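For reference, the stanza in question typically looks like this in robots.txt (the User-agent line is an assumption; only the Disallow directive is quoted from the affected sites):

    User-agent: *
    Disallow: /pf/api/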
However, there is a fundamental difference between allowing crawling (access) and allowing indexing (visibility). Based on the points below, we would like to implement the following changes:
Google seeks to index content that is useful to human users (HTML pages). URLs under /pf/api/ generally return raw JSON data.
Googlebot "consumes" this JSON to understand and build the web page (rendering).
However, Google has no interest in displaying a raw data file (JSON) in its search results because it provides no value to the end user.
By modifying robots.txt to allow /pf/api/ (see the sketch below), we are telling Google: "You have permission to read this data to build my pages." We are not saying: "Display these URLs in your search results."
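A minimal sketch of the proposed robots.txt change, assuming a generic stanza (the simplest option is to delete the Disallow line; an explicit Allow directive, which Googlebot honors, also makes the intent visible):

    User-agent: *
    # Disallow: /pf/api/   (line removed)
    Allow: /pf/api/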
The best practice for APIs is not to block them in robots.txt but to add an instruction in the HTTP response headers:
https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag#xrobotstag
Could you please check whether the following header can be added to the API responses at the server or CDN level: X-Robots-Tag: noindex. A sketch of one possible setup follows.
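As an illustration only, here is a minimal sketch of what this could look like on an Nginx front end (Nginx is an assumption; most CDNs offer an equivalent response-header rule for a path pattern):

    # Attach the noindex directive to every response served under /pf/api/
    location /pf/api/ {
        add_header X-Robots-Tag "noindex" always;
        # the existing proxy/caching configuration for this path stays unchanged
    }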
Why is this the perfect solution?
The bot can read the content (Crawl: OK): it can therefore render the articles correctly and see the text.
The bot cannot index it (Index: NO): the API URLs themselves will never be visible in Google search results.
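Once the header is in place, the change can be verified from the command line; the URL below is only a placeholder for any real API path:

    curl -sI https://www.example.com/pf/api/some-endpoint | grep -i x-robots-tag
    # expected output: x-robots-tag: noindex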