AEO Dictionary

Robots.txt

The Robots.txt file is a plain text file placed in the root directory of a website that is used to specify rules for search engine crawlers. It instructs bots on which sections of the site they are or are not allowed to crawl. Its primary function is to manage a website's Crawl Budget, helping to ensure that important pages are efficiently discovered and indexed while keeping less important or private sections from being accessed.
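The directive syntax is simple: a User-agent line names a crawler, followed by Disallow/Allow rules for that crawler. A minimal sketch (the paths and domain are illustrative, not from a real site):

```
# Rules for all crawlers
User-agent: *
Disallow: /private/                  # keep this directory out of the crawl
Allow: /private/overview.html        # exception within a blocked directory

# Point crawlers at the sitemap
Sitemap: https://www.example.com/sitemap.xml
```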

Why It Matters

In the AI era, robots.txt has taken on new importance as the primary method for allowing or blocking AI training bots (like GPTBot, CCBot, or ClaudeBot). If you block these bots, your content will not be included in the training data for future AI models, potentially rendering your brand invisible in generative answers. However, some publishers block them to protect intellectual property.
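A site that wants to opt out of AI training crawls while staying open to search engines could use rules along these lines (GPTBot, CCBot, and ClaudeBot are the published user-agent tokens of OpenAI, Common Crawl, and Anthropic, respectively):

```
# Block common AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# All other crawlers may access everything
User-agent: *
Disallow:
```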

How We Use It at Soprano

If you want maximum visibility, you can leave robots.txt untouched, as we do. However, we recommend disallowing admin pages, logged-in pages, and URL parameters that generate duplicate content, to preserve crawl budget.
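A sketch of that recommended setup (the directory names and parameter patterns are illustrative assumptions, not Soprano's actual paths):

```
User-agent: *
Disallow: /admin/            # admin pages
Disallow: /account/          # logged-in pages
Disallow: /*?sort=           # duplicate-content sort parameters
Disallow: /*?sessionid=      # session-ID parameters
```

Note that the `*` wildcard inside paths is an extension honored by major crawlers such as Googlebot and Bingbot; it is not part of the original robots.txt standard, so less common bots may ignore it.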
