llms.txt 97% Failure: Why AI Crawlers Ignore Your New Robots.txt

97% of published llms.txt files receive zero requests from AI crawlers. Ahrefs data from June 2026 shows that despite widespread implementation enthusiasm, almost no AI engines are actually reading or using the file format that was supposed to give publishers control over how AI crawls and cites their content. This is not a failure of the format itself. It is a clear signal about what AI engines prioritize today, and what publishers should actually focus on to drive AI citations.

The llms.txt file format was introduced as a way for publishers to guide AI crawlers: specify preferred content, exclude irrelevant pages, and provide context for how engines should interpret and cite their work. The idea was compelling: a single text file at the root of your site, readable by ChatGPT, Perplexity, Gemini, and other AI answer engines, telling them exactly what to index and how to attribute. In theory, it was the new robots.txt for the AI era. In practice, the data shows it is mostly unused.

Ahrefs crawled thousands of domains with llms.txt files and tracked crawler requests. 97% of those files never received a single hit from known AI crawler user agents. OpenAI’s GPTBot, Google’s GoogleOther, Perplexity’s PerplexityBot, Anthropic’s ClaudeBot, and other major AI crawlers simply did not request the file. This is not a small sample or an edge case. It is the dominant pattern across the entire dataset.

Google’s John Mueller commented on llms.txt in June 2026, noting that the format cannot help LLMs differentiate between sites because AI engines rely on broader signals like schema, structured data, and citation history rather than site-specific directives. In other words, the file was designed as a control mechanism, but AI engines are not yet treating it as a priority signal for source selection or citation decisions.

This does not mean llms.txt is useless. It means it is premature. The format exists, some sites have implemented it, and some AI engines may eventually adopt it. But right now, if you are investing in llms.txt as a primary lever for AI visibility, you are allocating resources to a signal that almost no engine is using. The better approach is to implement llms.txt as future-proofing, not as a visibility driver, and focus your real effort on the signals that AI crawlers actually use today.

What AI Crawlers Actually Use

If llms.txt is not the answer, what is? The data points to three categories of signals that AI crawlers prioritize: schema markup, structured data, and citation history.

Schema markup is machine-readable annotation embedded in your HTML. It tells AI engines what your content is, not just what it says: a product page carries Product schema, an article carries Article schema, a FAQ carries FAQPage schema, and a how-to guide carries HowTo schema. These are explicit declarations that AI crawlers can extract without ambiguity. Schema.org types are standardized, widely documented, and already integrated into the knowledge graphs of ChatGPT, Google AI Overviews, Perplexity, and Gemini.

Structured data goes beyond schema. It includes JSON-LD blocks, microdata, and other formats that present information in a predictable, parseable way. When your pricing table is structured data rather than a CSS grid, when your author byline is a JSON-LD object rather than a sentence, when your Q&A is a structured list rather than a paragraph, AI engines can extract that information accurately and efficiently. The cost of extraction drops, and the probability of citation rises.
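
To make the difference between a paragraph and a structured object concrete, here is a minimal sketch that turns question-and-answer pairs into a schema.org FAQPage JSON-LD block. The helper names faq_jsonld and to_script_tag are hypothetical, chosen only for this illustration; the property names (mainEntity, acceptedAnswer) come from the schema.org vocabulary.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def to_script_tag(data):
    """Serialize a JSON-LD object into the script element placed in the page."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so answer text cannot close the script element early.
    # "\/" is a valid JSON escape for "/", so parsers still read the same text.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Illustrative content only.
faq = faq_jsonld(
    [("Should I remove my llms.txt file?", "No. The cost of keeping it is negligible.")]
)
print(to_script_tag(faq))
```

The same pattern applies to Article, Product, or Organization objects: build a plain dictionary, serialize it once, and embed the result in the page template.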

Citation history is the third pillar. AI engines prefer sources they have already cited. It is a reinforcement loop: if ChatGPT cited your site for a query about sustainable coffee three times last month, it is more likely to cite you again for a similar query next week. Citation history signals trust, relevance, and authority in a way that a single file cannot. It is cumulative, it is cross-domain, and it is the signal that actually moves the needle for AI visibility today.

These three signals are not hypothetical. They are documented in platform guidelines. Google’s AI Overviews documentation explicitly references schema markup and structured data as key inputs for answer extraction. OpenAI’s source selection criteria favor content with clear, extractable answers, which schema and structured data enable. Perplexity’s published methodology emphasizes attribution quality and source diversity, both of which rely on recognizable, parseable content rather than site-specific directives.

The Implementation Gap

Why has llms.txt adoption outpaced AI crawler usage? The answer lies in the incentives and capabilities of both sides.

Publishers want control. They want to tell AI engines what to index, what to ignore, how to attribute. They are familiar with robots.txt from the web search era, and llms.txt looked like the natural successor. Implementation is straightforward: place a text file at /llms.txt, follow the format spec, and wait. Many sites did exactly that, expecting immediate results.

AI engines, on the other hand, are optimizing for accuracy, efficiency, and scalability at massive scale. Processing a site-specific directive file for every domain they crawl adds overhead without proportional benefit when schema, structured data, and citation history already provide rich signals. Moreover, llms.txt is a new format without a standardized implementation across all engines. OpenAI, Google, Perplexity, Anthropic, and others have to agree on how to interpret it, which fields matter, and how to handle conflicts. That consensus takes time.

The result is a timing mismatch. Publishers are ready for llms.txt now. AI engines are not. The gap is temporary, but it is real.

Practical Recommendations

Given the data and current engine behavior, here is how to approach llms.txt and AI visibility today.

Implement llms.txt as future-proofing, not as a primary visibility lever. The cost is low, the effort is minimal, and some engines may adopt it in the future. Create the file following the spec, place it at your root, and move on. Do not expect citation improvements from this alone.
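
As a rough sketch of what "create the file following the spec" can look like, the snippet below renders an llms.txt body. It assumes the layout described in the public llms.txt proposal (an H1 title, a blockquote summary, then H2 sections containing link lists); the function name build_llms_txt, the site name, and the example.com URLs are all hypothetical.

```python
def build_llms_txt(title, summary, sections):
    """Render an llms.txt body: H1 title, blockquote summary, H2 link sections.

    sections maps a heading to a list of (name, url, note) tuples.
    The layout is an assumption based on the public llms.txt proposal.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            entry = f"- [{name}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site and URLs, for illustration only.
example = build_llms_txt(
    "Example Coffee Co.",
    "Guides and product pages about sustainable coffee.",
    {
        "Guides": [
            (
                "Brewing guide",
                "https://example.com/guides/brewing.md",
                "step-by-step brewing instructions",
            )
        ],
        "Optional": [("Company history", "https://example.com/about.md", "")],
    },
)
print(example)
```

Writing the returned string to a file served at /llms.txt is all the deployment this step needs.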

Prioritize schema markup and structured data. Audit your site for missing schema types. Add Product schema to product pages, Article schema to blog posts, FAQPage schema to FAQ sections, HowTo schema to guides. Ensure your JSON-LD blocks are valid, complete, and consistent. This is the signal that AI engines are using right now.
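
One way to start the audit is to list which schema types a page already declares and flag JSON-LD blocks that fail to parse. The sketch below uses only the Python standard library; the names JsonLdExtractor and schema_types are hypothetical, the sample page is made up, and the check covers JSON syntax and declared types only, not full schema.org validity.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def schema_types(html):
    """Return (declared_types, invalid_block_count) for one HTML document."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    types, invalid = set(), 0
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            invalid += 1
            continue
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            # Some pages nest their entities under "@graph".
            graph = item.get("@graph")
            for node in [item] + (graph if isinstance(graph, list) else []):
                declared = node.get("@type") if isinstance(node, dict) else None
                if isinstance(declared, str):
                    types.add(declared)
                elif isinstance(declared, list):
                    types.update(t for t in declared if isinstance(t, str))
    return types, invalid


# Made-up page: one valid Article block and one block with broken JSON.
page = """<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Article", "headline": "Example"}</script>
<script type="application/ld+json">{not valid json}</script>
</head><body></body></html>"""

found, invalid = schema_types(page)
missing = {"Article", "FAQPage"} - found
print(sorted(found), invalid, sorted(missing))
```

Running this over each page template, with the expected types adjusted per template, gives a quick list of where schema is missing or malformed.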

Build citation history. Publish content that answers specific, high-intent questions clearly and directly. Make your answers extractable. Use answer-first structure. Get cited once, and the second citation becomes easier. The third citation is almost automatic. Focus on being the best answer for queries in your niche, and the citation history will follow.

Monitor crawler activity. Use server logs or analytics tools to track requests from AI crawlers. Look for patterns in what pages they visit and how often. If you see consistent crawler activity on your llms.txt file, that is a signal that some engine has started using it. Adjust your strategy accordingly.
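
The log check can be sketched in a few lines. The example below assumes access logs in the common "combined" format and matches user agents by substring; the token list is an assumption for illustration and should be checked against each vendor's current crawler documentation. The function name count_ai_hits and the sample log lines are made up.

```python
import re
from collections import Counter

# User-agent substrings to look for. An assumed, illustrative list.
AI_CRAWLER_TOKENS = (
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "PerplexityBot",
    "GoogleOther",
)

# Combined log format: ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def count_ai_hits(lines, target_path="/llms.txt"):
    """Count requests for target_path per AI crawler token."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # Skip lines that are not in the expected format.
        path = match.group("path").split("?", 1)[0]  # Ignore query strings.
        if path != target_path:
            continue
        agent = match.group("agent")
        for token in AI_CRAWLER_TOKENS:
            if token in agent:
                hits[token] += 1
                break
    return hits


# Fabricated sample lines, for illustration only.
sample = [
    '203.0.113.7 - - [01/Jan/2025:12:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.8 - - [01/Jan/2025:12:01:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.9 - - [01/Jan/2025:12:02:00 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]
print(count_ai_hits(sample))
```

Passing a different target_path, such as a key product or guide URL, shows which pages the same crawlers do request. User-agent strings can be spoofed, so treat the counts as a signal rather than proof.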

Stay informed about platform updates. AI engines are evolving rapidly. ChatGPT, Google, Perplexity, and others regularly announce changes to crawler behavior, source selection criteria, and data ingestion priorities. Subscribe to official blogs, follow platform engineering accounts, and update your approach when new signals are introduced.

The llms.txt Reality Check

The 97% failure rate for llms.txt requests is a reality check, not a verdict. It tells us that the format is not yet a primary signal for AI crawlers. It does not tell us that the format is useless or that it will never matter. It tells us where we are today: in a transition period where AI visibility is driven by schema, structured data, and citation history, and where llms.txt is a bet on the future rather than a lever for the present.

For brands investing in GEO and AEO, the lesson is clear. Allocate your resources to the signals that work now. Implement llms.txt as insurance, but do not expect it to move the needle on its own. Focus on schema, structure, and consistency. Those are the investments that AI crawlers reward today.

The window of free AI visibility is closing. As AI engines introduce advertising, sponsored answers, and premium placements, the organic opportunities that exist today will become more competitive and expensive. Building the right foundation now (schema, structured data, citation history) positions your brand for citations both today and in the future. llms.txt is part of that foundation, but it is not the cornerstone.

FAQ

Is llms.txt completely useless?

No. 97% of files receive zero requests, which means 3% do receive some requests. The format exists, and some AI crawlers may be experimenting with it. However, it is not a primary signal today, and implementing it alone is unlikely to improve AI visibility significantly.

Should I remove my llms.txt file?

No. The cost of keeping it is negligible, and it may become more valuable as AI crawlers adopt the format. Think of it as future-proofing. Just do not rely on it for immediate visibility gains.

What is the most important signal for AI citations?

Schema markup and structured data are currently the most important signals because they make your content extractable and interpretable by AI engines. Citation history is the second most important because it signals trust and authority. Answer-first content structure amplifies both.

How do I know if AI crawlers are reading my llms.txt file?

Check your server logs for requests to /llms.txt from known AI crawler user agents like GPTBot, GoogleOther, PerplexityBot, or ClaudeBot. If you see consistent requests, that indicates some engine is using the file.

Will AI crawlers eventually adopt llms.txt?

Likely yes, but the timeline is uncertain. Adoption depends on standardization across platforms, proven utility for improving answer quality, and scalability for massive crawl volumes. Monitor platform announcements for updates.

How does llms.txt differ from robots.txt?

robots.txt is an access-control file: it tells crawlers, from search bots like Googlebot and Bingbot to any AI crawler that chooses to honor it, which parts of your site they may crawl. llms.txt is designed for AI answer engines, aiming to guide how they interpret, cite, and attribute your content. The formats, purposes, and current adoption levels are different.

Can llms.txt help with Google AI Overviews?

Google has not indicated that AI Overviews uses llms.txt as a primary signal. Google’s John Mueller has stated that the format cannot help LLMs differentiate between sites. For Google AI Overviews, focus on schema markup, structured data, and content quality.

What schema types should I prioritize for AI visibility?

Start with core types relevant to your content: Article for blog posts, Product for product pages, FAQPage for FAQs, HowTo for guides, Organization for company info. These are widely recognized and directly support answer extraction and citation.

How long does it take for schema changes to impact AI citations?

There is no guaranteed timeline, but many sites see initial citation changes within 4-8 weeks of consistent schema implementation and answer-first content publishing. Citation history compounds over time.

Is llms.txt required for searchless.ai’s AI visibility audit?

No. searchless.ai measures AI visibility based on actual citations across ChatGPT, Perplexity, Gemini, and other engines. Schema, structured data, and content quality are the primary inputs. llms.txt is tracked as a future-proofing signal but is not a requirement.

Get your free AI Visibility Score in 60 seconds at audit.searchless.ai.
