llms.txt Has a 97% Ignore Rate — What That Means for GEO

The search community spent a good portion of 2025 debating whether llms.txt was the robots.txt of the AI era — a tidy signal file that would help large language models understand your site, respect your content, and maybe even prefer you in AI-generated answers. That narrative just took a serious hit.

Ahrefs analysed server logs and bot traffic across 137,000 domains. The finding: 97% of llms.txt files were never read by any bot. Not occasionally missed — structurally ignored. Google’s own John Mueller reinforced this from the other direction, arguing that LLM systems fundamentally cannot use llms.txt to differentiate between sites during discovery. At best, Mueller sees a narrow, on-site utility once an agent has already landed on your domain. As a discoverability signal, it doesn’t exist.

For brands investing in Generative Engine Optimisation (GEO) and Answer Engine Optimisation (AEO), this isn’t a minor footnote. It’s a reframe.

The GEO Signal Problem llms.txt Was Supposed to Solve

The appeal of llms.txt was intuitive: give AI crawlers a curated map of your most important content, written in plain language, and they’d reward you with better representation in AI-generated answers. It mirrored the logic that made sitemaps useful — structured guidance for automated systems.

The problem is that LLMs don’t discover content the way Googlebot does. Their baseline knowledge comes from training corpora, not from crawling a live index in real time. When an AI answer engine surfaces your brand, it’s typically drawing either on what the model absorbed during training or on documents fetched by a retrieval-augmented generation (RAG) pipeline, and neither is gated by a signal file sitting at your domain root. Ahrefs’ data makes this concrete: the bots simply aren’t reading the file. Mueller’s framing makes it structural: even if they were, differentiation at the discovery level isn’t how these systems work.
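If you want to see whether the same pattern holds on your own domain, your access logs will tell you. What follows is a minimal sketch, not Ahrefs’ methodology: it assumes logs in the common "combined" format, and the user-agent tokens in `AI_BOT_TOKENS` are an illustrative list you should adjust to the crawlers you actually care about. The function name and sample log lines are hypothetical.

```python
import re
from collections import Counter

# Matches the request, status, size, referer and user-agent fields of a
# combined-format access log line (assumption: your server uses this format).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Illustrative crawler user-agent substrings; extend or trim for your traffic.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot")


def count_llms_txt_hits(lines):
    """Count requests for /llms.txt, grouped by the bot token in the user agent."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path != "/llms.txt":
            continue
        agent = match.group("agent")
        for token in AI_BOT_TOKENS:
            if token in agent:
                hits[token] += 1
                break
        else:
            hits["other"] += 1  # fetched, but not by a listed crawler
    return hits


# Hypothetical sample lines standing in for a real log file.
sample = [
    '203.0.113.5 - - [10/Oct/2025:13:55:36 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [10/Oct/2025:13:56:01 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '198.51.100.7 - - [10/Oct/2025:14:02:11 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "curl/8.4.0"',
]
result = count_llms_txt_hits(sample)
print(dict(result))
```

In practice you would pass an open log file instead of `sample`. A count of zero across a few months of logs is a reasonable hint that the file is not earning its keep.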

This matters for Southeast Asian brands specifically. Markets like Thailand, Vietnam, and Indonesia are seeing rapid AI search adoption through platforms like Perplexity and Google’s AI Overviews — but the underlying infrastructure for how those systems surface regional content is even less transparent than in Western markets. Betting on llms.txt here was always a long shot.

What LLMs Actually Respond To

llms.txt optimisation was, in many ways, a comfort blanket — a file you could create, point to in a deck, and call GEO work done. The Ahrefs study forces a more honest conversation about what actually influences AI-generated answers.

The evidence increasingly points to three durable factors. First, topical authority at scale: AI systems surface sources that appear consistently across many documents on a subject. Tokopedia’s seller education content, for instance, ranks in AI answers about Indonesian e-commerce not because of a signal file, but because it’s cited, linked, and replicated across the web. Second, structured semantic clarity: content that uses clear entity relationships — defined terms, explicit context, FAQ schema — is more likely to be extracted cleanly by RAG pipelines. Third, off-site corroboration: mentions in high-authority publications, forums like Reddit, and regional news outlets create the citation density that trains and reinforces AI system preferences.

None of these require a file at your root directory. All of them require sustained editorial and content architecture investment.

The Quiet Winner: Crawlable Content Infrastructure

Here’s the strategic reframe. The hours your team might spend debating llms.txt implementation are better spent auditing whether your core content is actually indexable, structured, and semantically coherent — which also happens to be the foundation of solid local SEO and traditional organic search.

For brands operating across multiple Southeast Asian markets, this is especially urgent. A multilingual site serving Thai, Bahasa Indonesia, and Vietnamese audiences has compounding structural challenges: hreflang implementation, localised schema markup, mobile page speed across variable network conditions. These are the fundamentals that serve you in Google Search, Google’s AI Overviews, and any RAG-based system simultaneously. An llms.txt file does none of that work.
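One common hreflang failure on multilingual sites is a missing return link: Google treats hreflang annotations as valid only when the pages point to each other. Here is a rough sketch of how that check could look, assuming you have already collected each page’s hreflang annotations into a dictionary. The function name, the `example.com` URLs and the data shape are all hypothetical.

```python
def find_missing_return_links(hreflang_map):
    """Return sorted (source, target) pairs where source lists target as an
    alternate but target does not list source back."""
    missing = []
    for source, alternates in hreflang_map.items():
        for target in alternates.values():
            if target == source:
                continue  # self-reference, nothing to reciprocate
            back_links = hreflang_map.get(target, {}).values()
            if source not in back_links:
                missing.append((source, target))
    return sorted(missing)


# Hypothetical crawl result: page URL -> {language code: alternate URL}.
pages = {
    "https://example.com/th/": {
        "th": "https://example.com/th/",
        "id": "https://example.com/id/",
        "vi": "https://example.com/vi/",
    },
    "https://example.com/id/": {
        "th": "https://example.com/th/",
        "id": "https://example.com/id/",
        "vi": "https://example.com/vi/",
    },
    # The Vietnamese page forgot to link back to the Thai page.
    "https://example.com/vi/": {
        "id": "https://example.com/id/",
        "vi": "https://example.com/vi/",
    },
}
missing = find_missing_return_links(pages)
print(missing)
```

Here the check flags the Thai page’s link to the Vietnamese page, because the Vietnamese page never points back. It only covers reciprocity; valid language codes and canonical URLs would need separate checks.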

Practically: run a structured data audit across your top 20 pages. Ensure your FAQPage, HowTo, and LocalBusiness schema are implemented correctly and reflect your actual content — not boilerplate. Make sure your Google Business Profile is fully populated with accurate category data, attributes, and recent posts if local search is part of your remit. These are the signals that compound. A text file that 97% of bots ignore does not.
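A first pass at that audit can be scripted. The sketch below uses only the Python standard library to pull JSON-LD blocks out of a page’s HTML and report which schema types are declared. It assumes your markup is JSON-LD in `<script type="application/ld+json">` tags and does not look inside `@graph` containers or at Microdata. The class name, function name and sample HTML are hypothetical, and the check covers presence only: whether the markup reflects your actual content still needs a human read.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def schema_types(html):
    """Return the set of top-level @type values declared in a page's JSON-LD."""
    parser = JsonLdExtractor()
    parser.feed(html)
    found = set()
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            found.add("INVALID_JSON")  # flag broken markup instead of crashing
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            declared = item.get("@type", [])
            found.update([declared] if isinstance(declared, str) else declared)
    return found


# Hypothetical page source standing in for one of your top 20 pages.
sample_html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": []}
</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "LocalBusiness", "name": "Example Cafe"}
</script>
</head><body></body></html>
"""
expected = {"FAQPage", "HowTo", "LocalBusiness"}
found = schema_types(sample_html)
missing_types = expected - found
print(sorted(found), sorted(missing_types))
```

Run it over the HTML of each priority page and treat any missing or `INVALID_JSON` result as a prompt for manual review, ideally alongside Google’s Rich Results Test.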

Don’t Abandon GEO — Abandon the Shortcuts

None of this means GEO is a distraction. AI-influenced search is reshaping how brands get discovered, and Southeast Asia’s accelerating smartphone penetration means AI search interfaces will reach mass adoption faster here than in many Western markets. That’s a genuine strategic reality.

But GEO done well looks less like a new file format and more like old-fashioned content discipline: clear writing, strong entity coverage, consistent brand signals across the open web, and a technical foundation that lets visitors, human or AI, actually understand what you do and who you serve. Mueller’s point about llms.txt having a narrow on-site utility isn’t nothing. Once an AI agent is already navigating your site, structured guidance might help. That’s a niche use case, not a strategy.

The brands that will win in AI search over the next three years are the ones building genuine topical authority in their markets — not the ones who found the cleverest shortcut around it.

Key Takeaways

With 97% of llms.txt files unread by bots, redirect that implementation effort toward structured data and semantic content architecture that serves both traditional and AI-driven search.
GEO credibility is built through citation density and topical authority across the open web — not through signal files that AI discovery systems structurally cannot use for differentiation.
For Southeast Asian brands, multilingual schema implementation and Google Business Profile hygiene deliver compounding returns across every search surface simultaneously.

The deeper question llms.txt forces us to ask is this: are we optimising for how AI search actually works, or for how we wish it worked? The systems that surface your brand in AI-generated answers are largely opaque, and the playbook is still being written. Brands that stay grounded in durable fundamentals — authority, clarity, structure — will be better positioned than those chasing the next signal file.

At grzzly, we work with growth teams across Southeast Asia on exactly this: separating durable search strategy from well-marketed noise, and building content and technical foundations that perform across Google Search, AI Overviews, and local pack results. If your team is trying to get a clear picture of where your GEO and AEO investment should actually go, we’d be glad to think through it with you. Let’s talk.
