Ahrefs analysed 1.4M prompts to find why ChatGPT skips pages it retrieves. Here's what drives AI citation — and how to fix your GEO strategy.
There’s a distinction most marketing teams haven’t caught yet: ChatGPT retrieves roughly twice as many pages as it actually cites. According to Ahrefs’ analysis of 1.4 million prompts, only about 50% of pages the model crawls to answer a query ever receive attribution. You can be in the room and still not get the credit.
That gap — between retrieval and citation — is where generative engine optimisation (GEO) actually lives. And if your current content strategy doesn’t address it, you’re optimising for a search dynamic that stopped being the whole story about eighteen months ago.
Why ChatGPT Passes Over Pages It Already Reads
Ahrefs’ research points to a few consistent patterns in what gets cited versus what gets quietly absorbed. Pages that earn citations tend to have tightly scoped answers — one clear argument per page, not an omnibus guide trying to cover everything. They also carry stronger entity signals: named authors with verifiable expertise, publication dates, and explicit sourcing within the content itself.
The structural lesson here is that ChatGPT isn’t just reading for information — it’s reading for attributability. If a page can’t be cleanly summarised and sourced in a single sentence, the model often defaults to paraphrasing without credit. For Southeast Asian brands publishing multilingual content, this creates a compounding problem: pages that mix languages or lack consistent entity labelling across language versions are harder for LLMs to anchor to a single authoritative source.
Tactically, this means every high-value page needs a clear byline with schema markup, a defined topic scope, and citations within the content that demonstrate the page is itself a credible node in a broader knowledge graph — not an island.
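A minimal sketch of what that byline markup can look like, using schema.org Article JSON-LD generated from Python. Every name and URL below is a placeholder, not a real entity — the point is the shape: a named Person author with a stable entity page, a single defined topic, and explicit citations carried in the markup itself.

```python
import json

# Minimal schema.org Article markup with an explicit, verifiable byline.
# All names and URLs below are placeholders, not real entities.
article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Why ChatGPT Skips Pages It Retrieves",
    "datePublished": "2025-06-01",
    "author": {
        "@type": "Person",
        "name": "Jane Example",                     # named author, not a brand byline
        "url": "https://example.com/authors/jane",  # stable author entity page
        "sameAs": ["https://www.linkedin.com/in/jane-example"],  # off-site anchor
    },
    "about": "generative engine optimisation",      # one defined topic scope per page
    "citation": ["https://example.com/source-study"],  # explicit sourcing in markup
}

# Emit as a JSON-LD block ready to drop into the page <head>.
json_ld = f'<script type="application/ld+json">{json.dumps(article_schema)}</script>'
print(json_ld)
```

The `sameAs` links are what let an LLM triangulate the author across language versions and third-party profiles — the same entity-coherence signal discussed above, expressed in markup.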
The GEO Playbook Is More Specific Than Most Teams Think
Semrush’s practical GEO framework breaks down into three operational priorities: entity authority, semantic structure, and conversational alignment. Entity authority means your brand, your authors, and your core topics are consistently represented across structured data, third-party mentions, and knowledge bases — so that LLMs can triangulate who you are without ambiguity. This is where many Southeast Asian brands quietly leak: strong on-site content, weak off-site entity coherence.
Semantic structure means your content is written in a way that maps cleanly to how LLMs decompose queries — typically into a primary intent, a set of sub-questions, and a preferred answer format. Long-form content that buries its point 600 words in doesn’t perform well in generative results, regardless of how well it ranks in traditional SERPs.
Conversational alignment is the trickiest: it requires understanding not just what queries exist, but how users are phrasing questions to AI interfaces versus search bars. Moz’s AI research workflow, built around prompt discovery before content creation, reflects this correctly — you can’t optimise for conversational queries you haven’t mapped first.
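The decomposition described above — primary intent, sub-questions, preferred answer format — can be modelled as a simple content-brief structure. This is an illustrative sketch, not a standard: the class and field names are this example's own, and the 600-word threshold echoes the burying-the-point observation rather than any documented cutoff.

```python
from dataclasses import dataclass, field

# Illustrative structure for mapping a conversational query the way an LLM
# tends to decompose it: one primary intent, a set of sub-questions, and a
# preferred answer format. Names here are this sketch's own, not a standard.
@dataclass
class PromptBrief:
    primary_intent: str
    sub_questions: list[str] = field(default_factory=list)
    answer_format: str = "direct summary"  # or "steps", "comparison", etc.

    def leads_with_answer(self, page_text: str, window: int = 600) -> bool:
        """Rough check that the page addresses the primary intent early,
        rather than burying it hundreds of words in."""
        return self.primary_intent.lower() in page_text[:window].lower()

brief = PromptBrief(
    primary_intent="why ChatGPT cites some pages and not others",
    sub_questions=[
        "what share of retrieved pages get cited",
        "which entity signals matter",
    ],
)

page = "Why ChatGPT cites some pages and not others comes down to attributability..."
print(brief.leads_with_answer(page))  # the intent appears in the opening window
```

A brief like this is the artefact the prompt-discovery phase should produce before any writing starts — it gives writers and editors a testable definition of "answers the query up front".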
The Trust Paradox Hiding Inside AI Content Strategy
Here’s an uncomfortable wrinkle for teams leaning heavily on AI-generated content to fuel their GEO efforts. Gallup’s recent survey data, surfaced by Search Engine Journal, shows Gen Z workers trust human-only output over AI-assisted work by more than 2-to-1 — and that gap is widening, not closing. The cohort that grew up with AI tools is also the cohort most sceptical of them in professional contexts.
This matters for GEO because the signals LLMs use to evaluate source authority don’t exist in a vacuum. Forum discussions, Reddit threads, LinkedIn commentary, and brand mention sentiment all feed into how models weight credibility over time. If your AI-generated content is being flagged, dismissed, or ignored by the very audiences who discuss topics in those spaces, that signal eventually propagates into how generative engines perceive your authority.
The practical implication: AI tools belong in the research and structure phase of content production, not as the primary voice. The bylined, perspective-driven content that humans write — the kind that generates genuine engagement and third-party citation — is also the kind that earns LLM attribution. This isn’t a coincidence.
Tracking AI Visibility Before It Becomes a Reporting Problem
Most marketing teams are still measuring GEO performance through proxies — organic traffic, branded search volume, share of voice in traditional SERPs. These metrics lag the actual visibility shift by months. By the time declining traffic signals that your generative presence has eroded, the gap has already been claimed by a competitor.
Moz’s AI Research toolkit points toward a more proactive model: tracking which prompts surface your brand, which surface competitors, and where the delta is growing. The workflow — prompt discovery, gap analysis, content mapping, structured optimisation, performance monitoring — mirrors traditional SEO cycles but operates on a faster feedback loop because LLM indexing behaviour shifts more fluidly than crawler-based rankings.
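The monitoring step of that workflow reduces to a share-of-voice computation over collected AI answers. The sketch below assumes you have already gathered responses from a prompt library run (the collection step itself is out of scope here); the brand names and answer texts are invented for illustration.

```python
from collections import Counter

def citation_share(responses: list[str], brands: list[str]) -> dict[str, float]:
    """Share of collected AI answers that mention each brand.
    `responses` would come from a prompt-tracking run; here it is toy data."""
    counts = Counter()
    for text in responses:
        lowered = text.lower()
        for brand in brands:
            if brand.lower() in lowered:
                counts[brand] += 1
    total = len(responses) or 1
    return {brand: counts[brand] / total for brand in brands}

# Toy answers standing in for responses gathered across a prompt library.
answers = [
    "According to AcmeCorp's 2024 study, citation rates hover near 50%.",
    "RivalCo publishes GEO benchmarks monthly.",
    "Most sources agree structured data helps; see RivalCo's guide.",
]

share = citation_share(answers, ["AcmeCorp", "RivalCo"])
delta = share["RivalCo"] - share["AcmeCorp"]  # positive delta = competitor gaining
print(share, round(delta, 2))
```

Run against per-market prompt libraries, the same computation surfaces the growing-delta signal described above market by market, rather than as a blended average that hides where the gap is opening.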
For teams managing multi-market content across Southeast Asia, this also means building prompt libraries that reflect local query behaviour: Bahasa Indonesia users ask AI questions differently than Thai or Filipino users, and the conversational structures that earn citations in one market won’t automatically transfer.
Key Takeaways
- Map the retrieval-to-citation gap first: audit which of your high-value pages ChatGPT retrieves but doesn’t cite, then diagnose whether the issue is entity clarity, structural ambiguity, or weak off-site authority signals.
- Build entity coherence across the full ecosystem — schema markup, author bios, third-party mentions, and knowledge base entries — before optimising page-level content, especially if you’re operating across multiple Southeast Asian markets with inconsistent brand representation.
- Treat AI-assisted content as a research and scaffolding tool, not a publishing output: the human perspective and genuine engagement it generates are themselves GEO signals that compound over time.
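The first takeaway above — auditing the retrieval-to-citation gap — reduces to a set difference over two logs. This sketch assumes hypothetical URLs and log sources; in practice the retrieved set would come from server logs filtered to AI crawler user agents, and the cited set from answer-citation monitoring.

```python
# Sketch of the retrieval-to-citation audit: given pages an assistant
# retrieved and pages it actually cited, find the retrieved-but-uncited gap.
# URLs and log format are hypothetical placeholders.

retrieved = {
    "https://example.com/guide-to-geo",
    "https://example.com/entity-authority",
    "https://example.com/prompt-libraries",
    "https://example.com/omnibus-seo-guide",
}
cited = {
    "https://example.com/guide-to-geo",
    "https://example.com/entity-authority",
}

gap = sorted(retrieved - cited)       # in the room, but not credited
gap_rate = len(gap) / len(retrieved)  # Ahrefs puts this near 50% overall

print(gap)
print(f"{gap_rate:.0%} of retrieved pages went uncited")
```

Each URL in `gap` is then a diagnosis candidate: entity clarity, structural ambiguity, or weak off-site authority, per the takeaway above.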
The brands that will own generative search results in 2027 are making infrastructure decisions right now — entity architecture, content authority, prompt visibility tracking — that look invisible from the outside. The question worth sitting with: does your team even have a framework for knowing whether ChatGPT considers you a credible source, or are you still waiting for traffic to tell you?
At grzzly, we work with growth teams across Southeast Asia on exactly this — mapping entity authority gaps, building GEO-ready content frameworks, and tracking AI visibility before it becomes a competitive blind spot. If your brand is producing content but not earning citations, that’s a solvable problem. Let’s talk.
Written by
Sneaky Grizzly
Tracking the quiet revolution inside LLM-powered search — where brand mentions, structured semantics, and entity authority rewrite the rules of discoverability before most marketers notice.