
Why Robots.txt Blocking Doesn't Mean What You Think in 2026

Robots.txt blocks crawling, not indexing — audit your noindex tags and canonical signals before touching robots.txt for local pages.

Editorial illustration of a figure bypassing a blocked gate through an open window, representing Google indexing URLs despite robots.txt restrictions
Illustrated by Mikael Venne

Google can index URLs blocked by robots.txt. Here's what that means for your local SEO strategy and how to fix what actually matters.

Your robots.txt file is not a vault. It never was — but 2026 is the year this misconception is finally costing brands in the local search results.

Search Engine Journal’s Roger Montti reported this week that Search Console flagged 51,000 URLs as “Indexed, though blocked by robots.txt” for one site, and Google’s response was essentially: that’s working as intended. When a site’s signals contradict each other, Google follows the one it finds most credible. The problem isn’t Google. The problem is that most SEO teams treat robots.txt as a content firewall, when it’s really just a polite request.

Robots.txt Blocks Crawlers, Not Indexation

This is the distinction that trips up even experienced teams. Robots.txt is a crawl directive — it tells Googlebot whether it’s welcome to fetch a page. It says nothing about whether that page should appear in the index. Google can learn a URL exists through external backlinks, sitemaps, or internal link signals, and index it without ever crawling it directly.
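To make the crawl-versus-index distinction concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt rules and the `example.com` URLs are hypothetical. Note what the parser can answer: only whether a given user agent may fetch a URL. Nothing in the file format expresses "keep this out of the index".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; the paths are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staging/
Disallow: /locations/draft/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt rules let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A blocked draft page and a crawlable live page.
draft_allowed = is_crawl_allowed(ROBOTS_TXT, "https://example.com/locations/draft/jakarta")
live_allowed = is_crawl_allowed(ROBOTS_TXT, "https://example.com/locations/jakarta")

print("draft page crawlable:", draft_allowed)
print("live page crawlable:", live_allowed)
```

A `False` here only means the page will not be fetched. Whether the URL ends up in the index is decided elsewhere.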

For local SEO, this creates a specific failure mode. A common practice is to block staging subdirectories or thin location pages via robots.txt while development is underway, then forget to implement proper noindex tags before launch. The result: near-duplicate location pages for your Jakarta, Surabaya, and Bandung branches appearing in Search Console as indexed, unreviewed, and potentially cannibalising each other in the local pack.

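The fix is layered: use robots.txt only to manage crawl budget, and use noindex meta tags or canonical signals to manage what actually enters the index. These are different levers, and they interact: Google can only read a noindex tag on a page it is allowed to crawl, so a URL that is both blocked in robots.txt and tagged noindex can stay indexed.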
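The interaction between the two levers can be sketched as a small decision function. This is an illustration only: the status labels are invented for this example, and it inspects the robots meta tag alone (a real audit would also check the `X-Robots-Tag` HTTP header). The case to watch is a page that carries noindex but is blocked from crawling, because the tag cannot be read while the block is in place.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """True if the page's robots meta tag asks to be kept out of the index."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


def index_control_status(crawl_allowed: bool, html: str) -> str:
    """Classify a page by how its crawl and index signals combine (labels are hypothetical)."""
    noindex = has_noindex(html)
    if not crawl_allowed and noindex:
        return "noindex_unreadable"  # tag exists, but the crawler is not allowed to see it
    if not crawl_allowed:
        return "blocked_only"        # can still be indexed from links alone
    if noindex:
        return "noindex_effective"   # crawlable, so the directive can be honoured
    return "indexable"


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(index_control_status(crawl_allowed=False, html=page))
print(index_control_status(crawl_allowed=True, html=page))
```

Pages that land in `noindex_unreadable` are the ones where removing the robots.txt block is a precondition for the noindex to do anything.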

The Local Page Audit Most Brands Skip

For brands running multi-location operations across Southeast Asia — a QSR chain with 80 outlets across Metro Manila, say, or a bank with branches across Peninsular Malaysia — location page management is where technical debt compounds fast.

Start with a Search Console export filtered by the “Indexed, though blocked” status. Cross-reference against your active Google Business Profile listings. If a URL is blocked by robots.txt but indexed, and it corresponds to a live GBP, you have a signal conflict Google is quietly adjudicating without you. In most cases, that means either removing the robots.txt block and implementing proper indexation controls, or consolidating the page into a canonical location hub.
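As a rough sketch of that cross-reference, the snippet below intersects two CSV exports on a normalised URL. The column names (`URL`, `website`), the sample rows, and the normalisation rules are all assumptions; adjust them to whatever your Search Console and GBP exports actually contain. Query strings are dropped because GBP website links often carry tracking parameters that would otherwise prevent a match.

```python
import csv
import io
from urllib.parse import urlsplit

# Hypothetical export contents; real files would be read from disk.
GSC_EXPORT = """\
URL,Last crawled
https://example.com/locations/jakarta/,2026-01-10
https://example.com/staging/test,2026-01-09
"""

GBP_EXPORT = """\
store_code,website
JKT-01,https://Example.com/locations/jakarta?utm_source=gbp
SBY-01,https://example.com/locations/surabaya
"""


def normalise(url: str) -> str:
    """Lowercase scheme and host, drop query, fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"


def urls_from_csv(csv_text: str, column: str) -> set[str]:
    """Read one column of a CSV export into a set of normalised URLs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {normalise(row[column]) for row in reader if row.get(column)}


def find_conflicts(gsc_csv: str, gbp_csv: str) -> list[str]:
    """URLs that are indexed-though-blocked AND linked from a live profile."""
    blocked_but_indexed = urls_from_csv(gsc_csv, "URL")
    gbp_urls = urls_from_csv(gbp_csv, "website")
    return sorted(blocked_but_indexed & gbp_urls)


for url in find_conflicts(GSC_EXPORT, GBP_EXPORT):
    print("signal conflict:", url)
```

Anything this prints is a page your Business Profile is actively pointing at while robots.txt tells Google not to look at it.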

The secondary check: look at your GBP-linked URLs versus what’s actually appearing in local pack results. In high-competition local categories — insurance brokers in Bangkok, co-working spaces in Ho Chi Minh City — the URL Google chooses to surface in the pack can shift based on which page has the cleanest crawl and index signals.


AEO Implications When Indexation Signals Conflict

This matters beyond traditional SEO. As answer engine optimisation becomes a genuine practice rather than a buzzword, the pages Google selects as sources for AI Overviews and featured snippets are increasingly those with the clearest authority signals. A page that is indexed-but-blocked creates ambiguity in Google’s quality assessment — it exists, but wasn’t meant to be seen, or was it?

Ahrefs’ Ryan Law noted in a recent 2026 marketing trends piece that content teams are shifting from volume-based production toward systems-based curation — fewer pages, stronger signals. For local brands, this maps directly to the location page problem: 40 thin city pages with conflicting crawl signals will consistently lose to 10 well-structured, fully crawlable, GBP-linked location hubs with genuine local content depth.

The AEO opportunity in local search is real — users asking “best physiotherapy clinic near me that opens on Sunday” are increasingly getting AI-generated responses that pull from structured, trustworthy local pages. A page that Google is uncertain about doesn’t get that placement.

What to Actually Fix — and in What Order

Prioritise by business impact, not by the size of the Search Console warning number. A site with 51,000 affected URLs sounds alarming; if 49,000 are staging artifacts with no GBP connection and zero external links, the actual risk is low. The 2,000 that matter are the ones attached to active locations or product categories driving real search demand.
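One way to encode that triage is sketched below. The field names, the three tiers, and the use of impressions as a stand-in for "real search demand" are assumptions for illustration, not a standard scoring model.

```python
from dataclasses import dataclass


@dataclass
class BlockedUrl:
    """One 'Indexed, though blocked' URL plus the signals used for triage (hypothetical fields)."""
    url: str
    has_live_gbp: bool        # linked from an active Business Profile
    external_links: int       # count of known external backlinks
    monthly_impressions: int  # assumed proxy for search demand


def priority(page: BlockedUrl) -> str:
    """Rank by business impact rather than by how many URLs share the warning."""
    if page.has_live_gbp or page.monthly_impressions > 0:
        return "fix now"
    if page.external_links > 0:
        return "review"
    return "low risk"


pages = [
    BlockedUrl("https://example.com/staging/old-template", False, 0, 0),
    BlockedUrl("https://example.com/locations/bandung", True, 3, 1200),
    BlockedUrl("https://example.com/promo-2023", False, 5, 0),
]

for page in pages:
    print(f"{priority(page):9} {page.url}")
```

Run against a full export, a rule like this turns one alarming total into a short "fix now" list and a long tail that can wait.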

Three actions worth running this quarter:

  • Crawl audit with Screaming Frog or Sitebulb filtered to robots.txt-blocked URLs, cross-referenced against your indexed pages report in GSC. Export both, join on URL.
  • GBP URL alignment check — confirm every active Business Profile is linking to a page that is fully crawlable, properly indexed, and not duplicated by a nearby thin variant.
  • Noindex implementation review: anywhere you’ve used robots.txt to suppress content, evaluate whether a noindex tag, a canonical tag, or a 301 redirect would be more precise, and lift the robots.txt block on those URLs so Google can actually see the new signal. Robots.txt is a blunt instrument; use it accordingly.

In markets like Indonesia and the Philippines where local search intent is heavily concentrated on mobile and app-surfaced results (Google Maps, Grab, Shopee Mall), the margin for signal ambiguity is thin. Proximity is a ranking factor — but only if Google can properly evaluate the page it’s deciding to surface.


The broader lesson here is about control — or the illusion of it. SEO teams often assume that a directive issued means a directive followed. Google’s crawl and indexation systems are probabilistic, not deterministic. The question worth sitting with: how many of your local pages are being evaluated on signals you didn’t consciously set?

At grzzly, we work with multi-location brands across Southeast Asia to untangle exactly this kind of technical debt, from GBP audits to location page architecture that holds up under real crawl conditions. If your Search Console is showing signal conflicts you haven’t had time to properly diagnose, we’re happy to take a look.

Written by Dusty Grizzly

Deep in the weeds of Google Business Profiles, local pack mechanics, and neighbourhood-level search intent. Believes proximity is a strategy, not a coincidence.
