AI coding agents can ship data pipelines fast — but unstructured generation creates brittle, unmaintainable code. Here's how to stay in control.
Coding agents can build a working data pipeline in the time it takes to finish a coffee. The catch: “working” and “maintainable” are not the same word, and in data infrastructure, that gap compounds every sprint.
If you’re using AI tools to accelerate analytics engineering, data activation logic, or audience segmentation pipelines — and you’re not thinking about how that code is structured — you’re trading short-term velocity for long-term fragility.
The Black Box Problem Is a Data Problem First
Towards Data Science contributor Yonatan Sason recently illustrated this with a deceptively simple example: two architectures for a notification system, both generated by AI, both functional on day one. The unstructured version collapsed everything into a single module — logic, dependencies, state — all coupled together. The structured version decomposed the same system into independent components with explicit, one-directional dependencies.
For a notification system, that’s inconvenient. For a real-time audience segmentation pipeline feeding a Shopee or Lazada campaign activation layer, it’s a production incident waiting to happen.
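To make the contrast concrete, here is a minimal sketch of what "independent components with explicit, one-directional dependencies" can look like in a segmentation-style pipeline. All names here are illustrative, not taken from Sason's example: each layer depends only on the layer below it, never sideways or upward.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class Event:
    user_id: str
    event_type: str
    amount: float

class SegmentRule(Protocol):
    def matches(self, event: Event) -> bool: ...

@dataclass(frozen=True)
class HighValueRule:
    threshold: float
    def matches(self, event: Event) -> bool:
        return event.amount >= self.threshold

class SegmentBuilder:
    """Depends on SegmentRule only; knows nothing about dispatch."""
    def __init__(self, rules: dict[str, SegmentRule]) -> None:
        self.rules = rules
    def assign(self, event: Event) -> list[str]:
        return [name for name, rule in self.rules.items() if rule.matches(event)]

class Dispatcher:
    """Depends on segment names only; knows nothing about rule internals."""
    def send(self, user_id: str, segments: list[str]) -> str:
        return f"notify {user_id}: {','.join(segments)}"

builder = SegmentBuilder({"high_value": HighValueRule(threshold=100.0)})
segments = builder.assign(Event("u1", "purchase", 250.0))
```

In the monolithic version, the threshold, the rule evaluation, and the dispatch call all live in one function; changing any one of them means re-reading all of them. Here, a new rule is a new class, and nothing else moves.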
Data pipelines grow. Schemas shift. Business rules get overridden at 11pm before a campaign goes live. When your AI-generated pipeline is a monolith, every change becomes an archaeology project — and the engineer who inherits it (often you, six months later) has no map.
The deeper issue: AI models optimise for output correctness at generation time. They don’t optimise for change tolerance over time. That’s a human architectural responsibility that we’re quietly outsourcing without realising it.
Structured Generation Isn’t Optional for Production Data Code
The fix isn’t to stop using coding agents — that ship has sailed and the productivity gains are real. The fix is to treat structured generation as a non-negotiable constraint, not a nice-to-have.
What does that mean practically? When prompting Claude Code or any comparable agent to build a data transformation, a segment builder, or an activation logic layer, the prompt architecture matters as much as the code architecture. Eivind Kjosbakken’s walkthrough on production-ready Claude Code outputs makes this concrete: providing explicit interface contracts, module boundaries, and input/output schemas in your prompts pushes the model toward decomposed outputs rather than monolithic ones.
For SEA data teams operating across markets — where a single pipeline might need to handle Thai baht transactions, Indonesian tax logic, and Vietnamese language normalisation in parallel — modular architecture isn’t an engineering preference, it’s a business requirement. A tightly coupled pipeline that mixes currency conversion with segment scoring with notification dispatch is not a pipeline. It’s a liability.
Practical constraint: build a prompt template library for your team’s most common data patterns. Segment builder. Event enrichment layer. Attribution join logic. Standardise the structural expectations in the prompt, not just the functional requirements.
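A template from such a library might look like the following. The constraint wording and schemas are hypothetical placeholders, but the principle holds: the structural section is fixed across the team, and only the functional slots vary per task.

```python
# Hypothetical prompt template: structural expectations are constant,
# functional requirements are the only per-task variable.
SEGMENT_BUILDER_PROMPT = """\
Build a {task_name} module.

Structural constraints (non-negotiable):
- One module per responsibility; no shared mutable state.
- Input schema: {input_schema}
- Output schema: {output_schema}
- Dependencies flow one direction only: ingestion -> transform -> activation.
- Emit a dependency map listing every import and why it exists.

Functional requirements:
{requirements}
"""

prompt = SEGMENT_BUILDER_PROMPT.format(
    task_name="lapsed-customer segment builder",
    input_schema="events(user_id: str, ts: datetime, amount: Decimal)",
    output_schema="segments(user_id: str, segment: str, scored_at: datetime)",
    requirements="- Flag users with no purchase in 90 days and lifetime spend > 500.",
)
```

Versioning these templates alongside the code they generate (see the governance section below) closes the loop: the structural constraints become reviewable artefacts, not tribal knowledge.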
Retrieval Logic Has the Same Problem, at a Different Layer
The maintainability issue doesn’t stop at pipeline code. It surfaces upstream, in the retrieval and search logic that increasingly underpins data products — particularly as teams bolt RAG (retrieval-augmented generation) layers onto internal analytics tools and customer data platforms.
Maria Mouschoutzi’s breakdown of hybrid search — specifically how keyword search methods like TF-IDF and BM25 work alongside vector search in RAG architectures — is directly relevant here. BM25, the probabilistic ranking function that weighs term frequency against document length, behaves very differently from dense vector retrieval. When you combine them in a hybrid search layer without understanding the mechanics, you get retrieval results that look reasonable in testing and degrade unpredictably in production.
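For intuition about those mechanics, here is a compact, textbook implementation of BM25 scoring. It uses Lucene-style IDF smoothing and toy tokenised documents; it is a sketch of the ranking function, not any particular library's internals.

```python
import math
from collections import Counter

def bm25_score(query, doc, corpus, k1=1.5, b=0.75):
    """Textbook BM25: term frequency saturates via k1, and document
    length is normalised against the corpus average via b."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)  # Lucene-style smoothing
        f = tf[term]
        score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    ["lapsed", "customers", "q4", "report"],
    ["active", "customers", "weekly", "summary"],
    ["q4", "campaign", "budget"],
]
scores = [bm25_score(["lapsed", "q4"], d, corpus) for d in corpus]
```

Note what this rewards: exact term overlap, dampened by how common the term is corpus-wide. A dense vector retriever scoring the same query would behave nothing like this, which is exactly why naive score mixing degrades unpredictably.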
For data teams building internal knowledge retrieval on top of campaign performance data, customer interaction logs, or product catalogues — common use cases across SEA’s e-commerce and fintech sectors — understanding why your retrieval layer returns what it returns is the difference between a trustworthy data product and a plausible-looking one.
The black box problem here is subtler: it’s not that the code breaks, it’s that you don’t know why the wrong documents surface when a campaign manager queries “high-value lapsed customers in Q4.” BM25 weights exact keyword matches; vector search weights semantic proximity. Without explicit architectural decisions about when to trust which, your retrieval layer is a coin flip dressed as intelligence.
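One way to make that architectural decision explicit is to fuse at the rank level rather than trusting either retriever's raw score scale. Reciprocal rank fusion is a standard technique for this (it is not specific to Mouschoutzi's breakdown); sketched here with toy rankings:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Combine ranked lists from BM25 and vector search without
    comparing their incompatible score scales. k=60 is the
    conventional damping constant from the original RRF paper."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc_a", "doc_c", "doc_b"]    # exact keyword matches first
vector_ranking = ["doc_b", "doc_a", "doc_d"]  # semantic neighbours first
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

The design choice being made here is visible and auditable: a document ranked well by both retrievers beats one ranked well by only one, and neither retriever's score magnitude can silently dominate.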
What Good Governance of AI-Generated Data Code Looks Like
This isn’t an argument for slowing down AI adoption in data teams. It’s an argument for instrumenting it properly.
Three things that actually work in practice:
Mandatory decomposition reviews. Before any AI-generated pipeline module hits staging, a human reviewer checks one thing only: can this be modified at the component level without touching unrelated logic? If not, it goes back for refactoring — AI-assisted refactoring is fine, but the structure must be validated by a human.
Prompt versioning alongside code versioning. The prompt that generated your activation logic is as important as the code itself. If you can’t reproduce or modify the generation, you can’t maintain the output. Treat prompts as first-class engineering artefacts.
Explicit dependency documentation, generated at creation time. Ask your coding agent to produce a dependency map as part of the output, not as an afterthought. Tools like Claude Code can generate this alongside the code if the prompt requires it. For complex pipelines serving real-time audience activation, that map is your circuit breaker.
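A dependency map does not need to be elaborate to act as a circuit breaker. Even a flat adjacency list, emitted by the agent alongside the code and cycle-checked at review time, mechanically enforces the one-directional rule. The module names below are hypothetical:

```python
# Hypothetical dependency map an agent might emit alongside generated code.
dependency_map = {
    "ingestion": [],
    "currency_normalisation": ["ingestion"],
    "segment_scoring": ["currency_normalisation"],
    "activation_dispatch": ["segment_scoring"],
}

def has_cycle(graph):
    """DFS cycle detection: a back edge means the one-directional
    dependency rule has been broken somewhere."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}
    def visit(node):
        colour[node] = GREY
        for dep in graph.get(node, []):
            if colour[dep] == GREY or (colour[dep] == WHITE and visit(dep)):
                return True
        colour[node] = BLACK
        return False
    return any(colour[n] == WHITE and visit(n) for n in graph)

acyclic = not has_cycle(dependency_map)
```

Running this check in CI turns "the structure must be validated by a human" into something the human only has to do once, for the judgement calls the check cannot make.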
Data teams in SEA are under real pressure — fewer engineers, more markets, faster campaign cycles. AI coding agents are a genuine multiplier. But a multiplier applied to poor architecture produces poor architecture at scale, faster.
The question worth sitting with: as your data stack increasingly gets written by AI, who in your organisation is responsible for the decisions that AI isn’t making — and do they have the time and mandate to actually make them?
Written by
Mellow Grizzly
Translating raw data into activated audience segments, predictive models, and decisioning logic. Comfortable at the intersection of the data warehouse and the campaign manager.