
Stop Crediting the Last Touch: Causation in CEP Strategy

Use propensity score matching to isolate true engagement lift before scaling any CEP intervention — or you'll optimise a mirage.

[Illustration by Mikael Venne: a figure chasing a shadow that leads in a completely different direction than the figure itself]

Correlation isn't causation — and in customer engagement, confusing the two wastes budget. Here's how causal inference reshapes CEP strategy in SEA.

Your loyalty campaign drove a 34% lift in repeat purchases. Impressive — until you realise those users were already your most loyal customers before you touched them. What you measured wasn’t impact. It was a mirror.

This is the core tension in customer engagement platform (CEP) strategy right now: the infrastructure has never been more capable of personalisation at scale, but the measurement logic underpinning most activation decisions is still built on correlation. In Southeast Asia’s high-frequency, multi-platform environments — where a single customer might engage via Shopee, LINE, and a brand app within 48 hours — the gap between what looks like causation and what actually is has never been more expensive.

The Selection Bias Problem Hiding in Your Engagement Data

When you send a re-engagement campaign to users who’ve shown recent browsing activity, you’re not running a neutral experiment — you’re targeting people already predisposed to convert. The bump in conversion you see afterward isn’t necessarily caused by your campaign. It’s caused by the fact that you selected high-intent users to begin with.

This is selection bias, and it quietly inflates the perceived ROI of nearly every triggered communication in a typical CEP playbook. Towards Data Science recently outlined how propensity score matching (PSM) addresses this directly: by constructing a statistical control group of users who share the same observable characteristics as your treated group but didn’t receive the intervention, you isolate the actual causal effect of your message. In practical terms, that means matching on variables like recency, session frequency, purchase history, and channel affinity — then comparing outcomes between the two groups.
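To make the matching step concrete, here’s a minimal sketch in Python. It assumes a hypothetical users DataFrame with a got_campaign treatment flag, a converted outcome, and the behavioural columns named above; all identifiers are illustrative, and a production version would add caliper constraints and covariate balance checks.

```python
# Minimal propensity score matching sketch: estimate each user's probability
# of receiving the campaign, then pair every treated user with the untreated
# user whose score is closest (their "statistical twin").
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def matched_lift(users: pd.DataFrame, features: list[str]) -> float:
    """Estimate campaign lift as treated outcomes minus matched-control outcomes."""
    X = users[features].to_numpy()
    treated = users["got_campaign"].to_numpy().astype(bool)

    # Step 1: model the probability of treatment from observable behaviour.
    propensity = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

    # Step 2: 1:1 nearest-neighbour matching on the propensity score.
    nn = NearestNeighbors(n_neighbors=1).fit(propensity[~treated].reshape(-1, 1))
    _, idx = nn.kneighbors(propensity[treated].reshape(-1, 1))

    # Step 3: compare treated users to their twins, not to the whole
    # untreated population; that second comparison is the selection bias.
    outcomes = users["converted"].to_numpy().astype(float)
    return outcomes[treated].mean() - outcomes[~treated][idx.ravel()].mean()

# Match on the same signals you would otherwise target on:
# lift = matched_lift(users, ["recency_days", "sessions_30d", "orders_90d", "channel_affinity"])
```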

For a mid-size e-commerce brand running 15–20 active journeys simultaneously across Lazada and a branded app, the difference between correlational and causal measurement can mean reallocating 20–30% of activation budget toward genuinely effective touchpoints.

Building the Data Foundation That Makes Causal Measurement Possible

PSM isn’t a plug-in — it requires clean, reliable upstream data. And this is where most CEP implementations quietly unravel. The matching process depends on consistent, trustworthy feature sets across your customer profiles: behavioural signals that haven’t drifted, event timestamps that are actually accurate, and pipeline outputs that someone has actively validated.

Monte Carlo’s MC Agent Toolkit surfaces something important here: in modern data stacks, AI agents are increasingly writing transformation code and modifying pipelines without any awareness of whether the underlying data is reliable. If your propensity model is trained on features derived from a broken pipeline — a silent schema change in your Shopee connector, say, or a misconfigured sessionisation window — your statistical twins aren’t twins at all. They’re strangers wearing the same hat.

The implication for teams building causal measurement into their CEP strategy: data observability isn’t optional infrastructure. It’s a prerequisite for trusting your intervention logic. Implement data quality checks at the feature store level, not just at ingestion. Flag anomalies in the signals that feed your matching models before they compound into flawed conclusions.
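As a sketch of what that looks like in practice, the check below validates the behavioural features feeding a matching model against a reference window before any matching runs. The thresholds, column names, and drift heuristic are illustrative assumptions, not a prescription.

```python
# Pre-matching feature validation sketch: catch null spikes and distribution
# drift in the signals that feed the propensity model, and gate the matching
# job on the result instead of logging and moving on.
import pandas as pd

def validate_features(current: pd.DataFrame, reference: pd.DataFrame,
                      features: list[str],
                      max_null_rate: float = 0.02,
                      max_drift_sd: float = 0.25) -> list[str]:
    """Return human-readable anomalies; an empty list means safe to match."""
    problems = []
    for col in features:
        # Silent pipeline breaks often surface first as a spike in nulls.
        null_rate = current[col].isna().mean()
        if null_rate > max_null_rate:
            problems.append(f"{col}: null rate {null_rate:.1%} exceeds {max_null_rate:.0%}")

        # Crude drift check: has the feature's mean moved more than
        # max_drift_sd standard deviations from the reference window?
        ref_mean, ref_std = reference[col].mean(), reference[col].std()
        if ref_std > 0:
            drift = abs(current[col].mean() - ref_mean) / ref_std
            if drift > max_drift_sd:
                problems.append(f"{col}: mean drifted {drift:.2f} sd from reference")
    return problems

# anomalies = validate_features(this_week, last_month, FEATURE_COLS)
# if anomalies:
#     raise RuntimeError("matching blocked: " + "; ".join(anomalies))
```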


Architecture Choices That Determine Whether This Scales

Causal inference at CEP scale isn’t a one-time analysis — it needs to be embedded into how your data layer operates. This is where architecture decisions become strategic. The dbt-Databricks combination has become a common pairing for teams trying to manage transformation complexity at volume, and as dbt’s Keith Ludeman notes, the cost of skipping disciplined transformation logic compounds quietly. Ad hoc SQL chains that produce your engagement features today become unmaintainable six months later when you’re trying to retroactively reconstruct control group eligibility.

For Southeast Asian brands operating across multilingual markets — Thai, Bahasa Indonesia, Vietnamese, Filipino — there’s an additional wrinkle: feature engineering for propensity models needs to account for behavioural differences that are partly cultural and partly platform-driven. A user’s “high intent” signal on Tokopedia looks different from one on Lazada Malaysia. Collapsing these into a single propensity model without market-level stratification produces matches that are statistically plausible but operationally meaningless.

The practical recommendation: build market-specific propensity layers within your transformation logic, and version them explicitly. When your CEP team asks why a campaign underperformed in Vietnam despite strong regional numbers, you want to be able to answer that question from your data layer — not reconstruct it retroactively.
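Here’s a minimal sketch of that stratification, assuming per-market rows and illustrative market codes. The versioning metadata is the point; the model choice is secondary.

```python
# Market-stratified propensity layers sketch: one model per market, with
# explicit version metadata so control-group eligibility can be
# reconstructed months later. Market codes and fields are illustrative.
from dataclasses import dataclass, field
from datetime import datetime, timezone

import pandas as pd
from sklearn.linear_model import LogisticRegression

@dataclass
class PropensityLayer:
    market: str            # e.g. "ID", "TH", "VN", "MY"
    features: list[str]    # signals used for matching in this market
    model: LogisticRegression
    trained_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    version: str = "v1"

def train_market_layers(users: pd.DataFrame, features: list[str]) -> dict[str, PropensityLayer]:
    """Fit a separate propensity model per market instead of one regional model."""
    layers = {}
    for market, segment in users.groupby("market"):
        model = LogisticRegression(max_iter=1000).fit(segment[features], segment["got_campaign"])
        layers[market] = PropensityLayer(market=market, features=features, model=model)
    return layers

# Matching in Vietnam then uses layers["VN"], never the pooled regional model,
# and trained_at + version answer "which model produced this control group?"
```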

From Measurement Fix to Engagement Philosophy

There’s a deeper shift here that goes beyond methodology. Most CEP platforms — Braze, CleverTap, MoEngage — are optimised for activation velocity: build a journey, set a trigger, push a message. The measurement layer is usually an afterthought, bolted on after the fact. Causal thinking inverts this. It asks you to define what “working” means before you deploy — which users would you compare this cohort against, and what outcome would genuinely constitute lift?

That question forces clarity on something most engagement teams quietly avoid: the counterfactual. What would have happened if you hadn’t sent anything? In high-frequency markets like Indonesia and Thailand, where users receive dozens of branded communications daily, the counterfactual isn’t “they do nothing.” It’s “they receive a competitor’s message instead.” Building that reality into your measurement framework changes how you think about cadence, timing, and channel selection entirely.
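One way to build that discipline in is to define the lift computation, holdout included, before launch. The sketch below assumes a genuine no-send holdout and illustrative column names, and reports lift with a bootstrap interval so “it worked” has a testable meaning.

```python
# Counterfactual lift sketch: lift only counts if the treated-versus-holdout
# gap survives resampling. Inputs are per-user conversion flags (0/1).
import numpy as np
import pandas as pd

def lift_with_ci(treated: pd.Series, holdout: pd.Series,
                 n_boot: int = 2000, seed: int = 7) -> tuple[float, float, float]:
    """Incremental lift (treated minus holdout conversion) with a 95% bootstrap CI."""
    rng = np.random.default_rng(seed)
    t, h = treated.to_numpy(dtype=float), holdout.to_numpy(dtype=float)
    point = t.mean() - h.mean()
    boots = [rng.choice(t, t.size).mean() - rng.choice(h, h.size).mean()
             for _ in range(n_boot)]
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return point, lo, hi

# point, lo, hi = lift_with_ci(campaign_users["converted"], holdout_users["converted"])
# An interval straddling zero means the send did not beat "send nothing",
# let alone the competitor messages that filled the gap instead.
```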


Key Takeaways

  • Propensity score matching identifies statistical twins in your user base to isolate true causal lift — apply it before scaling any CEP intervention that targets high-intent cohorts.
  • Data observability at the feature store level is non-negotiable: a silent pipeline break upstream invalidates every matching model downstream.
  • Build market-stratified propensity layers for Southeast Asian deployments — behavioural intent signals vary meaningfully across platforms and countries, and a single regional model will mask that variance.

The platforms will keep getting faster at activation. The question worth sitting with is whether speed in the wrong direction is still progress — or just a more efficient way to optimise something that was never working in the first place.


At grzzly, we work with growth teams across Southeast Asia to build CEP frameworks that connect data architecture to real engagement outcomes — not just activity metrics. If your measurement layer isn’t keeping pace with your activation capability, that’s exactly the conversation we enjoy having. Let’s talk.

Written by Brooding Grizzly

Designing CEP frameworks that move beyond batch-and-blast into real-time, context-aware engagement — across channels, devices, and the messiness of actual human behaviour.
