Measurement for AI Decision Systems

Haipeng Zheng, Xiangyu Wang, Nitin Mangal, Jian Wang, Yuchen Wu

Most teams building AI decision systems are likely to hit a fundamental limit with traditional attribution. Decision systems need a measurement layer that can answer what the system should do next, but traditional attribution was designed to answer who got credit for past spend.

This is not because the existing measurement solutions are incorrect. Individual platforms actually report accurately within their own ecosystems. Third-party tools also aggregate data correctly and produce complete dashboards. On the surface, the data and signals are all complete.

The problem is deeper than data quality. It is about the gap between what we need to build a decision system and what traditional attribution can provide.

Why Measuring Who Got Credit Is Not Enough for Decision Making

The natural starting point when building a decision system is measurement. But improving measurement quality alone does not produce better decisions - because measurement and decision making are architecturally different problems.

Individual platforms can only see what is visible in their own ecosystems, and they are also incentivized to present that as the full picture. Meta reports on its own contributions. Google reports on Google's. The result is a measurement landscape where nobody owns the complete picture.

Most third-party tools rely on server-side or client-side pixel tracking. While these systems aggregate data, they primarily use simple, rule-based methods (e.g., last-click, first-click, linear) to split revenue across channels and campaigns. Current attribution systems mostly answer a simple question: who gets credit for the outcome? They are reporting tools built to provide an accurate summary of what happened.
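To make the rule-based methods concrete, here is a minimal sketch of how last-click, first-click, and linear rules split the revenue from one converting journey. The function name, channel names, and revenue figure are illustrative assumptions, not taken from any particular tool.

```python
from collections import defaultdict


def attribute(path, revenue, rule="last_click"):
    """Split the revenue of one converting path across its channels.

    `path` is an ordered list of touchpoints (hypothetical channel names).
    """
    if not path:
        raise ValueError("path must contain at least one touchpoint")
    credit = defaultdict(float)
    if rule == "last_click":
        credit[path[-1]] += revenue       # all credit to the closing touch
    elif rule == "first_click":
        credit[path[0]] += revenue        # all credit to the opening touch
    elif rule == "linear":
        share = revenue / len(path)       # equal credit to every touch
        for channel in path:
            credit[channel] += share
    else:
        raise ValueError(f"unknown rule: {rule}")
    return dict(credit)


# Illustrative journey: discovery on social, close on SMS.
path = ["facebook", "instagram", "email", "sms"]
last = attribute(path, 100.0, "last_click")
first = attribute(path, 100.0, "first_click")
linear = attribute(path, 100.0, "linear")
```

All three rules are fixed bookkeeping conventions. None of them asks whether the sale would still have happened without a given touchpoint, which is why they describe credit rather than contribution.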

In addition, most attribution systems analyze only the journeys that converted, so the signals about what did not work get lost. The non-converting paths are where you truly learn which channels create genuine demand and which ones simply show up at the end.
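One simple way to see what the non-converting paths add is to compare conversion rates for journeys that include a channel against journeys that do not. The sketch below uses made-up journeys and a hypothetical helper name. The comparison is correlational, not a causal estimate, and is only meant to show why the non-converting rows matter.

```python
def conversion_rate_by_presence(journeys):
    """For each channel, return (rate when present, rate when absent).

    `journeys` is a list of (path, converted) pairs. A rate is None when
    there are no journeys on that side of the comparison.
    """
    channels = {channel for path, _ in journeys for channel in path}
    rates = {}
    for channel in sorted(channels):
        present = [conv for path, conv in journeys if channel in path]
        absent = [conv for path, conv in journeys if channel not in path]
        rate_present = sum(present) / len(present) if present else None
        rate_absent = sum(absent) / len(absent) if absent else None
        rates[channel] = (rate_present, rate_absent)
    return rates


# Toy data (assumed for illustration): two conversions, four non-conversions.
journeys = [
    (["facebook", "sms"], True),
    (["facebook", "email"], True),
    (["facebook"], False),
    (["sms"], False),
    (["sms"], False),
    (["email"], False),
]
rates = conversion_rate_by_presence(journeys)
```

In this toy data a last-click rule gives Facebook no credit at all, yet journeys that include Facebook convert at 2/3 while journeys without it never convert. SMS closes a sale, but its conversion rate is the same with and without it. That contrast is invisible if only the two converting journeys are analyzed.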

Google Analytics 4 (GA4) goes beyond this. It is a general-purpose user behavior and event analytics platform with no structural incentive to justify ad spend or claim attribution credit, which allows it to measure user behavior across the whole journey without platform preference. Our measurement platforms were telling a complete story, but not necessarily a true one. When we compared the GA4 behavioral data to the individual platform reports, the gap was larger than we expected.

What We Found When We Looked at the Full Picture

Connecting Meta, Google Ads, SMS, and GA4 into a single view of the full user journey revealed a different story.

Analyzing both converting and non-converting paths showed that budget consistently flowed toward the channels that closed the outcome, starving the openers. Facebook and Instagram, essential for starting conversations weeks before the final purchase, were getting no credit for initiating the process because SMS and email were closing the sales.

The pattern became apparent when we validated against GA4. The user journey, as it unfolded across sessions over weeks, told a different story from what each platform claimed. The key insight wasn't that last-touch attribution was imperfect - it was that the gap between reported attribution and true incremental contribution was systematic, consistently favoring what was close, trackable, and legible over what drove the outcome.

Better Data Is Not Enough

And here is the thing - even though GA4 gave us a neutral view, we still had a reporting problem, not a decision problem. GA4 told us what happened, not what to do next.

Hence, we built a model that simulates the full user journey - allowing us to ask what-if questions that would be too costly to test directly. What happens to total conversions if we shift the budget from email to Facebook? If we reduce SMS spend significantly, how much demand actually disappears? Rather than running a costly experiment for every budget decision, the model generates a prediction - and then real-world results either confirm or correct it.
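The article does not describe the model itself, so the following is only a sketch of one common way to ask a what-if question of journey data: fit a first-order Markov chain to the observed paths and compute a "removal effect" for a channel. The function names, state labels (`start`, `conversion`, `null`), and toy journeys are all assumptions made for illustration.

```python
from collections import defaultdict


def transition_probs(journeys):
    """Estimate first-order transition probabilities from observed journeys."""
    counts = defaultdict(lambda: defaultdict(int))
    for path, converted in journeys:
        states = ["start", *path, "conversion" if converted else "null"]
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(nxt.values()) for dst, n in nxt.items()}
        for src, nxt in counts.items()
    }


def conversion_probability(probs, removed=None, sweeps=200):
    """Approximate the probability of reaching 'conversion' from 'start'.

    If `removed` names a channel, any journey that would pass through it
    is treated as lost (the usual removal-effect what-if).
    """
    reach = defaultdict(float)  # states with no outgoing data stay at 0.0
    reach["conversion"] = 1.0
    for _ in range(sweeps):  # fixed number of sweeps; ample for short paths
        for src, nxt in probs.items():
            if src == removed:
                continue
            reach[src] = sum(
                weight * (0.0 if dst == removed else reach[dst])
                for dst, weight in nxt.items()
            )
    return reach["start"]


# Toy journeys (assumed): converting and non-converting paths together.
journeys = [
    (["facebook", "sms"], True),
    (["facebook", "email"], True),
    (["facebook"], False),
    (["sms"], False),
    (["sms"], False),
    (["email"], False),
]
probs = transition_probs(journeys)
baseline = conversion_probability(probs)
without_sms = conversion_probability(probs, removed="sms")
removal_effect_sms = 1 - without_sms / baseline
```

On this toy data the baseline conversion probability is 1/3 and removing SMS halves it. These numbers say nothing about real channels. A removal effect is also a blunt proxy for a budget change, so a prediction like this is best treated as a hypothesis for real-world results to confirm or correct.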

That feedback loop is what makes the system learn. And it creates a compounding effect in both directions. Better data leads to better decisions, which generate more relevant and higher-quality data for future learning. But the reverse is equally true - poor data leads to flawed decisions, which perpetuate a cycle of exploring only a limited, poor-quality region of the data space. The system gets more confident and more wrong simultaneously.
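The predict-then-correct loop can be sketched in a few lines. This is a deliberately simple assumption, an exponentially smoothed estimate of incremental conversions per dollar, and not a description of the actual system. The numbers and the `learning_rate` parameter are hypothetical.

```python
def update_estimate(estimate, observed, learning_rate=0.3):
    """Move the current estimate part of the way toward the observed value."""
    return estimate + learning_rate * (observed - estimate)


# Hypothetical starting belief: incremental conversions per dollar of spend.
estimate = 0.020
# Hypothetical measured results from three successive budget periods.
observations = [0.010, 0.012, 0.011]

history = [estimate]
for observed in observations:
    estimate = update_estimate(estimate, observed)
    history.append(estimate)
```

The same arithmetic also shows the failure mode. If the observed values come from a biased source, the estimate converges just as smoothly, only toward the wrong number.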

The missing layer was not better data. It was a system that connects what the insights reveal to what actually happens next.

The Broader Implication for AI Decision Systems

This is also where the attribution problem can be generalized.

Any AI system is only as good as the signal it learns from. If the system is fed signals from sources that have an incentive to present a biased view, it will learn the wrong thing, and unfortunately with increasing confidence over time. The measurement layer cannot just be reporting; it has to be the foundation that determines what the system believes is true, and therefore what it decides to do.

This is the same problem we see in AI evaluation more broadly. When you measure an agent's performance based on what is easy to track - task completion rates, approval clicks, short-term metric movement - you end up with a system that gets better at producing those signals rather than getting better at the actual outcome. The measurement bias problem in attribution and the evaluation bias problem in AI are the same problem at different layers.

From Attribution to Decision System

This is how we look at attribution differently at MAI. We are not redefining attribution - we are building a decision layer on top of it.

We are working towards building a layer between measurement and action that allows us to drive better decisions and continuously improve on them. In practice, that means a shift from backward-looking explanatory attribution - who got credit - to forward-looking predictive incrementality - who drove the outcome and what should be the next step. It means connecting complete behavioral data from GA4 and individual platforms into a unified view that provides a complete picture of the true drivers. And it means measuring success not on how accurate and fair the reports are, but on whether the connected decisions truly produce better outcomes over time.

This is what we have been building toward at MAI - moving attribution from passive reporting into an active decision system that acts on what it learns, continuously, without waiting for a human to make sense of multiple dashboards and decide what to do next.

The key question we are asking is not "is the measurement correct?" but instead "is our measurement leading to verifiable decisions that produce high-quality outcomes over time?"

That's a harder question. It's also the right one.