Data Preparation in the AI Journey (Part 2)

In my previous post, I outlined five key pillars: Scope, Compliance, Trustworthiness, Understandability, and Cost.

While these pillars provide a design framework, moving an AI application from an interesting experiment to a production-grade tool is not a linear journey. It is a fundamentally iterative process of continuous learning and introspection. To mature the system, the team must move beyond technical metrics and embrace a cycle of questioning:

  • Is the data suitable for the purpose? Does it cover the intended scope, or are we missing the “connective tissue” that makes the AI’s logic coherent?
  • Do we have too much or too little? In early experiments, we often over-provision. Maturation involves trimming the excess to reduce costs and latency, or identifying gaps where the AI lacks the depth to be useful.
  • What is the user’s perception? Technical Truth (integrity and quality of the data) often differs from Business Truth—the nuanced combination of revenue, sales coverage, and pipeline metrics that a leader actually uses to make a decision.

🔄 The Collaboration Loop

Maturing an AI system requires an ongoing dialogue between distinct viewpoints, each bringing its own expertise to a different aspect of the system’s life cycle:

  • Data Engineers build the environment to transform and deliver data, focusing on the infrastructure and the “Technical Truth.”
  • Data Scientists validate the usefulness and the outcome metrics, ensuring the model’s grounding is firm and the input data consistently yields the desired business results.
  • Data Owners & Governance guard the “should,” ensuring compliance, privacy, and long-term value are maintained throughout the iteration.
  • Business Specialists evaluate relevance and correctness—providing the “Business Truth” and human validation that technical metrics alone cannot capture.

📊 Case Study: Scaling the Sales Forecasting Advisor

Our Sales Forecasting example illustrates the role of data preparation in these challenges concretely. When an organization grows through acquisition, normalizing data across the inherited systems becomes the primary hurdle.

  • Aligning Temporal Scope: Using Context Intelligence, we identified that North American systems provided daily updates while EU systems operated on a weekly cycle. To maintain understandability and consistency, we normalized the RAG vector store to a weekly cadence. This prevents the “hallucinations of discrepancy” caused by misaligned timestamps (see the resampling sketch after this list).
  • Tool Configuration (MCP): Using the Model Context Protocol, we can configure specific tools for specific tasks. We discovered that real-time queries were available for most systems, but not for Europe, creating a clear requirement to either renegotiate user expectations or bridge the infrastructure gap (see the tool-gating sketch after this list).
  • The Evolution of Suitability: Business evolves in less predictable ways than code. We found that allowing users to query raw, “live” sales data was too volatile for accurate forecasting, so we intentionally restricted the data granularity to weekly and monthly summaries. This wasn’t about finding “perfect” data; it was about adjusting the suitability of the data to match the business goal of reliable, stable forecasting.
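
To make the temporal alignment concrete, here is a minimal resampling sketch in Python with pandas. The `date`, `region`, and `amount` column names are illustrative assumptions; the real source schemas will differ per system.

```python
import pandas as pd

def normalize_to_weekly(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse per-region sales snapshots onto one weekly cadence.

    Assumes columns `date`, `region`, and `amount`; daily North American
    rows and weekly EU rows both land on the same week-ending timestamps.
    """
    df = df.assign(date=pd.to_datetime(df["date"]))
    return (
        df.set_index("date")
          .groupby("region")["amount"]
          .resample("W-FRI")  # align every region to a Friday week-end
          .sum()
          .reset_index()
    )
```

Feeding only these aligned rollups into the vector store keeps every retrieved chunk on a single cadence, which is what removes the misaligned-timestamp discrepancies described above.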
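
And here is a hypothetical sketch of region-aware tool exposure. It does not use a real MCP SDK; the registry, tool names, and region codes are all illustrative stand-ins for however your MCP server actually advertises its tools.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    regions: frozenset[str]  # regions whose backing systems support the tool

# Illustrative registry: real-time queries exist everywhere except the EU.
TOOLS = [
    ToolSpec("query_realtime_sales", "Live pipeline lookups", frozenset({"NA", "APAC"})),
    ToolSpec("query_weekly_summary", "Normalized weekly rollups", frozenset({"NA", "APAC", "EU"})),
]

def tools_for_region(region: str) -> list[ToolSpec]:
    """Only advertise tools the region's infrastructure can actually serve."""
    return [tool for tool in TOOLS if region in tool.regions]

# An EU session never sees the real-time tool, so the model cannot
# promise data the underlying systems cannot deliver.
print([tool.name for tool in tools_for_region("EU")])  # ['query_weekly_summary']
```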

🔭 Looking Ahead

Throughout this journey, we have touched on technical, business, and human-oriented feedback. Measuring quality metrics and update frequencies is only the foundation; these become the inputs for Instrumentation.

Instrumentation is the bridge that connects technical performance to business outcomes. It ensures we can answer:

  • Technical: Is the system performing within cost and latency parameters?
  • Business: Are we catching data quality issues before they affect the user experience?
  • Human: Are the users achieving the success criteria the project was chartered with?
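
As a minimal sketch of how these three layers can share a single trace, consider the following illustrative Python example; the field names and thresholds are assumptions, not taken from any particular monitoring framework.

```python
from dataclasses import dataclass, field

@dataclass
class RequestTrace:
    latency_ms: float
    cost_usd: float
    stale_sources: list[str] = field(default_factory=list)
    user_accepted: bool | None = None  # explicit thumbs-up/down feedback

def evaluate(trace: RequestTrace) -> dict[str, bool]:
    return {
        # Technical: within cost and latency parameters?
        "within_budget": trace.latency_ms < 2_000 and trace.cost_usd < 0.05,
        # Business: data quality issues caught before reaching the user?
        "fresh_inputs": not trace.stale_sources,
        # Human: did the user reach the chartered success criteria?
        "user_success": trace.user_accepted is True,
    }

print(evaluate(RequestTrace(latency_ms=850, cost_usd=0.01)))
```

The point is that one trace answers all three questions, so a regression in any layer surfaces in the same place.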

In our next blog, we’ll explore how Context Intelligence supports designing monitoring that connects these layers—completing the loop from strategy to operational reality.
