
A Data Lens is a specification of the data needed to support an AI development project. It reflects the scope of the business problem/opportunity laid out by the sponsors, but has sufficient detail to act as a filter when selecting and organizing potential data sources for the project.
The Data Lens identifies the subject areas (also known as the data domains) that the data must cover. For example, does the project need financial data? Or customer activity data? Or engineering design data? These are all types of subject areas.
Subject areas can start out pretty informal. A simple analysis to clarify the meaning of each of your subject areas, possibly identifying nested and related subject areas, may help to focus the data lens and save time. It also acts as a way of organizing the data sources for the project. A subject area definition exercise performed by Coco Pharmaceuticals gives an idea of what is involved. They are embarking on a major business transformation that will completely change their data needs. The key take-away from this scenario is to avoid focussing on the data you have, and instead think about the data you need to make the project successful. It may be that data must come from external sources to bootstrap the new way of working under development.
This is the result. It begins to create a vocabulary for discussing data requirements with potential data suppliers.
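One informal way to record subject areas, including nested ones, is as a small tree. The sketch below is illustrative only: the `SubjectArea` class and the example area names are hypothetical and are not taken from the Coco Pharmaceuticals exercise.

```python
from dataclasses import dataclass, field


@dataclass
class SubjectArea:
    """A named subject area with optional nested subject areas."""
    name: str
    description: str = ""
    children: list["SubjectArea"] = field(default_factory=list)

    def add(self, child: "SubjectArea") -> "SubjectArea":
        """Nest a subject area under this one and return the child."""
        self.children.append(child)
        return child

    def qualified_names(self, prefix: str = "") -> list[str]:
        """List this area and all nested areas as slash-separated paths."""
        full = f"{prefix}/{self.name}" if prefix else self.name
        names = [full]
        for child in self.children:
            names.extend(child.qualified_names(full))
        return names


# Hypothetical subject areas, for illustration only.
customer = SubjectArea("Customer", "People and organizations we sell to")
customer.add(SubjectArea("CustomerActivity", "Orders, returns and enquiries"))
finance = SubjectArea("Finance", "Monetary transactions and balances")

vocabulary = customer.qualified_names() + finance.qualified_names()
print(vocabulary)
```

The flattened list of qualified names is one possible form for the shared vocabulary used when talking to data suppliers.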
Subject areas describe the semantic content of the data needed for the project. This is only the first dimension of the data lens. The data must also match other scoping aspects from the project requirements, such as:
- Temporal scope (“… we are only interested in data from the last fiscal year …”)
- Geospatial scope (“… we are only going to focus on European operations …”)
- Organizational scope (“… we need to understand how chemicals are used by the research team; we can look at the manufacturing team’s approach at a later stage …”)
- Quality scope (“… the potential impact on product safety means that only accredited procedures should be included …”)
Finally, there is a legal aspect to the data lens that defines any regulatory or licensing restrictions that must be considered.
Once in place, the data lens acts as a filter: it identifies which data sources should be included in the project and can help organize them for the convenience of the project team. The focus of the data lens may change as the project progresses, starting narrow and broadening its gaze as the results demonstrate value.
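To make the filtering idea concrete, here is a minimal sketch of a data lens applied to candidate data sources. It assumes each source has already been described with a few simple attributes. The `DataLens` and `DataSource` classes, their fields, and the example sources are all hypothetical, chosen to mirror the dimensions described above (subject area, temporal, geospatial, organizational, quality and legal).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataSource:
    """Hypothetical description of a candidate data source."""
    name: str
    subject_areas: set[str]
    earliest: date            # first date covered by the data
    latest: date              # last date covered by the data
    regions: set[str]
    organizations: set[str]
    accredited: bool          # produced by accredited procedures?
    licence: str


@dataclass
class DataLens:
    """Hypothetical data lens: one field per scoping dimension."""
    subject_areas: set[str]
    start: date
    end: date
    regions: set[str]
    organizations: set[str]
    require_accredited: bool
    permitted_licences: set[str]

    def rejections(self, source: DataSource) -> list[str]:
        """Return the dimensions on which the source falls outside the lens."""
        reasons = []
        if not (self.subject_areas & source.subject_areas):
            reasons.append("subject area")
        if source.latest < self.start or source.earliest > self.end:
            reasons.append("temporal scope")
        if not (self.regions & source.regions):
            reasons.append("geospatial scope")
        if not (self.organizations & source.organizations):
            reasons.append("organizational scope")
        if self.require_accredited and not source.accredited:
            reasons.append("quality scope")
        if source.licence not in self.permitted_licences:
            reasons.append("legal")
        return reasons

    def select(self, sources: list[DataSource]) -> list[DataSource]:
        """Keep only the sources that pass every dimension of the lens."""
        return [s for s in sources if not self.rejections(s)]


# Illustrative lens and sources; all values are made up.
lens = DataLens(
    subject_areas={"ChemicalUsage"},
    start=date(2023, 1, 1), end=date(2023, 12, 31),
    regions={"Europe"}, organizations={"Research"},
    require_accredited=True, permitted_licences={"internal", "CC-BY-4.0"},
)
sources = [
    DataSource("research_lab_records", {"ChemicalUsage"},
               date(2022, 6, 1), date(2023, 11, 30),
               {"Europe"}, {"Research"}, True, "internal"),
    DataSource("manufacturing_batch_logs", {"ChemicalUsage"},
               date(2023, 1, 1), date(2023, 12, 31),
               {"Europe"}, {"Manufacturing"}, True, "internal"),
    DataSource("supplier_catalogue", {"ChemicalUsage", "Finance"},
               date(2019, 1, 1), date(2021, 12, 31),
               {"Europe", "Asia"}, {"Research"}, False, "proprietary"),
]

selected = [s.name for s in lens.select(sources)]
print(selected)
for s in sources:
    print(s.name, lens.rejections(s))
```

Reporting the reasons for rejection, rather than a bare yes or no, makes it easier to see which dimension to relax when the lens later broadens, for example by adding "Manufacturing" to the organizational scope.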
This is blog 4 in the Data Readiness series.