Peter Coldicott, February 3, 2023
- What is a Data Product?
Today, many organizations lack a thoughtful data cataloging strategy that prioritizes the needs of the consumers. The question is why the data is being cataloged: is it due to a governance mandate? Or to meet the analytics needs of the owning business unit? Or for developing products that may be more broadly useful?
Generally, cataloged data is in the form that is most convenient to the source teams. Little thought is given to how it will be consumed, since this is unknown. Consumers are expected to sift through what is available, choose what they need and adapt it to their use cases. This pushes cost onto the consumers and leaves them with no guarantee that the data they have chosen and invested in will continue to be delivered in its current form. There is rarely any contract between the source teams and the consumers.
A personal note here: much of what you read about the business of data products seems to favor them as a way to create a new market, selling your data (products) to other companies. While there are clearly companies that make a living doing this, I think it is at best naïve to think that any company in the world can suddenly create an entirely new revenue stream this way.
I believe that at least right now, the act of curating your data and organizing it into data products in a thoughtful way will be more valuable inside the corporation. Many companies today, I am tempted to say all, don’t know what data they have, and few have any kind of meaningful metadata about their data. The journey from that state to having a well-managed suite of data products is hard and requires investment, but it is my contention that it is well worth it.
- Why do I/We need them?
There is already a lot of debate about how the data a corporation collects and manages represents a huge amount of value. Investment in Data Scientists and other related roles is ever increasing, and there is a lot of competition for the scarce people who can fill these roles. This is true even in the current recessionary climate for the IT industry.
To then take these scarce and expensive people and force them to spend the majority of their time on the more mundane tasks of evaluating and reformatting data is just a bad investment for the company. The lack of guarantees associated with the data can make teams reluctant to invest in its use.
It is ever more complicated to make good business decisions without data. Business controls are ever tighter, as is the regulatory landscape that we all operate in. Having good data behind any decision reduces risk, helps ensure compliance and generally leads to better decision making.
Note that this is not to say that managers shouldn’t take risks. Not at all: taking some level of risk is essential to running a successful venture. But informed risk-taking outperforms uninformed risk-taking every day of the week.
- Pros and Cons of Data Products
This metadata makes it much easier for the consumers (either internal or external) to find what they want and to have confidence that this data is appropriate for their intended use – but it needs investment to gather and curate. Data products also need dedicated staff to develop the road map of the data products and ensure they continue to meet the needs of their consumers.
- Egeria is an open source project and can play a significant role in enabling the development and use of Data Products
The Egeria software consists of a set of configurable and extensible services that help collect, manage, distribute and use metadata. For the purposes of this paper, that specifically means the metadata associated with Data Products. It also supports the integration of different tools, such as those used by data scientists and analysts, and helps these users more easily find, subscribe to, and use appropriate data sources in their work.
Egeria also supports a multitude of data sources: files, databases, data lakes, data warehouses, lakehouses, spreadsheets and so on. It allows the data sources and their owners to maintain local control of security. That means not only access control, but also granular control over who can see PII or other types of controlled elements within the data, as well as the ability to decide the terms and conditions of access, service level agreements, costs, frequency of refresh, etc. There are different implementation choices, but in general Egeria would hold the classifications on the assets, which would then be used by enforcement mechanisms (either operating in Egeria or not).
Execution of the policies would happen close to the runtimes that hold or provide access to the data.
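To make the split between holding classifications and enforcing them more concrete, here is a minimal Python sketch. It does not use Egeria's API; the names `Column`, `Asset`, `visible_columns` and `enforce`, the "PII" label and the sample record are all hypothetical and exist only for illustration. The catalog side records classifications on the columns of an asset, and a separate enforcement function, which would run close to the data, masks any value the consumer is not cleared to see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical catalog entry for one column of a data asset."""
    name: str
    classifications: frozenset = frozenset()  # e.g. frozenset({"PII"})


@dataclass(frozen=True)
class Asset:
    """Hypothetical catalog entry for a data asset (not an Egeria type)."""
    name: str
    columns: tuple


def visible_columns(asset, clearances):
    """Names of columns whose classifications are all covered by the clearances."""
    return [c.name for c in asset.columns if c.classifications <= clearances]


def enforce(row, asset, clearances, mask="***"):
    """Mask every value the consumer is not cleared to see.

    Keys that are not described in the catalog are masked too (default deny).
    """
    allowed = set(visible_columns(asset, clearances))
    return {key: (value if key in allowed else mask) for key, value in row.items()}


# Illustrative metadata and data; all values are made up.
customers = Asset(
    name="customers",
    columns=(
        Column("customer_id"),
        Column("email", frozenset({"PII"})),
        Column("region"),
    ),
)
row = {"customer_id": 17, "email": "a@example.com", "region": "EMEA"}

analyst_view = enforce(row, customers, frozenset())          # no PII clearance
steward_view = enforce(row, customers, frozenset({"PII"}))   # cleared for PII
```

In this sketch the analyst sees the email value masked while the steward sees the full record. The point of the design is that the classification lives with the metadata, so the same rule can be applied by whichever runtime serves the data.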
While there is no such thing as a silver bullet in this problem space, Egeria does provide a level of integration and automation that is unrivalled in my experience.
- Data Products vis-à-vis other types of IT Products
A word of caution from my extensive experience in the IT world: Data Products are all the rage at the time of writing but, as with other advances in both thought and capability, they are not a silver bullet. It is important to ignore the hype and get a true assessment of the potential value of creating any product, and data products are no exception. I am not talking about an accounting exercise attempting (without success) to predict business value down to the last dollar and cent. No, what is required is a reasonable estimate of probable value, such that the priority of this data product can be compared with other projects of whatever type to determine a set of priorities and a plan to realize that value. Just because it is currently in fashion doesn’t mean this data product project is more important than, for example, a project to produce a new widget. Value and priorities need to be set at a high enough level, with a consistent level of oversight, to allow decisions that make sense for the organization over the long term.
- A Pragmatic Approach
There is an excellent article by Sven Balnojan, published in Towards Data Science, that talks about estimating the value of products and specifically data products.
I would certainly recommend reading this truly pragmatic approach to determining value at a level that will allow comparison with other potential projects.
This appeals to me. Of course, Pragmatic is a part of our company name for good reason. Sven’s article doesn’t dive down into the weeds, and it promotes a kind of “eyes wide open” approach.
At Pragmatic Data Research Ltd, we do our best to help our clients be pragmatic and make meaningful progress while at the same time becoming self-sufficient. That last piece is important. When one of our clients becomes self-sufficient, we view that as success.
We offer not only deep expertise but also education, mentoring and hands-on support with that self-sufficiency goal in mind.