The Egeria Advisor – Sharing the Journey – Pragmatic Data Research Ltd

Dan Wolfson

We have to build systems that people want to use, that provide value to them individually, that make their work life better – not just view humans as knowledge sources to train the corporate machines. It’s about building balanced ecosystems that provide benefits to users, trust in the systems, and business value to the organization.

At Pragmatic Data Research and the Egeria Project that we support, we’ve been experimenting with the concept of Context Intelligence. Context Intelligence is about leveraging all kinds of context to make more useful AI applications. That means using information about data ownership and sovereignty to choose what information should be training our LLMs – or ingested into RAG systems. It means recognizing that different organizations may have different perspectives, different questions, different terminology – and that to understand user requests and answer in the most useful way, we need to proactively let users tell us their perspective and what they are trying to achieve. Context Intelligence can be injected throughout the AI application lifecycle. We have a lot of ideas about how it could be used; now we need to test out these ideas, to experiment, to build. We need to learn what works for us and for others – and what doesn’t. This is why I’m building Egeria Advisor: to experiment, to pragmatically build and evaluate, and hopefully to provide some useful tooling for the Egeria community that integrates with their existing IT and data estate and increases the value of their existing tools.

Big, perhaps somewhat naive objectives – but, I hope, worthwhile. I’ve been building the Egeria Advisor for about six months and realized it was time to document and reflect on what’s been built, some lessons learned, and where we’re going. I say we, because I hope that others will be interested and perhaps contribute their thoughts, ideas, and code. I expect this to be a somewhat intermittent blog as the Egeria Advisor is not my only project – nor the only interesting one ;). So let’s start at the beginning.

As with many of you, the siren call of the latest AI technologies – of LLMs, RAG, Agents and MCP – had been nagging at me for the past couple of years. As an Egeria leader and committer, I had been watching and learning about the AI evolution, fads, and trends while we continued to extend Egeria’s core capabilities. My first baby step into the modern AI world (I was a research scientist in the AI Department of a corporate lab in the 1980s) was to build a simple MCP interface to the Pyegeria Report Spec feature. Report Specs are a topic of their own, but suffice it to say that there are more than a hundred pre-defined reports on different facets of Egeria, and users can dynamically create their own. Adding an MCP interface allowed me to use a tool like Claude or Gemini as a natural language front-end. With an AI interface, a user could find, execute, visualize and explain reports. It was a good first step – and it led to a number of questions and insights. Fundamentally, just having a big list of reports isn’t particularly useful in itself. How does a user find a relevant report? What makes a report relevant for a user? When might we want to dynamically tailor the report for the user? We need ways to organize the available reports and ways to suggest or guess what the user might want or need. This was some of the early thinking behind the theme of Context Intelligence.

A foundational innovation that we developed in March of 2025 (and, of course, continue to evolve) is called Dr. Egeria MD. Perhaps a little cutesy play on words, but hey, we try to have some fun. Dr. Egeria is a simple markdown language that allows you to use Egeria by just entering text in your favorite tool. The syntax is very simple. A command starts with ##, and the command’s attributes start with ### (note that the syntax changed a bit from the first implementation to make larger documents easier). Dr. Egeria commands can be intermixed with other text. So you could be writing up a design spec for, let’s say, a data structure, and intermix a narrative description of the data structure with Dr. Egeria commands that define that data structure to Egeria.

When you want to execute the document, you run a command to validate and/or execute it. Execution results in an output markdown document that preserves all of the narrative text and fills in the generated and default values that Egeria creates. It’s designed so that you can take the output document, update some of the Dr. Egeria commands and re-execute. Each time the document is executed, the run is appended to the document’s provenance, so you end up with a record of what’s been done. There is also a Dr. Egeria command to run a report_spec and generate the result in your choice of markdown list, standard markdown, mermaid graphs, etc. This means that you can build a Dr. Egeria document that, for instance, describes the need for a new data structure, defines the data structure to Egeria and then prints out a mermaid picture of the data structure you’ve created.

I think of it as a form of self-documenting, narrative data management. Because Dr. Egeria supports a textual, narrative style of interaction, it is conveniently (presciently?) suitable for working with AI. The reason I mention it here is simple: Foreshadowing!
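To make the `##` / `###` convention concrete, here is a minimal Python sketch of pulling commands and attributes out of a document that mixes them with narrative text. It is not the Dr. Egeria implementation and does not use Pyegeria. The command name (`Create Data Structure`), the attribute names (`Display Name`, `Description`) and the rule that an attribute’s value is the run of non-blank lines right after its heading are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Command:
    """One '##' command and the '###' attributes that follow it."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)


def parse_commands(text: str) -> list[Command]:
    """Extract '##' commands and their '###' attributes from markdown text.

    Assumption of this sketch: an attribute's value is the run of non-blank
    lines directly after its '###' heading. Everything else is narrative
    and is skipped.
    """
    commands: list[Command] = []
    current_attr: Optional[str] = None  # attribute still collecting value lines
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("### "):
            if commands:  # attributes only count once a command has started
                current_attr = line[4:].strip()
                commands[-1].attributes[current_attr] = ""
        elif line.startswith("## "):
            commands.append(Command(name=line[3:].strip()))
            current_attr = None
        elif line.startswith("#"):
            current_attr = None  # any other heading ends the current value
        elif current_attr is not None:
            value = commands[-1].attributes[current_attr]
            if line:
                commands[-1].attributes[current_attr] = f"{value}\n{line}".strip()
            elif value:
                current_attr = None  # blank line after content ends the value
    return commands


# Hypothetical document: narrative text mixed with one command.
doc = """\
# Design note

We need a structure to hold customer contact details.

## Create Data Structure

### Display Name
Customer Contact

### Description
Name, email and phone number
for a customer.

This narrative paragraph is ignored by the parser.
"""

for cmd in parse_commands(doc):
    print(cmd.name, cmd.attributes)
```

A sketch like this treats every `##` heading as a command, so a real processor would also need to check command names against a known vocabulary before acting on them.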

So last year, I made my first foray into modern AI with the report_spec MCP commands. Useful, but limited. I needed to start learning about LLMs and RAG (Retrieval Augmented Generation) techniques. As most of you know, the basic idea behind RAG is that you dump some documents into an AI tool, which ingests them into some kind of database (often a vector store like Milvus) that you can then query through text prompts. There are off-the-shelf, no-code tools that do this (LM Studio, for example), and they are a simple way to start. Which is what I did. I fed these tools Egeria’s documentation (~1000 pages) and started asking questions, which is when you start to learn about hallucinations, temperature, chunk size, chunk overlap and a whole new and evolving set of parameters and terminology. I was astounded by the creativity of the responses – I’m sure you can imagine.
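For readers new to the terms, chunk size and chunk overlap can be illustrated with a few lines of Python. This is a toy sketch, not how LM Studio or any particular tool works. It splits by characters rather than tokens, and its `retrieve` function scores chunks by shared words as a plain stand-in for the embedding similarity a vector store would use. All function names and default values here are hypothetical.

```python
import re


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def words(s: str) -> set[str]:
    """Lower-cased alphanumeric words in s."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks sharing the most words with the query.

    Word overlap is a deliberately crude substitute for vector similarity.
    """
    q = words(query)
    ranked = sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)
    return ranked[:top_k]


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

Small chunks with little overlap can cut a sentence away from the context it needs, while large chunks dilute the match, which is part of why these parameters end up mattering so much.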

I realized that this approach was not going to meet my expectations. I could start using one of the commercial LLMs – after all, they are really, really good at this. But that was never going to integrate with an open-source project like Egeria. And, of course, there are privacy concerns. So I was going to have to build something new. And that led to my first attempt – Egeria Expert, in December of 2025 – which we will cover in the next installment.

Feedback welcome!