The Map Is Not the Territory — But Maybe the Ontology Is


Here’s a confession: I spent nearly three decades building dimensional models, and I thought I understood what a data model was for.

I thought it was for organizing data. Making queries fast. Giving analysts a clean surface to write DAX against. And that’s not wrong, it’s just profoundly incomplete. Because dimensional models describe how we store facts. They don’t describe what those facts mean.

That distinction took me embarrassingly long to fully appreciate.


What an Ontology Actually Is

The word “ontology” comes from philosophy, specifically from the Greek ontos (being) and logos (study). It’s the branch of philosophy concerned with what exists, and what the fundamental categories of existence are. What is a thing? What makes it the kind of thing it is? How do things relate to each other?

When computer scientists borrowed the term in the 1980s and 90s, they needed a precise way to describe formal, machine-readable representations of knowledge in a domain. The definition that stuck — and it’s become something of a standard — comes from Thomas Gruber’s 1993 paper: “an ontology is a specification of a conceptualization.” [1]

Let me unpack that, because it matters.

A conceptualization is the way you think about a domain: the objects in it, their properties, and the relationships between them. A specification is a formal, explicit description of that. So an ontology tells you not just what things are called, but what they are, how they’re defined, and how they relate to everything else.


The Dimensional Model: What It Does Well

Ralph Kimball’s dimensional modeling methodology is one of the most enduring ideas in the data field. The star schema, a central fact table surrounded by dimension tables, is elegant, performant, and (when done well) genuinely comprehensible to business users. Dozens of patterns codified in The Data Warehouse Toolkit have been applied, with variations, in nearly every major analytics implementation built over the last thirty years. [2]

And for good reason. Dimensional models are optimized for query patterns. They answer a specific class of question extraordinarily well: “how much of X happened, by which Y, in which time period?” Revenue by region by quarter. Support tickets by product by priority. Claims by diagnosis by payer.

The design is deliberately query-centric. Fact tables store measurements: what happened, how much, how many. Dimension tables provide context: who, what, where, when, why. Denormalization is intentional: redundancy is traded for performance and simplicity. Business users can navigate a star schema in a BI tool and build reports without understanding JOIN logic.
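To make the "how much of X, by which Y, in which period" shape concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and sample rows are all made up for illustration; a real warehouse would have far more dimensions and a proper date dimension.

```python
import sqlite3

# In-memory database holding a tiny, hypothetical star schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE fact_sales (
    region_key INTEGER REFERENCES dim_region(region_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    revenue REAL
);
INSERT INTO dim_region VALUES (1, 'North'), (2, 'South');
INSERT INTO dim_date VALUES (20240101, 2024, 'Q1'), (20240401, 2024, 'Q2');
INSERT INTO fact_sales VALUES
    (1, 20240101, 100.0), (1, 20240101, 50.0),
    (2, 20240101, 70.0), (1, 20240401, 30.0);
""")


def revenue_by_region_and_quarter(connection):
    """Aggregate the fact table, using the dimensions only for context."""
    return connection.execute("""
        SELECT r.region_name, d.quarter, SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_region r ON r.region_key = f.region_key
        JOIN dim_date d ON d.date_key = f.date_key
        GROUP BY r.region_name, d.quarter
        ORDER BY r.region_name, d.quarter
    """).fetchall()


result = revenue_by_region_and_quarter(conn)
for region, quarter, revenue in result:
    print(region, quarter, revenue)
```

Every question of this shape is the same query with different dimensions in the `GROUP BY`. That uniformity is what makes the pattern so easy for BI tools to generate.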

This is genuinely good engineering. Don’t dismiss it.

Sitting above the dimensional model, most modern BI stacks include a semantic model. In Power BI, this is not a separate layer bolted on top: Power BI’s engine is the Tabular model, the same in-memory columnar engine that underlies SQL Server Analysis Services in Tabular mode. When you publish a Power BI dataset, you are publishing a Tabular semantic model. This is where measures get defined, hierarchies get organized, and business logic gets encoded in DAX. The semantic model abstracts away the physical storage details and gives business users a vocabulary they can actually work with. It knows that [Net Revenue] is [Gross Revenue] minus [Returns], filtered to exclude intercompany transactions. That’s a meaningful step up from raw tables.

But notice what the semantic model is: it’s a curated projection of the dimensional model, with business logic layered on top. It doesn’t define what revenue is in any formal sense. It defines how to calculate it, given the data you have and the questions you anticipated when you designed the model.
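As a rough analogue of the [Net Revenue] measure described above, here is a small Python sketch. It is not DAX and not how the Tabular engine evaluates measures; the `Transaction` type and its field names are hypothetical. The point is only that the "definition" is a calculation recipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical columns, named for this illustration only.
    gross_revenue: float
    returns: float
    is_intercompany: bool


def net_revenue(transactions):
    """Gross revenue minus returns, excluding intercompany transactions."""
    return sum(
        t.gross_revenue - t.returns
        for t in transactions
        if not t.is_intercompany
    )


sample = [
    Transaction(100.0, 10.0, False),
    Transaction(200.0, 0.0, True),   # intercompany, so filtered out
    Transaction(50.0, 5.0, False),
]
total = net_revenue(sample)
print(total)
```

Nothing in that function says what revenue is, who can earn it, or how it relates to a contract or a customer. It only says how to compute a number from the columns that happen to exist.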

Here’s the thing: a dimensional model — and by extension, the semantic model built on top of it — describes data in the shape of the answers it was designed to give. Both are views of reality after the questions have already been decided.


Where Dimensional Models Hit Their Ceiling

Imagine you’re building a model for a healthcare organization. You have patients, encounters, diagnoses, providers, facilities, insurance plans. You build your dimension tables, design your fact tables, and everything works beautifully — until someone asks a question the model wasn’t designed for.

“Which patients have been treated by providers who have financial relationships with pharmaceutical companies that manufacture drugs those patients are currently prescribed?”

That’s not a star schema query. That’s a graph traversal across a semantic network. And the harder you try to shoehorn it into a dimensional model, the more joins, bridge tables, and workarounds you accumulate, until the “elegant” star looks more like a tangle of overloaded dimensions and degenerate fact tables.
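For contrast, here is what that question looks like as a traversal over subject-predicate-object triples, sketched in plain Python. The entity identifiers and predicate names are invented for the example, and a real implementation would more likely use a triple store and SPARQL than hand-written loops.

```python
from collections import defaultdict

# Hypothetical facts as (subject, predicate, object) triples.
TRIPLES = [
    ("patient:ann", "treated_by", "provider:dr_lee"),
    ("patient:bob", "treated_by", "provider:dr_kim"),
    ("provider:dr_lee", "has_financial_relationship_with", "company:acme_pharma"),
    ("company:acme_pharma", "manufactures", "drug:cardiox"),
    ("patient:ann", "prescribed", "drug:cardiox"),
    ("patient:bob", "prescribed", "drug:cardiox"),
]


def build_index(triples):
    """Index objects by (subject, predicate) so each hop is a dictionary lookup."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)
    return index


def patients_with_conflicted_providers(triples):
    """Walk patient -> provider -> company -> drug and compare with prescriptions."""
    index = build_index(triples)
    patients = {s for s, p, _ in triples if p == "treated_by"}
    flagged = set()
    for patient in patients:
        prescribed = index[(patient, "prescribed")]
        for provider in index[(patient, "treated_by")]:
            for company in index[(provider, "has_financial_relationship_with")]:
                if index[(company, "manufactures")] & prescribed:
                    flagged.add(patient)
    return sorted(flagged)


flagged_patients = patients_with_conflicted_providers(TRIPLES)
print(flagged_patients)
```

Each hop follows a named relationship between concepts. No measure is being aggregated anywhere, which is why the question fits so poorly into facts and dimensions.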

This isn’t a failure of Kimball’s methodology. It’s a category error. Dimensional models are optimized for measurement aggregation. They’re not built to represent relationships between concepts.

There are a few other places dimensional models struggle:

Evolving business definitions. What does “active customer” mean? In dimensional modeling, that definition gets embedded in ETL logic or measure expressions. When the definition changes, and it always changes, you have to find every place that assumption lives and update it. There’s no single, authoritative place where “active customer” is defined and reasoned about.

Multiple inheritance and classification. A product can be a pharmaceutical, a medical device, and a controlled substance simultaneously. In a dimension table, that’s a design problem. In an ontology, it’s just a fact.

Cross-domain integration. Two business units may both have a “customer” concept, but define it differently. Resolving that conflict in a dimensional model requires either a heavyweight MDM project or an uncomfortable conversation about whose definition wins. Ontologies can represent both definitions, relate them formally, and let downstream consumers use either — or reason across both.

Inference. A dimensional model can tell you that a person attended a specific clinic. It cannot infer that, because that clinic is a teaching hospital affiliated with a specific university medical center, that person’s treatment record may be relevant to a research exemption under GDPR Article 89. That inference requires a formal understanding of what those relationships mean.
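Two of these points, multiple classification and inference, can be sketched together in a few lines of Python. The class names and individuals below are hypothetical, and the logic is only a simple subclass closure in the spirit of RDFS. A real OWL reasoner does considerably more, and nothing here decides a legal question such as a GDPR exemption.

```python
# Hypothetical class hierarchy: class -> its direct superclasses.
SUBCLASS_OF = {
    "TeachingHospital": {"Hospital"},
    "Hospital": {"CareFacility"},
    "ControlledSubstance": {"RegulatedProduct"},
    "RegulatedProduct": {"Product"},
    "Pharmaceutical": {"Product"},
    "MedicalDevice": {"Product"},
}

# An individual can be asserted to belong to several classes at once.
ASSERTED_TYPES = {
    "product:autoinjector_x": {"Pharmaceutical", "MedicalDevice", "ControlledSubstance"},
    "facility:city_clinic": {"TeachingHospital"},
}


def inferred_types(individual):
    """Return the asserted classes plus every superclass reachable from them."""
    result = set()
    stack = list(ASSERTED_TYPES.get(individual, ()))
    while stack:
        cls = stack.pop()
        if cls not in result:
            result.add(cls)
            stack.extend(SUBCLASS_OF.get(cls, ()))
    return result


clinic_types = inferred_types("facility:city_clinic")
product_types = inferred_types("product:autoinjector_x")
print(sorted(clinic_types))
print(sorted(product_types))
```

Nobody ever recorded that the clinic is a care facility. It follows from what a teaching hospital is defined to be. The product simply carries three classifications, with no bridge table required.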


Ontologies in Business Intelligence

The application of ontologies in BI has been an active research area since at least the early 2000s, but practical adoption has been distinctly uneven. The academic literature is rich. The number of production implementations that would pass a journalist’s scrutiny is considerably smaller.

There are a few reasons for that gap, and they’re worth being honest about.

The tooling has historically been specialized and unfamiliar. Protégé, the most widely used ontology editor, has a learning curve that assumes comfort with description logic. OWL and RDF are expressive but verbose. SPARQL, the query language for RDF data, is powerful but syntactically alien to analysts trained on SQL. These aren’t insurmountable barriers, but they’re real ones.

The organizational challenges are worse. Building an ontology means convening domain experts and forcing them to reach agreement on formal definitions. That is, in my experience, one of the hardest things you can ask a business to do. People will fight for hours about whether “revenue” includes or excludes intercompany transactions. Getting that fight documented in OWL is valuable precisely because it forces resolution, but that means someone has to want the fight badly enough to have it.

And yet.

The case for ontological approaches in BI is strengthening, for a few converging reasons.

Knowledge graphs, which typically use ontologies as their schema layer, have moved from academic curiosity to production infrastructure. [3] The BBC, for example, uses an ontology to manage the relationships between programmes, contributors, brands, and series in a way that would be genuinely unwieldy in a relational model.

The rise of large language models has added another dimension to this. LLMs are remarkably good at extracting entities and relationships from unstructured text. Ontologies provide the formal structure to integrate those extracted facts into a coherent, queryable knowledge base, and to constrain what the LLM is allowed to assert. That combination is increasingly interesting to organizations dealing with large volumes of documents, contracts, and communications alongside their structured transaction data.
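One way to picture "constraining what the LLM is allowed to assert" is a validation gate between extraction and the knowledge base. The sketch below assumes the extractor emits candidate triples. The schema, entity names, and predicates are hypothetical. Note that this closed-world check is closer in spirit to SHACL-style validation than to OWL domain and range axioms, which are used for inference rather than rejection.

```python
# Hypothetical schema: predicate -> (required subject type, required object type).
SCHEMA = {
    "employs": ("Organization", "Person"),
    "party_to": ("Organization", "Contract"),
}

# Hypothetical entities that are already known and typed.
ENTITY_TYPES = {
    "acme": "Organization",
    "jane": "Person",
    "msa_2024": "Contract",
}


def validate(triple):
    """Return (ok, reason) for one candidate (subject, predicate, object) triple."""
    subject, predicate, obj = triple
    if predicate not in SCHEMA:
        return False, f"unknown predicate: {predicate}"
    subject_type, object_type = SCHEMA[predicate]
    if ENTITY_TYPES.get(subject) != subject_type:
        return False, f"subject {subject} is not a {subject_type}"
    if ENTITY_TYPES.get(obj) != object_type:
        return False, f"object {obj} is not a {object_type}"
    return True, "ok"


# Candidate triples as an extractor might emit them (made up for this example).
candidates = [
    ("acme", "employs", "jane"),
    ("jane", "employs", "acme"),       # subject and object swapped
    ("acme", "acquired", "msa_2024"),  # predicate not in the ontology
]
accepted = [t for t in candidates if validate(t)[0]]
rejected = [t for t in candidates if not validate(t)[0]]
print(accepted)
print(rejected)
```

Only assertions the ontology can make sense of get through. The rejected ones are useful too, either as extraction errors to fix or as signals that the ontology is missing a concept.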


Not Either/Or — But Eyes Open

I’m not arguing that dimensional models should be replaced. That would be a bad argument.

Dimensional models remain the right tool for the core BI use case: fast, flexible aggregation of transactional data for reporting and dashboards. When a finance team needs to slice revenue by business unit, product line, and month, and they need it in two seconds in Power BI, a well-designed star schema in a columnar store is hard to beat.

Ontologies shine where the questions are harder to anticipate, where relationships between concepts matter more than measurement aggregation, where definitions need to be governed formally rather than embedded silently in ETL, and where inference across complex domain knowledge adds value that queries alone cannot provide.

The interesting question, and the one I’m increasingly thinking about in the context of what we’re building with Microsoft Fabric, is how these layers actually relate to each other. And I think most people have this backwards.

The ontology is the foundational layer. It defines what business concepts are, how they relate, and what constraints hold between them. The semantic model is a governed projection of that: it takes ontological concepts and expresses them in terms that BI tools can query and aggregate. The dimensional model is a further projection: a performance-optimized physical representation of a subset of what the semantic model describes.

Each layer is derived from the one before it. Each trades expressiveness for something practical: queryability, performance, tooling compatibility.

This matters, because it means the ontology should logically come first. You define your business reality at the ontological level, project it into a semantic model for your BI consumers, and project that into star schemas for the storage and aggregation engine.
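To illustrate the direction of that derivation, here is a deliberately naive Python sketch that projects one event concept from a toy ontology into fact and dimension table definitions. The ontology structure, the `project_to_star_schema` helper, and the naming rules are all assumptions made for the example. This is not how Fabric or any other product performs the projection.

```python
# Hypothetical ontology fragment: concepts, their properties, and relationships.
ONTOLOGY = {
    "Sale": {
        "kind": "event",
        "measures": ["gross_revenue", "returns"],
        "relates_to": ["Customer", "Product"],
    },
    "Customer": {"kind": "entity", "attributes": ["name", "segment"]},
    "Product": {"kind": "entity", "attributes": ["name", "category"]},
}


def project_to_star_schema(ontology, event):
    """Derive one fact table and its dimension tables from an event concept."""
    concept = ontology[event]
    dimensions = {
        f"dim_{name.lower()}": [f"{name.lower()}_key"] + ontology[name]["attributes"]
        for name in concept["relates_to"]
    }
    # The fact table carries one foreign key per related entity, then the measures.
    fact_columns = [f"{name.lower()}_key" for name in concept["relates_to"]]
    fact_columns += concept["measures"]
    return {f"fact_{event.lower()}": fact_columns, **dimensions}


star = project_to_star_schema(ONTOLOGY, "Sale")
for table, columns in star.items():
    print(table, columns)
```

The projection is lossy by design. The star schema keeps the keys and measures it needs for aggregation and drops everything else the ontology knows.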

Microsoft is making a direct bet that the gap between what the numbers are and what they mean matters. Fabric IQ, currently in preview, is a workload inside Microsoft Fabric that introduces a formal ontology layer: entity types, relationships, properties, and business rules, bound to live data across OneLake. [4] The design is explicitly about giving AI agents something to reason with, not just data to retrieve. A Copilot or agent with access to your star schema can answer questions about what the numbers are. A Copilot grounded in a Fabric IQ ontology can answer questions about what they mean, and trace the relationships between entities across domains that a dimensional model would never connect.

One capability worth noting: Fabric IQ can bootstrap an ontology from an existing Power BI semantic model. [5] That’s a pragmatic acknowledgment that most organizations have years of implicit ontological thinking already embedded in their BI layer. As an on-ramp, it’s smart product design.

But there’s a tension in that approach that’s worth being honest about. If you build your ontology upward from a semantic model, you get the ontology your BI layer implies — constrained by the assumptions, the question patterns, and the design decisions already baked into it. You don’t necessarily get the ontology your business requires. The concepts that never made it into a report, the relationships that were too complex to model in DAX, the business rules that live only in people’s heads — none of those appear in a bootstrapped ontology.

That doesn’t make the bootstrapping approach wrong. It makes it a starting point, not a destination. I’ll go into considerably more depth on Fabric IQ — its architecture, the ontology item, and what it means for how we design data platforms — in a follow-up post. For now, the point is that ontological approaches in enterprise BI are no longer purely academic territory. They’re shipping in the platform you’re probably already using.

Closing the gap between how data is stored and what it means (formally, machine-readably, in a way that survives personnel changes and organizational restructuring) is the problem ontologies are actually built to solve.

And it turns out someone at Microsoft agrees.


Join the Conversation

What is your take on semantic models and ontologies? I’d love to hear about your go-to method for modeling business data and business processes! Reach out on LinkedIn or Bluesky.


[1] Gruber, T.R. (1993). “A translation approach to portable ontology specifications.” Knowledge Acquisition, 5(2): 199–220. https://tomgruber.org/writing/ontolingua-kaj-1993.pdf

[2] Kimball, R. & Ross, M. (2013). The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling. 3rd ed. Wiley. https://www.wiley.com/en-us/The+Data+Warehouse+Toolkit:+The+Definitive+Guide+to+Dimensional+Modeling,+3rd+Edition-p-9781118530801

[3] Noy, N. et al. (2019). “Industry-Scale Knowledge Graphs: Lessons and Challenges.” ACM Queue, 17(2). https://queue.acm.org/detail.cfm?id=3332266

[4] Microsoft Learn: What is Fabric IQ? https://learn.microsoft.com/en-us/fabric/iq/overview

[5] Microsoft Learn: What is Ontology (Preview)? https://learn.microsoft.com/en-us/fabric/iq/ontology/overview


Photo by Pixabay: https://www.pexels.com/photo/black-framed-eyeglasses-on-book-159743/