Agents of Data: Digging into Semantic Layers

Semantic layers have quietly powered business intelligence tools for years. Now, as agentic AI systems emerge, they're becoming critical infrastructure for enterprises looking to deploy AI agents effectively.

In this episode of the Agents of Data Podcast, Frank Weigel (Chief Product Officer at Matillion), Julian Wiffen (Chief of AI & Data Science), and Sam Perrin (Senior Staff Software Engineer, AI) explore what semantic layers are, why they're essential for agentic systems, and how organizations can build them without drowning in complexity.

Watch the full episode here. 

What is a Semantic Layer?

At its core, a semantic layer bridges the gap between natural language and technical systems. When someone asks for "revenue numbers," a semantic layer knows exactly what that means for your specific company: which tables to query, how revenue is calculated, and which products or business units are included.

"There's a fundamental issue when it comes to LLMs and business problems. Users talk in natural language and use terms specific to their company. An agentic AI needs to understand what people are talking about at a very high level of specificity."
Frank Weigel, Chief Product Officer, Matillion

The semantic layer operates at two levels:

  • Business language mapping: Understanding company-specific terminology. When someone mentions "magical widget A," the system knows it's a product name. When finance asks about revenue, it applies the company's specific calculation rules.
  • Technical metadata: Connecting business concepts to actual data structures. In a data warehouse with 10 product tables and 8 revenue columns, the semantic layer knows which ones to use, why they exist, and when they're appropriate.
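As a minimal sketch of these two levels, a semantic layer entry can pair a business term (and its aliases) with the table, column, and rule behind it. The class names, table names, column names, and the revenue rule below are all hypothetical and chosen only for illustration; they do not describe how Maia or any Matillion product stores this information.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TermDefinition:
    """One business term plus the technical metadata behind it (hypothetical values)."""
    term: str
    kind: str                      # e.g. "metric" or "product"
    table: str                     # table an agent should query
    column: str                    # column that holds the value
    rule: str = ""                 # plain-language calculation rule
    aliases: list = field(default_factory=list)


class SemanticLayer:
    """Tiny in-memory lookup from business language to technical metadata."""

    def __init__(self) -> None:
        self._by_name = {}

    def register(self, definition: TermDefinition) -> None:
        # Index the canonical term and every alias, case-insensitively.
        for name in [definition.term, *definition.aliases]:
            self._by_name[name.lower()] = definition

    def resolve(self, phrase: str) -> Optional[TermDefinition]:
        # Returns None when the phrase is unknown, so the caller can ask a human.
        return self._by_name.get(phrase.strip().lower())


layer = SemanticLayer()
layer.register(TermDefinition(
    term="revenue", kind="metric",
    table="finance.recognized_revenue", column="net_amount_usd",
    rule="Sum of net amounts, excluding refunds and internal transfers",
    aliases=["revenue numbers", "sales revenue"],
))
layer.register(TermDefinition(
    term="magical widget A", kind="product",
    table="catalog.products", column="product_name",
))

hit = layer.resolve("Revenue numbers")
print(hit.table, hit.column, "|", hit.rule)
```

An exact-match dictionary is the simplest possible design; a real layer would likely need fuzzy or embedding-based matching to cope with the variety of ways people phrase the same term.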

The Hidden Knowledge Problem

Most organizations already face this challenge with human employees. New team members need to learn:

  • Which tables are production versus experimental
  • Why there are multiple versions of similar data
  • What business rules apply to calculations
  • Who to ask when something's unclear

This knowledge typically lives in three places:

  • People's heads: The veteran team members who remember every quirk
  • Scattered documentation: Wikis created years ago that may be outdated
  • System metadata: Automatically captured information that lacks business context

As Julian points out, sometimes it's wikis, sometimes it was a project somebody did two or three years ago, and it's the best you've got because no one's touched it since.

For AI agents, this fragmentation is a critical barrier. Unlike humans, who can ask Bob down the hall for clarification, agents only know what they can see.

Why Agentic AI Needs Semantic Layers

When you ask Maia, Matillion's agentic data team, to build a sales report, it needs to:

  • Identify relevant tables among hundreds or thousands of options
  • Understand which revenue definition applies
  • Know which product hierarchy to use
  • Apply appropriate business rules and filters

Without a semantic layer, the agent either makes dangerous assumptions or needs every piece of documentation included in every prompt, which is slow, expensive, and introduces noise.

"If I'm asking Maia to build a new sales report or customer view, it absolutely needs to find out what tables are there. That's fine in a toy database with 20 tables, but with 200 or 2,000 that becomes a challenge."
Julian Wiffen, Chief of AI & Data Science, Matillion

The semantic layer provides a searchable, queryable repository that agents can consult to find exactly the information they need, when they need it.
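To make "searchable, queryable" concrete, here is a rough sketch in which an agent ranks catalog entries by keyword overlap and retrieves only the top few instead of loading every table description into its prompt. The catalog entries and the `search_catalog` function are hypothetical, and simple word overlap stands in for whatever retrieval method a production system would actually use.

```python
def search_catalog(catalog, query, limit=3):
    """Rank entries by how many query words appear in their name, description, or tags."""
    words = set(query.lower().split())
    scored = []
    for entry in catalog:
        text = " ".join([entry["name"], entry["description"], *entry["tags"]]).lower()
        # Split dotted and underscored identifiers into separate words.
        haystack = set(text.replace(".", " ").replace("_", " ").split())
        score = len(words & haystack)
        if score:
            scored.append((score, entry["name"]))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


# Hypothetical catalog entries, for illustration only.
catalog = [
    {"name": "sales.orders",
     "description": "Production order lines with net revenue per sale",
     "tags": ["production", "sales"]},
    {"name": "sandbox.orders_test",
     "description": "Experimental copy of orders for testing",
     "tags": ["experimental"]},
    {"name": "hr.employees",
     "description": "Employee master data",
     "tags": ["production"]},
    {"name": "crm.customers",
     "description": "Customer accounts and segments",
     "tags": ["production", "sales"]},
]

print(search_catalog(catalog, "production sales revenue", limit=2))
# → ['sales.orders', 'crm.customers']
```

The point of the sketch is the access pattern rather than the scoring: the agent asks a narrow question and receives a short, ranked answer, which keeps prompts small as the catalog grows from 20 tables to 2,000.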

Building a Semantic Layer: The Graph Database Approach

Traditional data catalogs have attempted to solve this problem with limited success. They're often incomplete, outdated, or too rigid to capture the messy reality of enterprise data.

The team at Matillion is exploring graph databases as a more flexible foundation:

  • Nodes represent facts: Tables, columns, business definitions, decisions made, assumptions documented
  • Edges represent relationships: Not just hierarchical (parent-child) but semantic (similar to, derived from, supersedes)
  • Attributes add context: Date stamps for recency, confidence levels, ownership information

This structure allows agents to traverse relationships naturally. When analyzing a sales database, an agent can walk from account to opportunity to lead to contact, understanding how these entities relate historically, even when direct foreign keys don't exist.
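The traversal described above can be sketched with a plain in-memory graph. The node names follow the account, opportunity, lead, and contact example; the relation names, owners, and dates are hypothetical, and a Python dictionary stands in for a real graph database.

```python
from collections import deque

# Nodes carry attributes such as ownership and recency (hypothetical values).
nodes = {
    "legacy_account": {"type": "table", "owner": "sales-ops", "updated": "2021-03-15"},
    "account": {"type": "table", "owner": "sales-ops", "updated": "2024-11-02"},
    "opportunity": {"type": "table", "owner": "sales-ops", "updated": "2024-10-20"},
    "lead": {"type": "table", "owner": "marketing", "updated": "2024-09-01"},
    "contact": {"type": "table", "owner": "marketing", "updated": "2024-09-01"},
}

# Edges are (source, relation, target). Relations are semantic, not only foreign keys.
edges = [
    ("legacy_account", "superseded_by", "account"),
    ("account", "has", "opportunity"),
    ("opportunity", "derived_from", "lead"),
    ("lead", "converted_to", "contact"),
]


def find_path(edges, start, goal):
    """Breadth-first search returning the shortest chain of (relation, node) hops."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(relation, neighbor)]))
    return None  # no directed route between the two nodes


for relation, node in find_path(edges, "account", "contact"):
    print(f"--{relation}--> {node} (owner: {nodes[node]['owner']})")
```

Because the edges are directed, `find_path(edges, "contact", "account")` returns `None` in this sketch; whether relationships should be walkable in both directions is a modeling decision for the team that maintains the layer.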

The Staleness Challenge

One of the biggest obstacles to maintaining semantic layers is keeping information current. Schema changes, business rule updates, and organizational shifts constantly threaten to make documentation obsolete.

But LLMs offer a unique advantage here. Unlike traditional systems that break when information is slightly wrong or incomplete, LLMs can work with:

  • Incomplete information (and ask for what's missing)
  • Conflicting sources (and reason about which to trust)
  • Varying formats (structured, unstructured, semi-structured)

Additionally, as agents use the semantic layer more frequently, data quality naturally improves. When downstream processes depend on accurate metadata, teams have a stronger incentive to maintain it, just as frequently queried databases tend to be cleaner than rarely accessed ones.

Roles Are Shifting

As agentic systems take over more pipeline-building and transformation work, data professionals' responsibilities are evolving:

  • Less time: Writing individual transformations, debugging failed pipelines, manually mapping fields
  • More time: Curating semantic layers, validating agent outputs, defining business rules and governance

This mirrors patterns in other domains where generative AI has been deployed. Customer support teams spend less time answering tickets and more time maintaining knowledge bases. The knowledge layer becomes valuable not just for humans but for the AI systems that need to serve them.

The Validation Opportunity

Semantic layers also enable better quality control. When an agent builds a pipeline, it can:

  • Compare outputs against expected types (emails, postcodes, amounts)
  • Test hypotheses by joining tables and checking result volumes
  • Validate that business logic produces sensible results

If an agent expects email addresses but gets phone numbers, the semantic understanding flags the mismatch. This kind of semantic validation goes far beyond traditional schema checks.
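A minimal sketch of this kind of check, assuming the semantic layer records an expected type for each column: sample the column, measure how many values match the expected pattern, and report the best-fitting alternative when they do not. The patterns are deliberately simplified, and the function names and the 0.9 threshold are hypothetical.

```python
import re

# Deliberately simple patterns; real-world validation would need stricter rules.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,}$"),
    "amount": re.compile(r"^-?\d+(\.\d{1,2})?$"),
}


def match_rate(values, semantic_type):
    """Fraction of sampled values that look like the given semantic type."""
    if not values:
        return 0.0
    pattern = PATTERNS[semantic_type]
    matches = sum(1 for value in values if pattern.match(str(value).strip()))
    return matches / len(values)


def check_column(values, expected_type, threshold=0.9):
    """Compare a sample against its expected type and suggest a better fit if it fails."""
    rate = match_rate(values, expected_type)
    best_guess = max(PATTERNS, key=lambda name: match_rate(values, name))
    return {
        "expected": expected_type,
        "match_rate": rate,
        "ok": rate >= threshold,
        "looks_like": best_guess,
    }


# Hypothetical sample: the column is documented as email but holds phone numbers.
sample = ["+44 20 7946 0958", "(555) 010-0199"]
print(check_column(sample, "email"))
```

A schema check would accept this column because every value is a valid string; the mismatch only becomes visible once the expected meaning of the column is known.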

The Ownership Question

An unresolved tension is emerging: who will own the semantic layer?

Will each tool (BI platforms, ETL systems, AI agents) maintain its own? Will organizations centralize everything in a single repository? Or will we see a hybrid approach with departmental layers federated together?

"Given how important semantics are for agents working well, it might become a bit of a battleground. Whoever owns the best layer that understands most about the business will be able to have the best working agents."
Frank Weigel, Chief Product Officer, Matillion

The answer likely varies by organization size, structure, and data maturity. Graph databases may help by allowing different domains to maintain their own nodes while connecting them through shared relationships, but political and organizational dynamics will play as much of a role as technology.

Getting Started

For organizations looking to build semantic layers that support agentic AI:

  • Start with existing metadata: Database schemas, table descriptions, lineage information
  • Layer in business context: Ownership, business rules, known quirks and exceptions
  • Make it queryable: Agents need to search and filter, not read everything
  • Allow unstructured input: LLMs can extract value from notes, wikis, and conversation history
  • Build feedback loops: When agents make mistakes, capture corrections back into the layer
  • Accept imperfection: Partial, somewhat-stale information is better than nothing

The shift toward agentic AI makes semantic layers a strategic asset. Organizations that invest in capturing, organizing, and maintaining this knowledge will have agents that work reliably, while those without will struggle with hallucinations, errors, and constant human intervention.

As data teams spend less time building pipelines and more time curating knowledge, the semantic layer becomes the new frontier of data engineering productivity.

Agentic AI represents a fundamental shift: from AI as a tool, to AI as a decision-maker within complex systems. 

For data leaders, the opportunity is clear: reduce bottlenecks, cut costs, and make data engineering a strategic advantage.
