Bringing Agentic AI to Real-Time Data Monitoring: My Internship at Matillion

At Matillion, we’re exploring how AI can make data work faster, smarter, and more intuitive. This summer, I had the incredible opportunity to work with Matillion's Developer Experience team on a project to use artificial intelligence to streamline real-time data monitoring. Over the last 11 weeks, I’ve designed, implemented, and deployed ObServant, an internal tool that allows Matillion engineers to query monitoring systems for important metrics, logs, and data using natural language.

Overview 

The Challenge

Engineers spent valuable time searching through Grafana dashboards to find the right metrics or logs, often needing to know the exact query syntax.

The Approach

Build an AI-powered natural language interface that understands what engineers need and retrieves the right insights, visualizations, or dashboards instantly.

The Solution

ObServant – a multi-agent system that routes requests to specialist agents for querying, dashboard searching, and generation.

The Impact 

Faster answers, less manual searching, and more time for engineers to focus on solving problems.

Research and Foundation 

The initial weeks were focused on exploring different AI frameworks and understanding the technical landscape:

Exploring AI Agent Frameworks

We started by building our own MCP (Model Context Protocol) servers and worked through several tutorials, such as giving Claude a weather tool that fetches its data through an API request to a weather website. We also experimented with Google's Agent Development Kit (ADK) and Gemini 2.0. Ultimately, we chose LangGraph for building multi-agent systems, paired with AWS Bedrock.

Building the Multi-Agent Architecture

ObServant is structured as a cyclic, directed graph. Each node represents an agent, and the edges between the nodes represent the relationships between the agents. Rather than trying to solve every query with a single AI agent, we designed a system where a supervisor agent intelligently routes queries to specialized subagents.

Supervisor Agent

The supervisor acts as the orchestrator, analyzing incoming natural language queries and determining which specialized agent should handle the request.

Specialized Subagents

  1. Data Acquisition Agent: Handles requests to return raw data from S3 buckets and Grafana dashboards
  2. Query Agent: Identifies the metrics and logs most relevant to a query and answers questions about those specific metrics and logs
  3. Dashboard Search Agent: Finds existing dashboards that might answer user queries
  4. Dashboard Generation Agent: Creates new Grafana dashboards when no suitable dashboard already exists
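
As a rough illustration of the supervisor pattern described above, here is a minimal plain-Python sketch. It is not ObServant's actual code: the function names, the keyword rules, and the fallback to the Query Agent are all assumptions made for the example, and simple keyword matching stands in for the LLM-based routing decision that the real supervisor makes.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for the four subagents; the real ones are LLM-backed.
def data_acquisition_agent(query: str) -> str:
    return f"[data_acquisition] {query}"

def query_agent(query: str) -> str:
    return f"[query] {query}"

def dashboard_search_agent(query: str) -> str:
    return f"[dashboard_search] {query}"

def dashboard_generation_agent(query: str) -> str:
    return f"[dashboard_generation] {query}"

AGENTS: Dict[str, Callable[[str], str]] = {
    "data_acquisition": data_acquisition_agent,
    "query": query_agent,
    "dashboard_search": dashboard_search_agent,
    "dashboard_generation": dashboard_generation_agent,
}

# Assumed keyword rules that stand in for the supervisor's routing decision.
ROUTING_RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("dashboard_generation", ("create a dashboard", "generate a dashboard")),
    ("dashboard_search", ("find a dashboard", "existing dashboard")),
    ("data_acquisition", ("raw data", "s3")),
]

def route(query: str) -> str:
    """Pick the name of the subagent that should handle the query."""
    text = query.lower()
    for agent_name, keywords in ROUTING_RULES:
        if any(keyword in text for keyword in keywords):
            return agent_name
    # Assumption: anything unmatched goes to the Query Agent.
    return "query"

def supervisor(query: str) -> str:
    """Route the query, then hand it to the chosen subagent."""
    return AGENTS[route(query)](query)
```

In a graph framework such as LangGraph, each of these functions would typically become a node and `route` would drive the conditional edges out of the supervisor node.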

Major Technical Breakthroughs

Natural Language to Query Translation

One of the most challenging aspects was enabling users to query metrics and logs using natural language. Grafana’s open source MCP tools include query_prometheus and query_loki_logs for querying metrics and logs, but they require the queries to be written in PromQL and LogQL respectively. To solve this problem, I built a translation system that:

  • Identifies whether a query relates to metrics or logs
  • Converts the natural language into proper PromQL (Prometheus) or LogQL (Loki) syntax
  • Uses web-scraped Grafana documentation to ensure accurate, up-to-date translations
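
The three steps above could be wired together roughly as follows. This is a hedged sketch, not the real translation system: the hint words, the `TranslationRequest` type, and the prompt wording are hypothetical, and the actual translation would be done by an LLM that receives the prompt. Only the tool names `query_prometheus` and `query_loki_logs` come from the Grafana MCP tools mentioned earlier.

```python
from dataclasses import dataclass

# Assumed hint words; a real classifier would likely be an LLM call.
# Note that substring matching is crude ("catalog" contains "log").
LOG_HINTS = ("log", "stack trace", "exception")

@dataclass
class TranslationRequest:
    kind: str      # "metrics" or "logs"
    language: str  # "PromQL" or "LogQL"
    tool: str      # MCP tool that will run the translated query
    prompt: str    # prompt an LLM would receive to do the translation

def classify(question: str) -> str:
    """Step 1: decide whether the question is about metrics or logs."""
    text = question.lower()
    return "logs" if any(hint in text for hint in LOG_HINTS) else "metrics"

def build_request(question: str, docs_excerpt: str) -> TranslationRequest:
    """Steps 2 and 3: choose the target language and ground the prompt in docs."""
    kind = classify(question)
    if kind == "logs":
        language, tool = "LogQL", "query_loki_logs"
    else:
        language, tool = "PromQL", "query_prometheus"
    prompt = (
        f"Translate the question into a valid {language} query.\n"
        f"Reference documentation:\n{docs_excerpt}\n"
        f"Question: {question}\n"
        f"Answer with the {language} query only."
    )
    return TranslationRequest(kind, language, tool, prompt)
```

Keeping the classification separate from the prompt building makes it easy to swap the keyword check for a model call later without touching the rest of the flow.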

Dashboard Management

Rather than always creating new dashboards, ObServant first searches existing ones to see if they can answer the user's question. Only when no suitable dashboard exists does it offer to create a new one, using a human-in-the-loop approve/reject step. This process prevents us from overloading Grafana with unnecessary dashboards, and gives the user control over ObServant’s dashboard creation power.
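
The search-first, approve-before-create flow can be sketched in a few lines. Everything here is an assumption for illustration: the word-overlap matching, the stop-word list, and the `approve` callback (which stands in for the user clicking approve or reject) are hypothetical and much simpler than what an agent would do.

```python
from typing import Callable, List, Optional

# Assumed filler words that should not count as a match on their own.
STOP_WORDS = {"dashboard", "find", "create", "about", "with", "that", "help"}

def find_dashboard(question: str, titles: List[str]) -> Optional[str]:
    """Return the first existing dashboard title that shares a keyword with the question."""
    words = {w.strip("?.,!").lower() for w in question.split()}
    keywords = {w for w in words if len(w) > 3 and w not in STOP_WORDS}
    for title in titles:
        if keywords & {w.lower() for w in title.split()}:
            return title
    return None

def handle_request(
    question: str,
    titles: List[str],
    approve: Callable[[str], bool],
) -> str:
    """Search existing dashboards first; only create one if the human approves."""
    existing = find_dashboard(question, titles)
    if existing is not None:
        return f"found: {existing}"
    # Human-in-the-loop gate: nothing is created without explicit approval.
    if approve(question):
        return "created"
    return "rejected"
```

For example, `handle_request("Create a dashboard for queue latency", ["Pipeline Overview"], lambda q: False)` finds no match, asks for approval, and returns `"rejected"` without creating anything.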

Vectorization

With over 90,000 Prometheus metrics available, finding relevant metrics for the agent was extremely difficult. Loading the metrics alone overwhelmed the agent and caused it to crash. To mitigate this issue, I implemented a vector storage solution to embed all available metrics for more efficient querying. This dramatically improved the accuracy and speed of finding relevant metrics. 
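
To show the idea behind the vector lookup, here is a toy sketch that embeds metric names and returns the closest matches to a question. It is only an assumption-laden illustration: character-trigram counts stand in for a real embedding model, an in-memory list stands in for a real vector store, and the metric names are made up for the example.

```python
import math
from collections import Counter
from typing import List

def embed(text: str) -> Counter:
    """Toy embedding: counts of character trigrams (a stand-in for a learned model)."""
    normalized = text.lower().replace("_", " ")
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class MetricIndex:
    """Embeds every metric name once, then answers nearest-neighbour lookups."""

    def __init__(self, metric_names: List[str]) -> None:
        self._entries = [(name, embed(name)) for name in metric_names]

    def search(self, question: str, k: int = 3) -> List[str]:
        query_vector = embed(question)
        ranked = sorted(
            self._entries,
            key=lambda entry: cosine(query_vector, entry[1]),
            reverse=True,
        )
        return [name for name, _ in ranked[:k]]

# Illustrative metric names, not taken from a real Prometheus instance.
index = MetricIndex([
    "http_requests_total",
    "node_memory_usage_bytes",
    "pipeline_run_failures_total",
])
top_match = index.search("pipeline failures", k=1)
```

The point of the design is that the agent only ever sees the top few candidates instead of the full list of metric names, which keeps its context small.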

Deployment and Integration

Slack Integration

To make ObServant accessible to teams, I integrated it with Slack as an app, allowing users to mention @ObServant in channels or send direct messages to ObServant to ask questions. Users received formatted responses with visualizations and data, or links to relevant dashboards in Grafana if necessary. 

Technical Evolution Throughout the Summer

The project evolved significantly based on feedback from the DevX team.

July Updates

  • Shifted from building dashboards from scratch to generating importable JSON configurations
  • Added the dashboard search capability to avoid creating dashboards that already existed
  • Integrated PromQL and LogQL translation for direct metric querying
  • Implemented the human-in-the-loop approval process for dashboard creation

August Refinements

  • Added vector storage for intelligent metrics caching
  • Switched to downloadable JSON files for dashboard import (addressing credential limitations)
  • Enhanced the interrupt feature for better user control
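
For a sense of what a downloadable dashboard JSON file might contain, here is a minimal sketch that builds a dashboard definition from panel titles and PromQL expressions. The helper names and layout choices are hypothetical, the metric in the usage note is invented, and the exact fields Grafana needs for import can vary by version, so treat this as an approximation rather than ObServant's real output.

```python
import json
from typing import Dict

def build_dashboard(title: str, panels: Dict[str, str]) -> dict:
    """Build a dashboard dict; `panels` maps a panel title to a PromQL expression."""
    panel_list = []
    for index, (panel_title, expr) in enumerate(panels.items()):
        panel_list.append({
            "id": index + 1,
            "type": "timeseries",
            "title": panel_title,
            # Two panels per row on Grafana's 24-column grid (layout is an assumption).
            "gridPos": {"h": 8, "w": 12, "x": (index % 2) * 12, "y": (index // 2) * 8},
            "targets": [{"refId": "A", "expr": expr}],
        })
    # Leaving id and uid empty lets Grafana assign them on import.
    return {"id": None, "uid": None, "title": title, "panels": panel_list}

def to_json_file_contents(dashboard: dict) -> str:
    """Serialize the dashboard so it can be offered as a downloadable file."""
    return json.dumps(dashboard, indent=2)
```

Calling `to_json_file_contents(build_dashboard("Pipeline health", {"Failures": "sum(rate(pipeline_failures_total[5m]))"}))` yields a JSON string that a user could save and import through Grafana's dashboard import screen.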

Impact of ObServant

ObServant serves many user groups throughout Matillion. For example:

Customer Success Managers can now ask questions like:

  • "Can you review product adoption metrics?"
  • "What are the success/failure rates and dependencies of pipeline XYZ?"

Support Teams can request:

  • "Find a dashboard about XYZ metrics"
  • "Create a dashboard to help troubleshoot {this specific issue}"

This natural language interface removes the need for teams to navigate dashboards manually or learn complex query languages, so engineers can pinpoint and fix issues more efficiently.

Reflection

Beyond the project itself, I learned many industry practices, including proper Git strategies, agile development with Jira, product management and planning, and code review. These skills will be invaluable in my future software engineering endeavors.

My Time in Manchester

I have really enjoyed my time in Manchester over the course of the summer! I played pickup soccer/football with the locals, travelled on the weekends, and tried a bunch of new restaurants. Two big highlights for me were traveling to Italy one weekend to visit my older sister, and going to York with the other US summer interns for afternoon tea!

My Time at Matillion

My time at Matillion has been rewarding not only for the technical knowledge and industry experience I gained, but also for the connections I made with my coworkers. During our quarterly planning, my team went to a Crystal Maze themed escape room. It was really nice to get to know everyone outside of our daily standup meetings, and I learned so much about my teammates' hobbies and interests outside of work. I also met a lot of new people beyond my team. The culture at Matillion is warm and inviting, and it made coming to work so much fun!

Looking Forward

My summer project demonstrated how modern AI can make complex technical systems more accessible, supporting Matillion’s mission to help teams deliver data productivity at speed and scale. I am excited to keep exploring what AI makes possible, and I am so grateful for my experience at Matillion this summer!

Emma Wang

Software Engineer Intern

Emma is a rising junior at MIT from the Bay Area studying Computer Science and Engineering, and Mathematics. On campus, she plays for the Women's Varsity Soccer Team, leads tours for MIT admissions, and conducts research at MIT's Computer Science and Artificial Intelligence Laboratory! She loves to cook, dance, and hike. 
