- Blog
- 01.14.2025
- Leveraging AI, Product
The Data Lifecycle in the MatiHelper Slack AI App

Welcome back to my blog series, focusing on Matillion features and design patterns that can be applied to many different data-oriented use cases and scenarios. In this series, we’re taking a deep dive into how the MatiHelper Slack AI app was built, as an example of an end-to-end data pipeline. If you haven’t already seen MatiHelper in action, here’s that video to help set the stage!
In Part 1 of this series, I kept things high level, reviewing core data pipeline design concepts that are key to building pipelines with simplicity, consistency, and scalability at the forefront. In the following parts of this series, I’ll take a closer look at the pipelines that support the MatiHelper Slack AI app, which will show some of these pipeline design concepts in action. In this article, we will focus on the data lifecycle of the MatiHelper Slack AI app and how it’s managed in these pipelines.
Data Lifecycle
When planning the MatiHelper Slack AI app, the first step was to think about the integration points and how the data lifecycle for this app might look. In this case, the data journey starts in Slack, with a user creating a message. Each message is then submitted to OpenAI as a prompt, which returns a response. That response is then posted back as a threaded reply to the original Slack message, which completes the lifecycle for that piece of data. Visually, the data lifecycle looks like this:

Seeing the progression of the data in this way helps to break down segments of the journey into manageable pieces. Each segment will translate into a data pipeline that manages that segment of the data journey.
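Before diving into the pipelines themselves, the three segments above can be sketched as a simple loop. The helper functions below are hypothetical stand-ins for the real Slack and OpenAI integrations, which are covered later in this series:

```python
# A minimal sketch of the MatiHelper data lifecycle. The three helper
# functions are hypothetical placeholders for the actual Slack and OpenAI
# integrations covered in later parts of this series.

def fetch_new_messages():
    # Segment 1: capture new Slack messages (start of the lifecycle)
    return [{"msg_id": "1001",
             "msg_text": "What is a data pipeline?",
             "msg_ts": "1736872000.000100"}]

def ask_openai(prompt):
    # Segment 2: submit the message text to OpenAI as a prompt
    return f"(answer to: {prompt})"

def post_threaded_reply(msg, answer):
    # Segment 3: post the response as a threaded reply to the original
    # message, which completes the lifecycle for that piece of data
    return {"thread_ts": msg["msg_ts"], "text": answer}

replies = []
for msg in fetch_new_messages():
    answer = ask_openai(msg["msg_text"])
    replies.append(post_threaded_reply(msg, answer))
```

Note how each segment maps cleanly onto one function call, mirroring how each segment of the journey will translate into its own data pipeline.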
Admittedly, the data lifecycle for the MatiHelper Slack AI app is not typical of common cloud data warehouse workloads. More traditionally, cloud data warehouse data pipelines involve ingesting large volumes of data from various sources and transforming that data to help drive business insights. The MatiHelper Slack AI pipelines would be better categorized as a data app or an AI app. While the data journey may look different from those more common data warehousing use cases, the design concepts and ideas applied here are still relevant. We are also seeing the range of possibilities that injecting generative AI into data pipelines has begun to unlock! Sentiment analysis, data summarization, and data classification are just some of the things you can now easily derive from your data by leveraging generative AI!
Tracking Table
As discussed in Part 1 of this series, metadata is king. Using metadata in your data pipelines helps you design modular pipelines that are reusable and scalable. In the case of the MatiHelper Slack AI bot, all of its data pipelines are centralized around a singular tracking table, which has been named slack_ai_queries. Each row in this tracking table represents an individual Slack message. The columns in the table capture the data around each Slack message along with its metadata, primarily timestamp fields that track how far each Slack message has progressed through the lifecycle.
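As a rough sketch, the tracking table could be defined with DDL along these lines. The column names come from the queries later in this article, but the types and comments are my assumptions, and SQLite is used purely to keep the example self-contained; the actual table is created by a Data Productivity Cloud pipeline, as we’ll see shortly:

```python
import sqlite3

# Sketch of the slack_ai_queries tracking table. Column names match the
# queries shown later in this article; the types and comments here are
# assumptions, and SQLite stands in for the cloud data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE slack_ai_queries (
        msg_id         TEXT,       -- unique identifier for the Slack message
        msg_text       TEXT,       -- the message text, submitted as the prompt
        msg_ts         TEXT,       -- Slack message unix timestamp
        msg_create_ts  TIMESTAMP,  -- when the Slack message was created
        msg_receipt_ts TIMESTAMP,  -- when the message was captured
        msg_prompt_ts  TIMESTAMP,  -- when the message was submitted to OpenAI
        msg_answer     TEXT,       -- the response returned by OpenAI
        msg_reply_ts   TIMESTAMP   -- when the answer was posted as a reply
    )
""")

cols = [d[0] for d in conn.execute("SELECT * FROM slack_ai_queries").description]
```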
The design of this tracking table evolved as the related data pipelines were developed. In fact, this is to be expected, as you will likely discover some important metadata that needs to be captured as you start to develop the individual pipelines. Some fields in the table are self-explanatory. But as we will see in the future blog articles that focus on the Slack integrations, the Slack message unix timestamp, msg_ts, is used in a couple of different ways, such as determining the most recent Slack message captured, in support of incremental loading.
Views
Because the MatiHelper Slack AI app runs continuously, it may capture new Slack messages at any time, and previously captured Slack messages may still be somewhere in the middle of their journey. For this reason, it is important to leverage the timestamp metadata to easily identify messages at specific points of the data lifecycle. Every captured Slack message will initially have a timestamp that represents when it was created, the start of the data journey. The other timestamps in the table will initially have no value (null) and become populated as that message progresses through its journey.
With the above tracking table structure in mind, I can represent Slack messages at key points of the journey with the following SQL logic:
Slack messages that need to be submitted to OpenAI
SELECT
"msg_id",
"msg_text",
"msg_ts",
"msg_create_ts",
"msg_receipt_ts",
"msg_prompt_ts",
"msg_answer",
"msg_reply_ts"
FROM "slack_ai_queries"
WHERE "msg_prompt_ts" IS NULL
Slack messages with an OpenAI answer that need to be posted as a Slack reply
SELECT
"msg_id",
"msg_text",
"msg_ts",
"msg_create_ts",
"msg_receipt_ts",
"msg_prompt_ts",
"msg_answer",
"msg_reply_ts"
FROM "slack_ai_queries"
WHERE "msg_answer" IS NOT NULL
AND "msg_reply_ts" IS NULL
AND "msg_prompt_ts" IS NOT NULL
And, I can also calculate when the most recent captured Slack message was created with this SQL:
SELECT
MAX("msg_ts") AS "max_msg_ts"
FROM "slack_ai_queries"
When I start to build the actual data pipelines, the above SQL represents much of the logic that needs to be accounted for during different steps of the journey. To simplify the design of those data pipelines, I’ve elected to build views on top of my tracking table, defined by the SQL above. Using views in this way provides an easy and consistent way to look at the data in the tracking table, and as data changes in the tracking table, the views will reflect those changes accordingly.
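To make this concrete, here is a small, self-contained sketch of the idea (SQLite standing in for the cloud data warehouse, with a simplified set of columns and hypothetical view names). It shows how views over the tracking table automatically reflect a message’s progress through the lifecycle as its timestamp fields get populated:

```python
import sqlite3

# SQLite stands in for the cloud data warehouse here, and the view names
# are hypothetical; the logic mirrors the SQL shown above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE slack_ai_queries (
        msg_id TEXT, msg_text TEXT, msg_ts TEXT,
        msg_prompt_ts TIMESTAMP, msg_answer TEXT, msg_reply_ts TIMESTAMP
    );

    -- Slack messages that need to be submitted to OpenAI
    CREATE VIEW vw_needs_prompt AS
        SELECT * FROM slack_ai_queries WHERE msg_prompt_ts IS NULL;

    -- Slack messages with an answer that needs to be posted as a reply
    CREATE VIEW vw_needs_reply AS
        SELECT * FROM slack_ai_queries
        WHERE msg_answer IS NOT NULL
          AND msg_reply_ts IS NULL
          AND msg_prompt_ts IS NOT NULL;
""")

# A newly captured message: only its creation metadata is populated
conn.execute("INSERT INTO slack_ai_queries VALUES "
             "('1001', 'hello', '1736872000.000100', NULL, NULL, NULL)")
assert conn.execute("SELECT COUNT(*) FROM vw_needs_prompt").fetchone()[0] == 1

# After the OpenAI step, the same row disappears from one view and
# appears in the next, with no changes to the views themselves
conn.execute("UPDATE slack_ai_queries "
             "SET msg_prompt_ts = '2025-01-14 12:00:00', msg_answer = 'hi!' "
             "WHERE msg_id = '1001'")
assert conn.execute("SELECT COUNT(*) FROM vw_needs_prompt").fetchone()[0] == 0
assert conn.execute("SELECT COUNT(*) FROM vw_needs_reply").fetchone()[0] == 1
```

Notice that the downstream pipelines never need to repeat the null-check logic; they simply read from the view that represents their step of the journey.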
Data Productivity Cloud Pipelines
OK, so now that we have a handle on the data lifecycle and supporting metadata, we can start to build some pipelines! I like to start my data pipelines with an “initialization” pipeline, which creates any table or metadata dependencies if not already present. This helps to self-document how to build those dependencies and makes it easy to accommodate any adjustments that arise during development.
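The “create any dependencies if not already present” idea can be sketched in plain SQL as well (SQLite syntax here, with a hypothetical view name; the actual pipeline uses Data Productivity Cloud components rather than hand-written DDL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def initialize(conn):
    # Idempotent DDL, safe to run on every pipeline start: IF NOT EXISTS
    # means dependencies are only created when they are missing, so the
    # initialization step can be re-run without errors or duplicates.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS slack_ai_queries (
            msg_id TEXT, msg_text TEXT, msg_ts TEXT,
            msg_create_ts TIMESTAMP, msg_receipt_ts TIMESTAMP,
            msg_prompt_ts TIMESTAMP, msg_answer TEXT, msg_reply_ts TIMESTAMP
        );
        CREATE VIEW IF NOT EXISTS vw_max_msg_ts AS
            SELECT MAX(msg_ts) AS max_msg_ts FROM slack_ai_queries;
    """)

# Running initialization twice causes no errors and no duplicates
initialize(conn)
initialize(conn)
```

Keeping this logic in its own pipeline also self-documents the dependencies: anyone picking up the project can see exactly what tables and views the other pipelines expect.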
The main orchestration pipeline for this step is named Slack AI - DDL. Opening up that pipeline, you can see it simply contains a Create Table component to create the tracking table, and a nested transformation pipeline that creates the views from the tracking table.
The transformation pipeline that builds the views does exactly what the SQL code I shared earlier represents. The visual nature of Data Productivity Cloud data pipelines helps to make that logic very understandable, irrespective of familiarity with SQL. Instead of writing SQL to represent logic, in Data Productivity Cloud, we are using Filter components and Aggregate components to represent the core logic. And, we are using the Create View component to create the actual views. This really opens the door for a lot of different personas to create their own data pipelines. As a person who speaks fluent SQL, it’s an easy transition to using the graphical component equivalents. Having a visual way to define data logic also really helps when explaining that logic to others.
Conclusion
So, that concludes Part 2 of this blog series, where we focused on the data lifecycle of the MatiHelper Slack AI app and the associated table and views that represent that lifecycle. Now that we’ve laid out the framework, in the next few blog articles we’ll start to walk through the pipelines that do all of the behind-the-scenes work!
Here’s a quick peek into the upcoming parts in this blog series!
- Part 3: Integrating with Slack
- Part 4: The Power of GenAI
- Part 5: Webhooks and Pushdown Python
- Part 6: Microbatching for Continuously Running Pipelines
Downloads
You can find the MatiHelper Slack AI App pipelines available for download on the Matillion Exchange here!
Arawan Gajajiva
Principal Architect - Sales Engineering COE