- Blog
- 01.30.2025
- Product, Leveraging AI
The Power of Generative AI

Enhancing MatiHelper with OpenAI Integration
Welcome to Part 4 of this blog series, demonstrating the art of the possible, using Matillion products and features to build the MatiHelper Slack AI app. In this series, we’re taking a deep dive into how the MatiHelper Slack AI app was built, starting with design and now focusing on pipeline development. If you haven’t already seen MatiHelper in action, here’s that video to help set the stage!
In the first part of this series, I discussed how to approach designing data pipelines, highlighting design concepts that can be used to build simple and scalable pipelines. Part two focused on the data lifecycle and how it defines the data journey; there, I showed what the data lifecycle looks like for the MatiHelper Slack AI app and designed a tracking table and ancillary views showing the MatiHelper data at different stages of its lifecycle. In part three, I showed how to use Matillion Flex Connectors to integrate with Slack and fetch messages from a channel. In this article, I’ll focus on the brains of the MatiHelper Slack AI app: an integration with GenAI, powered by OpenAI.

Generative Artificial Intelligence (GenAI)
GenAI, Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG): these artificial intelligence terms were foreign to most people not too long ago. But over the last few years, it’s been hard not to come across one of them in your day-to-day life. For example, look at the themes of Super Bowl ads over the years. The Super Bowl has long been held as the pinnacle of advertising. In 2022, countless Super Bowl commercials featured celebrities promoting cryptocurrency firms. Cryptocurrency was, and still is, one of the most disruptive technologies to impact society as a whole. But even with all that fanfare, many people struggle to find a practical use for cryptocurrency in their daily lives. In 2024, by and large, the prominent theme of Super Bowl ads centered on Generative AI (GenAI) and its application to improve many facets of daily life, both personal and business. GenAI has quickly become today’s biggest technology disruptor, with countless real-world applications.
Along with this wave of GenAI comes the term LLM, or Large Language Model. LLMs are artificial intelligence models trained to generate text by processing and understanding the text input they are given. AI vendors such as OpenAI, Cohere, and Anthropic have sprouted up and led the charge in making LLMs accessible to all. Technology giants such as Microsoft, Google, Databricks, and Snowflake have also invested heavily in developing their own AI products and LLMs.
As companies evaluate the application of GenAI to help drive innovation and efficiencies, there are many areas where these benefits are being recognized. Common use cases that GenAI can help with are:
- information extraction
- text summarization
- text classification
- attribute scoring
- sentiment analysis
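Each of these use cases often reduces to the same pattern: a task-specific instruction wrapped around the input text, sent to an LLM. Here is a minimal, hypothetical sketch of that framing (the task instructions and helper name are illustrative, not tied to any Matillion component):

```python
# Hedged sketch: the GenAI use cases above framed as LLM prompt templates.
# Production prompts are carefully tuned; these instructions are illustrative.

TASKS = {
    "summarization": "Summarize the following text in two sentences.",
    "classification": "Classify the following text into one topic label.",
    "sentiment": "Label the sentiment of the following text as positive, neutral, or negative.",
}

def build_prompt(task, text):
    """Combine a task instruction with the input text for an LLM call."""
    return f"{TASKS[task]}\n\nText: {text}"

prompt = build_prompt("sentiment", "The new release fixed my pipeline issues!")
```

The same input text can be pushed through any of these tasks just by swapping the instruction, which is part of what makes GenAI so broadly applicable.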
The other piece of the GenAI stack that makes this technology important for companies is RAG, or Retrieval Augmented Generation. RAG is the concept of supplementing an LLM with external knowledge, giving it business-specific context to draw from. As an example, here at Matillion, we use GenAI to assist our Support Engineers with customer-raised support requests. When a customer creates a new support request, GenAI is used in a few different ways to help resolve the request faster and with more accuracy. In that workflow, an LLM first summarizes the support request, extracting keywords that represent the problem. Then, RAG is used to identify Matillion Knowledge Base articles or Documentation relevant to those keywords. Finally, an LLM collates the identified resources and constructs a response to the support request. From there, a Support Engineer reviews the GenAI-generated response and determines whether it answers the support request and should be used. This “human in the loop” approach empowers our Support Engineers to automate some tasks and enhance their productivity. But the human element is important here: we use AI to augment human capabilities rather than replace humans altogether.
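To make the retrieval step of that workflow concrete, here is a toy sketch. Real RAG systems rank documents by vector-embedding similarity; plain keyword overlap stands in for that here, and the article titles and stopword list are invented for the example:

```python
# Minimal sketch of the retrieval step in a RAG workflow: rank
# knowledge-base articles by keyword overlap with a support request.
# Production systems use vector embeddings instead of word overlap.

def extract_keywords(text, stopwords=frozenset({"the", "a", "is", "to", "my"})):
    """Lowercase, strip punctuation, and drop common stopwords."""
    return {w.strip(".,!?").lower() for w in text.split()} - stopwords

def retrieve(request, articles):
    """Return article titles ranked by keyword overlap with the request."""
    req_kw = extract_keywords(request)
    scored = [(len(req_kw & extract_keywords(body)), title)
              for title, body in articles.items()]
    return [title for score, title in sorted(scored, reverse=True) if score > 0]

articles = {
    "Configuring Slack connectors": "How to configure a Slack connector and bot token.",
    "Pipeline scheduling": "Schedule pipelines to run on a cron expression.",
}
print(retrieve("My Slack bot token is not working", articles))
```

The retrieved articles would then be passed to the LLM alongside the original request, which is what grounds the generated response in business-specific knowledge.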
What I’ve described above on how Matillion is using GenAI to help enhance our Customer Support experience is fully powered by Data Productivity Cloud. To learn a little more about that process and the benefits we quickly saw from it, have a look at this blog article.
Democratizing GenAI
Data Productivity Cloud is designed to allow both coders and non-coders to build and manage data pipelines. This key tenet carries over to GenAI: Data Productivity Cloud has components that allow users to easily integrate GenAI into their data pipelines. The AI components and features available in Data Productivity Cloud fall under a few high-level categories:
- AI source components provide the ability to handle not only text but images and sounds.
- AI prompt components simplify the integration with different LLMs provided by different AI vendors, making it easy to evaluate and use the LLMs of your choice.
- In support of RAG, there are Vector store-focused components, which center around managing and querying vector indices in a vector store.
- All of the AI Prompt components also provide easy integration with vector stores, which simplifies the typical LLM/RAG pattern seen in GenAI.
MatiHelper is a really simple example of integrating GenAI into a data pipeline. Here, messages captured from a Slack channel are submitted to OpenAI and processed by a supported OpenAI LLM to return a response.
In all honesty, this part of MatiHelper is the simplest of all the pipelines that comprise MatiHelper. That simplicity shows the power of GenAI: the only direction I’ve provided to the LLM is to “Answer the provided question in 3-5 sentences.” But, as described in our Support Case example, GenAI now unlocks some very powerful capabilities that were perceived as too challenging, or even impossible, not so long ago.
Art of the AI Possible
When building MatiHelper, I wanted to keep the initial AI integration simple, just so I could see the end-to-end AI app working. And, as shown in the video, that simple integration with OpenAI provides some powerful results. I have started to think about a MatiHelper 2.0, which would center around further extending the GenAI capabilities of MatiHelper. While these ideas are still incubating in my head, I’ll share some of those ideas and the high-level approach of how I might accommodate each.
Swapping out LLMs
I opted to use OpenAI and the gpt-4o LLM to power the MatiHelper Slack AI app. However, any of the other AI Prompt components could be easily swapped in. So, if I were interested in using an LLM available in Amazon Bedrock or Azure OpenAI, I could use those specific components. The setup and configuration of all AI Prompt components are very similar, which makes evaluating different LLM vendors very simple.
Giving MatiHelper a Personality
When configuring an AI Prompt component such as the OpenAI Prompt component, the “User Context” gives the LLM the context in which to respond. The MatiHelper User Context is quite simple, directing it to “Answer the provided question in 3-5 sentences.” In the Support Case example mentioned previously, the “User Context” tells the LLM to answer from the perspective of a Support Engineer. This “User Context” is really powerful, giving one the ability to essentially provide a personality and boundaries around how to formulate a GenAI-based response. If I wanted MatiHelper to take on a more entertaining personality, I could change the “User Context” to something like “Provide a light-hearted humorous response to the provided question.” The possibilities are endless, and it’s simple to change and test the results of different “User Context” configurations.
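Under the hood, a “User Context” maps naturally onto the system message of a chat-completions-style request. The helper below is a hypothetical illustration of that mapping, not the component’s actual implementation:

```python
# Hedged sketch: how a "User Context" could translate into a
# chat-completions-style request body. Illustrative only.

def build_request(user_context, question, model="gpt-4o"):
    """Assemble an LLM request payload from a User Context and a question."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": user_context},  # the "personality"
            {"role": "user", "content": question},        # the Slack message
        ],
    }

default = build_request("Answer the provided question in 3-5 sentences.",
                        "How do I schedule a pipeline?")
playful = build_request("Provide a light-hearted humorous response to the provided question.",
                        "How do I schedule a pipeline?")
```

Only the system message changes between the two requests, which is why swapping personalities is such a low-effort experiment.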
When using the OpenAI Prompt component, I have configured it to return its response as text. However, I could also change the Output Format from the simple TEXT output to JSON, which allows one to define additional output fields from the LLM. These extra fields could influence how MatiHelper responds. For example, I could define a “Category” field and direct the LLM to assign the Slack message to a pre-defined category, which could be used to further target the response (in support of RAG). Or, another output field could evaluate the sentiment of the Slack message; the sentiment could then shape how the LLM responds, perhaps providing more empathy when the original message was deemed to have a negative sentiment.
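A downstream pipeline step could then branch on those JSON fields. The field names (“Category”, “Sentiment”, “Answer”) and the sample payload below are illustrative, not fixed component outputs:

```python
import json

# Sketch of handling a JSON-format LLM response with the extra output
# fields described above. Field names are hypothetical examples.

def adapt_reply(raw_response):
    """Parse the LLM's JSON output and soften the reply on negative sentiment."""
    fields = json.loads(raw_response)
    reply = fields["Answer"]
    if fields.get("Sentiment") == "negative":
        reply = "Sorry to hear you're running into trouble. " + reply
    return fields["Category"], reply

# A plausible JSON payload for a frustrated Slack message.
sample = '{"Category": "Connectivity", "Sentiment": "negative", "Answer": "Check the bot token scopes."}'
category, reply = adapt_reply(sample)
```

The category could drive targeted retrieval, while the sentiment-adjusted reply is what gets posted back to Slack.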
Adding Business Context
Introducing RAG to MatiHelper is another feature that would make it much more powerful, as it would give MatiHelper additional business context to work with. For example, I could combine the MatiHelper Slack AI app pipelines with the Support Case RAG pipelines, which would result in MatiHelper acting like a Matillion Support Engineer. It could provide a live, AI-backed chat interface for users, tapping into additional internal knowledge resources to provide targeted and relevant responses.
Multimodal data
Multimodal AI is a type of artificial intelligence that can process and integrate information from different modalities, including images, videos, audio, and text. Expanding our AI capabilities in this area is a key focus for Matillion. Recently, Matillion added components that integrate with cloud services such as Amazon Textract, Amazon Transcribe, Azure Document Intelligence, and Azure Speech Transcribe. These are AI and ML (machine learning) services that focus on extracting data from images and audio. In addition, the gpt-4o model that MatiHelper uses today accepts images as inputs. Using one of these features, MatiHelper could be given the ability to work with other types of data, like images or audio files. Imagine if MatiHelper could take an image posted in a Slack message, determine what’s in the image, and respond based on what it sees, just by adding one of these features to the MatiHelper pipelines.
While I haven’t yet added this feature to MatiHelper, I have been thinking about it and will share here some things to consider when planning the support for multimodal inputs.
- When working with the Amazon and Azure services for extracting information from documents, images, or audio, these services typically require that the file being evaluated resides in a cloud storage location. The Data Transfer component could be instrumental in getting files into a cloud storage location that the service can read from.
- MatiHelper integrates with Slack for the initial user input. In the Slack API response that includes the details of a Slack message, any attachments (e.g., images or files) are provided as an array of links pointing to the file(s). Retrieving these files requires authentication; specifically, the Slack Bot token being used must have the files:read scope assigned. See Slack’s documentation on working with file types for more information.
- The OpenAI Prompt component has a property that allows one to define an input as an image (when using GPT-4o). When using this feature, the image should be provided either in Base64 format or as a direct, public URL that OpenAI can access.
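The last two considerations can be sketched together: fetch a Slack attachment with an authenticated request (the token must carry the files:read scope), then Base64-encode the bytes into the data-URL form GPT-4o accepts for image inputs. The URL, token, and bytes below are placeholders:

```python
import base64
import urllib.request

# Hedged sketch of retrieving a Slack file attachment and preparing it
# as a GPT-4o image input. All identifiers here are placeholder values.

def fetch_slack_file(url_private, bot_token):
    """Build an authenticated request for a Slack file attachment."""
    return urllib.request.Request(
        url_private,
        headers={"Authorization": f"Bearer {bot_token}"},
    )  # pass to urllib.request.urlopen(...).read() to download the bytes

def to_image_input(image_bytes, mime="image/png"):
    """Encode raw image bytes as a Base64 data URL for an image input."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{b64}"}}

req = fetch_slack_file("https://files.slack.com/files-pri/T123-F456/diagram.png",
                       "xoxb-example-token")
part = to_image_input(b"\x89PNG...")
```

Because Slack’s private file links require the bearer token, the Base64 route avoids exposing the file at a public URL for OpenAI to fetch.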
Conclusion
So, here concludes Part 4 of this blog series, where I focused on the integration with GenAI to give MatiHelper a brain. This initial implementation of MatiHelper shows how simple it is to add artificial intelligence into a data pipeline. Things that were seen as impossible or too difficult to develop in the recent past are now very feasible because of GenAI. Data Productivity Cloud significantly lowers the barrier to entry to AI, democratizing AI for all!
If you want to learn more about other ways to leverage GenAI in data pipelines, see these other blog articles, which go into much greater depth!
- Barista demo - using AI to process unstructured data
- Sentiment Analysis with Anthropic Claude 3 Sonnet using Amazon Bedrock
- Make your LLM an expert on any subject using Retrieval Augmented Generation. Practical Walkthrough
Or find all of Matillion’s AI-focused blogs here!
And, stay tuned for the next articles in this blog series! Here’s a quick peek at the upcoming parts:
- Part 5: Webhooks and Pushdown Python
- Part 6: Microbatching for Continuously Running Pipelines
Downloads
You can find the MatiHelper Slack AI App pipelines available for download on the Matillion Exchange here!
Arawan Gajajiva
Principal Architect - Sales Engineering COE