09.25.2024
Leveraging AI in Matillion’s Data Productivity Cloud for Data Categorization and Visualization

Accurate data classification is essential for highlighting trends and making well-informed decisions. This blog explores how AI integrations, particularly large language models (LLMs), can be used within Matillion’s Data Productivity Cloud (DPC) to categorize data, standardize labels, and cleanse records. The standardized data can then be seamlessly integrated into Business Intelligence (BI) tools for enhanced visualization and analysis.
In the example below, users’ job titles were classified into categories such as job seniority and department using Matillion’s OpenAI Prompt component. DPC's AI components provide fast, easy access to powerful models, enabling data teams to make informed decisions from structured, consistent data.
Utilizing the LLM Prompt Component in Matillion DPC
Component: OpenAI Prompt
The OpenAI Prompt component in Matillion allows users to generate responses to text-based prompts using OpenAI's language models. It takes input from your source data, combines it with a user-defined prompt, and sends it to the LLM for processing. This component can be instrumental for various classification tasks, such as determining job seniority or department from job titles.
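Conceptually, the component does something like the following for each input row: it interpolates the row's value into the user-defined prompt and assembles a chat-completion request for the model. This is a minimal Python sketch of that idea, not Matillion's actual internals; the template wording, model name, and category list are illustrative assumptions.

```python
# Sketch of how an LLM prompt component combines source data with a
# user-defined prompt. Template, model name, and labels are illustrative.
PROMPT_TEMPLATE = (
    "Classify the following job title into one seniority level: "
    "Junior, Mid, Senior, or Executive. Respond with the label only.\n\n"
    "Job title: {job_title}"
)

def build_prompt(job_title: str) -> str:
    """Interpolate one row's job title into the prompt template."""
    return PROMPT_TEMPLATE.format(job_title=job_title)

def build_request(job_title: str, model: str = "gpt-4o-mini") -> dict:
    """Assemble the chat-completion payload the component would send,
    one request per input row."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": build_prompt(job_title)}],
    }

print(build_request("Senior Director of Business Intelligence"))
```

The component manages the actual API call, batching, and writing the model's reply back alongside the source row, so the pipeline author only supplies the prompt text.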
Matillion also offers alternative components, including Azure OpenAI Prompt and Amazon Bedrock Prompt, to provide flexibility in leveraging various AI models.
In this project, I used several OpenAI Prompt components to classify the job seniority and department of users in two distinct groups:
- Inviter's job seniority and department (user sending an invite)
- Invitee's job seniority and department (user receiving an invite)
By using the user’s job title as input, I was able to automate the process of categorizing users according to predefined criteria.
Testing for Optimal Temperature
When working with AI-generated results, it’s crucial to fine-tune the LLM's temperature setting to achieve the desired level of creativity or consistency. Since the goal here was to categorize job titles into well-defined seniority levels and departments, less creativity and more consistency were required.
After experimenting with different temperature settings, I found that a value of 0.2 produced the most accurate and consistent classifications. Lower temperatures like this ensure more reliable and repeatable outputs, which is ideal for tasks like categorization, where consistency is key.
To quote OpenAI:
"Lower values for temperature result in more consistent outputs, while higher values generate more diverse and creative results. Select a temperature value based on the desired trade-off between coherence and creativity for your specific application."
Few-Shot Prompt Engineering
To improve the LLM’s ability to classify job titles accurately, I employed a Few-Shot Prompt Engineering approach. This technique involves providing the LLM with a few examples to help it understand the task at hand. You will find more examples in this article on Zero Shot vs. Few Shot Prompting.
For this project, I created a prompt with examples of job titles and their corresponding seniority levels and departments.
For instance, a user with the job title 'Senior Director of Business Intelligence' would be classified under 'Senior' job seniority, while a job title like 'Enterprise Sales Engineer' would be categorized in the 'Sales/Pre-Sales' department.
This approach enabled the LLM to map diverse job titles to a standardized list of seniority levels and departments with great accuracy, resulting in a more structured dataset.
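A few-shot prompt like the one described can be sketched as a chat message list where each worked example is a user/assistant pair ahead of the real input. The department assignments for the extra example titles below are illustrative assumptions, not the project's actual training examples:

```python
# Few-shot examples: (job title, department). The pairings beyond
# "Enterprise Sales Engineer" are hypothetical for illustration.
FEW_SHOT_EXAMPLES = [
    ("Enterprise Sales Engineer", "Sales/Pre-Sales"),
    ("Senior Director of Business Intelligence", "Data/Analytics"),
    ("VP of Marketing", "Marketing"),
]

DEPARTMENTS = ["Sales/Pre-Sales", "Data/Analytics", "Marketing",
               "Engineering", "Other"]

def build_few_shot_messages(job_title):
    """Build a chat message list: instructions, worked examples,
    then the title to classify."""
    system = ("Classify job titles into exactly one of these departments: "
              + ", ".join(DEPARTMENTS) + ". Respond with the department only.")
    messages = [{"role": "system", "content": system}]
    for title, dept in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": title})
        messages.append({"role": "assistant", "content": dept})
    messages.append({"role": "user", "content": job_title})
    return messages
```

Because the model sees how titles map to the fixed list before it answers, it tends to pick from that list rather than inventing new department names.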
Standardized Labels for Data Visualization
Once the job titles were classified using the OpenAI Prompt component, they were mapped to a standardized set of labels. This new dataset with standardized labels was output into a new table, which significantly simplified downstream analysis.
The standardization of job titles ensures that stakeholders can analyze data more effectively, avoiding inconsistencies caused by various naming conventions. Instead of dealing with an unstructured and endless list of job titles, data is now categorized into uniform labels. This consistency is critical when integrating the data into BI tools like ThoughtSpot or Tableau.
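Even at low temperature, an LLM can occasionally return a label with stray punctuation, odd casing, or a value outside the agreed list. A small post-processing step like this (an illustrative sketch, not Matillion's mechanism) can coerce raw replies onto the fixed label set before the results are written to the output table:

```python
# Guard against off-list replies: trim, normalize casing, and fall back
# to a catch-all label. The label set here is a hypothetical example.
STANDARD_SENIORITY = {"Junior", "Mid", "Senior", "Executive"}

def standardize_label(raw: str, allowed: set, fallback: str = "Other") -> str:
    """Normalize an LLM reply; return the fallback when it is off-list."""
    cleaned = raw.strip().strip('."').title()
    return cleaned if cleaned in allowed else fallback

print(standardize_label(" senior. ", STANDARD_SENIORITY))  # → Senior
print(standardize_label("Head of Everything", STANDARD_SENIORITY))  # → Other
```

With every row guaranteed to hold one of a handful of known values, BI filters and group-bys behave predictably.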
Seamless Integration with BI Tools
Integrating the cleansed and standardized data into BI tools was straightforward, particularly with the data stored in a data warehouse like Snowflake. The connection between Matillion DPC and Snowflake enabled smooth data flow into BI tools, where I created visualizations such as charts and graphs that provided valuable insights.
These visualizations helped uncover patterns in job positions, departments, and relationships among users. With a clean and structured dataset, decision-makers can easily spot trends, compare roles, and make strategic decisions based on reliable information.
Conclusion
This project underscores the transformative potential of AI in automating and improving data workflows. By leveraging Matillion’s Data Productivity Cloud and AI components, I was able to build a data pipeline that efficiently classifies, standardizes, and visualizes job title data. These advancements open new doors for businesses looking to optimize their data classification efforts, ensuring they can harness the full power of structured data in making insightful decisions.
AI is more than just a tool for automation—it’s a catalyst for unlocking deeper understanding and driving better business outcomes.
Unlock the full potential of AI in your data workflows with Matillion’s powerful Data Productivity Cloud. Try it free today and experience unprecedented productivity, collaboration, and speed in building and managing data pipelines.
Isabelle Ng
Associate Data Engineer