- Blog
- 06.11.2025
- Leveraging AI, Data Fundamentals, Product
Securing Generative AI: Emerging threats and mitigation strategies

Alongside the explosion of generative artificial intelligence this past year, with it being adopted and leveraged to advance many different features in the tech industry, its rise to glory has to come with due scrutiny and inspection of the risks attached to it.
A recent report from securityboulevard.com (Vulnerabilities for AI and ML Applications are Skyrocketing) has noted an uptick in the vulnerabilities associated with AI and ML applications, which hints towards introducing new vulnerabilities specifically tied to AI technology. However, many of the vulnerabilities shown in the world’s first AI/ML bug bounty program are related to projects and solutions associated with AI/ML technologies.
The vulnerabilities are the same vulnerabilities we see in regular applications and systems, which we find through bug bounty programs, penetration testing, and security tooling. Although this data is still useful, it is to be expected. As new solutions are created, vulnerabilities in those companies will be found.
With this said, many still wonder how best they can secure their products using AI and implement it to keep their customers' data secure and what specific threats the AI technology introduces that are unique to the technology of Generative AI.
For the remainder of this blog post, I’ll focus on Large Language Models (LLMs) - the main category in generative AI and the one Matillion uses in its Data Productivity Cloud.
What are LLMs?
Using a Large Language Model, I asked it to define LLMs:
LLMs, or Large Language Models, are advanced artificial intelligence systems designed to understand and generate human language by leveraging large datasets and intricate neural networks.
GPT-4o 15/05/2024
In our case, the LLM takes in natural language text and responds in kind.
When implementing LLMs in an application, it is important to understand the specific risks associated with sending data to an internet API—which should be covered in your usual threat modeling activities—and the unique threats that come with utilizing an LLM.
The following are a few significant areas of security that are unique to LLMs:
- Prompt Engineering:
- Escapes and information disclosure
- Configuring Payloads from the LLM
- Resource exhaustion
- Training Data Exposure
Injection Attacks on Large Language Models Through API Inputs
One key concept to consider when it comes to leveraging LLMs through the APIs is the idea of the customer being able to control what goes into the LLM with arguably an uncontrollable effect on the output.
This is a similar risk to SQL Injection (SQLi) attacks, whereby an attacker can escape part of the SQL statement and write their own malicious SQL. The difference is that SQLi can be effectively protected against!
Once you allow user input to feed into LLMs, an attacker can control part of the prompt sent to the model, allowing many creative ways to escape input and output controls. This is demonstrated well on Reddit whereby folks are breaking out of ChatGPT chatbot: https://www.reddit.com/r/ChatGPTJailbreak/.
This raises two unique threats that need to be protected against:
- Attacker configuring malicious payloads to be reflected to compromise the receiving host
- Attacker reflects sensitive data contained in the prompt that the users shouldn’t be able to see
Resource Exhaustion Through Increased LLM Usage
The second category of threat that is unique but similar to the rate-limiting threats is “resource exhaustion.” If the attacker can force the prompt and response to be large and initiate many requests to the LLM, the attacker could ramp up the costs of using the LLM, burning the budget put towards the application.
Exposure of Sensitive Training Data
The third and final threat we’ll cover in this blog is exposing training data. If you have trained your AI on sensitive data that no one should be able to see, an attacker could potentially expose the training datasets through clever enumeration techniques.
This is why it is important to use minimal sensitive data and add noise into the training data (amongst other techniques) to protect against this threat.
What is Matillion doing with Generative AI?
Well, lots, actually! We now offer data engineering with AI, whereby we offer prompt engineering, bring your own AI, Retrieval Augmented Generation (RAG), and Copilot. All of these can amplify your data pipelines with generative AI.
The Matillion Data Productivity Cloud offers many AI LLMs that enable you to perform interesting queries on your data, including:
- OpenAI ChatGPT
- AWS Bedrock
- Azure Open AI
- Snowflake Cortex
Matillion also offers a rich Copilot experience, which enables you to build pipelines in natural language in conjunction with our intuitive no-code, low-code drag-and-drop system, making your experience even more efficient than it already is.
Finally, Matillion offers the ability to use Retrieval-Augmented Generation pipelines in the Data Productivity Cloud, enabling you to use your own in-house data to get the best answers out of your LLM integrations.
Conclusion
As generative AI evolves and expands across many facets of our applications and engineering, the requirement for robust security measures increases.
By learning the specific threats associated with generative AI technologies, secure programming practices based on findings from activities such as Threat Modelling can ensure a safe and reliable product for customers to use.
In Matillion’s case, this enables organizations to harness the power of generative AI to build vector database pipelines, build pipelines using natural language using Copilot, and integrate with their desired LLM providers.
Follow this link to try out the Matillion platform for free on your own data.
Aaron Cameron
Senior Application Security Engineer
Featured Resources
Big Data London 2025: Key Takeaways and Maia Highlights
There’s no doubt about it – Maia dominated at Big Data London. Over the two-day event, word spread quickly about Maia’s ...
BlogSay Hello to Ask Matillion, Your New AI Assistant for Product Answers
We’re excited to introduce a powerful new addition to the Matillion experience: Ask Matillion.
BlogRethinking Data Pipeline Pricing
Discover how value-based data pipeline pricing improves ROI, controls costs, and scales data processing without billing surprises.
Share: