Integrate data from MySQL to Snowflake using Matillion

Our MySQL to Snowflake connector seamlessly transfers your data to Snowflake within minutes, ensuring it remains up-to-date without the need for manual coding or complex ETL processes.


Extracting data from MySQL to Snowflake

Extracting data from MySQL is an essential process for many organizations seeking to leverage the advanced analytics and scalability features offered by modern cloud-based data warehouses such as Snowflake. Successfully transferring data between these environments requires thoughtful planning and setup to ensure accuracy, reliability, and security throughout the pipeline.

In this article, we will guide you through the key steps necessary to accomplish a successful MySQL-to-Snowflake data transfer. We will begin by walking through the creation of a dedicated identity in MySQL, which is fundamental for managing data access permissions. For organizations using Matillion as their ETL tool, we will explain how to check for, or acquire, the required JDBC driver to enable MySQL connectivity. Next, we will discuss considerations related to establishing secure and efficient network connectivity between your source (MySQL) and target (Snowflake) environments. Finally, we will explore strategies for querying your MySQL data, covering both the initial data load and methods for performing incremental updates to ensure your Snowflake environment remains current. By following these steps, you will be well prepared to build a robust integration tailored to your organization's analytical and operational requirements.


What is MySQL?

MySQL is a popular open-source relational database management system (RDBMS) using SQL for data definition and queries. Built on a client-server model, it supports transactional processing, ACID compliance (via InnoDB), and high concurrency for multi-user environments. Known for its reliability, scalability, and speed, MySQL powers many web and enterprise applications, especially within the LAMP stack (Linux, Apache, MySQL, PHP/Python/Perl). Its extensible design, wide community support, and compatibility with various languages and operating systems make it a crucial tool for modern, data-driven applications.


What is Snowflake?

Snowflake is a cloud-native data warehousing platform designed to enable secure, scalable, and high-performance analytics across diverse data sets. Unlike traditional databases, Snowflake architecture separates compute, storage, and services layers, allowing independent scaling and efficient resource utilization. The platform employs a multi-cluster, shared-data approach, supporting concurrent workloads without contention and facilitating seamless data sharing among users or organizations. With support for structured and semi-structured data (such as JSON, Avro, and Parquet), Snowflake provides automatic performance tuning, robust security controls, and integration with major cloud providers—including AWS, Azure, and Google Cloud—making it a versatile solution for modern data engineering and analytics workflows.

Why Move Data from MySQL into Snowflake

Unlocking Powerful Analytics: The Benefits of Copying Data from MySQL to Snowflake

A data engineer or architect may wish to copy data from MySQL into Snowflake for several compelling reasons. Firstly, MySQL often contains data that is potentially valuable for analytics or business intelligence, but its native environment may not be optimized for large-scale querying or integration. By moving data from MySQL into Snowflake, professionals can integrate that data with information from other sources, such as cloud applications, files, or additional databases—unlocking deeper insights than could be derived from MySQL alone. In addition, conducting data transformations and analytical queries directly within Snowflake rather than on the source MySQL database prevents increased operational load on the MySQL system, ensuring continued application performance and reliability. This approach leverages Snowflake’s scalable compute and storage capabilities, enabling more efficient and comprehensive data analysis.

Creating a User in MySQL

This guide explains how to create a new user in a MySQL database and assign appropriate privileges, using SQL statements. The instructions assume you have administrative access to the MySQL server.

1. Access the MySQL Command Line

Log in to the MySQL server as the `root` user or another user with sufficient privileges:

mysql -u root -p

Enter your root password when prompted.

2. Create the User

To create a new user, use the `CREATE USER` statement. Substitute `username`, `host`, and `password` with appropriate values.

CREATE USER 'username'@'host' IDENTIFIED BY 'password';

  • `'username'`: Replace with the desired username.
  • `'host'`: Use `'localhost'` to limit access to the local machine, or use `'%'` to allow connections from any host.
  • `'password'`: Specify a strong password.

Example:
Create a user named `appuser` who can connect from any host, with a secure password.

CREATE USER 'appuser'@'%' IDENTIFIED BY 'SecureP@ssw0rd!';

3. Grant Privileges

After creating the user, assign the necessary privileges. For example, to grant all privileges on a specific database:

GRANT ALL PRIVILEGES ON database_name.* TO 'username'@'host';

Example:

GRANT ALL PRIVILEGES ON mydatabase.* TO 'appuser'@'%';

To grant only specific privileges (e.g., SELECT, INSERT):

GRANT SELECT, INSERT ON mydatabase.* TO 'appuser'@'%';

4. Apply Changes

Account-management statements such as `CREATE USER` and `GRANT` take effect immediately, so flushing is not normally required. If you have modified the grant tables directly (for example, with `INSERT` or `UPDATE` on `mysql.user`), reload them so the changes take effect:

FLUSH PRIVILEGES;

5. (Optional) Verify the User

To confirm that the user has been created:

SELECT user, host FROM mysql.user WHERE user = 'appuser';


Note:
Avoid using weak passwords, and only grant the minimum privileges necessary for each user as a best security practice.

Installing the JDBC Driver

The MySQL JDBC driver, which enables connectivity between Matillion Data Productivity Cloud and MySQL databases, is not included by default in Matillion deployments. This exclusion is due to licensing or redistribution restrictions associated with the driver. To use MySQL as a data source or target, you must manually download and install the JDBC driver as an external dependency within your Matillion environment. The following instructions apply at the time of writing.

Step 1: Download the MySQL JDBC Driver

  1. Navigate to the official MySQL Connector/J download page: https://dev.mysql.com/downloads/connector/j/
  2. On the download page, locate and select the latest stable release of the connector.
  3. Prefer the Type 4 JDBC driver, which is a pure Java implementation and does not require native libraries.
  4. Download the relevant “Platform Independent” version: this will typically be provided as a zip or tar.gz archive. Extract the file to locate the JAR file for the JDBC driver (for example, `mysql-connector-java-8.x.x.jar`).

Step 2: Install the JDBC Driver into Matillion Data Productivity Cloud

Matillion provides a process for uploading external JDBC drivers to your environment. Follow the official Matillion installation instructions here: https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/.

In brief, you will generally be required to:

  • Log in to your Matillion Data Productivity Cloud environment.
  • Access the section for agent configuration or driver management.
  • Upload the MySQL JDBC JAR file to the designated location, following on-screen prompts or documentation guidance.
  • Restart or refresh your Matillion Agent if instructed, to ensure the new driver is picked up and available for use.

Step 3: Configure Your Database Query Components

Once the driver is installed, you can begin designing and running queries against your MySQL databases within Matillion. For instructions on how to utilize the newly installed JDBC driver inside your pipelines (including information such as connection URLs, authentication, and supported features), refer to the official Matillion usage guide here: https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/.

Checking network connectivity

Before connecting Matillion Data Productivity Cloud to your MySQL database, you must ensure that the database allows incoming network connections according to your deployment configuration:

  • Full SaaS agent configuration: You need to allow incoming connections to your MySQL database from the list of IP addresses provided by Matillion. These IP addresses can be found at Matillion documentation: Allowing IP addresses. Make sure these addresses are permitted in your database’s network access controls (such as firewall or security group settings).

  • Hybrid SaaS deployment: In this scenario, you should allow incoming connections from your own Virtual Private Cloud (VPC). The Matillion agent runs within your VPC, so ensure the necessary security group and firewall rules are configured to allow network traffic from your internal resources to the MySQL database. To help verify network access from your VPC, refer to the tools available here: Matillion Exchange: Check network access.

Additionally, if your MySQL database is referenced using a Domain Name System (DNS) address rather than a direct IP address, you must ensure that the Full SaaS or Hybrid SaaS agent can resolve the database’s DNS name to its corresponding IP address. Proper DNS resolution is critical for connectivity; verify that external or internal DNS configuration allows this, depending on your network setup.
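To spot-check DNS resolution and basic TCP reachability from wherever your agent runs, a short script like the following can help. This is a sketch only: the hostname is a placeholder for your own database address, and 3306 is MySQL's default port, which your deployment may override.

```python
import socket

def check_mysql_reachability(host: str, port: int = 3306, timeout: float = 5.0) -> str:
    """Resolve the database hostname, then attempt a TCP connection to it."""
    try:
        # Step 1: DNS resolution -- confirms the agent can resolve the name.
        ip = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return f"DNS resolution failed for {host}: {exc}"
    try:
        # Step 2: TCP connect -- confirms firewalls/security groups allow traffic.
        with socket.create_connection((ip, port), timeout=timeout):
            return f"OK: {host} resolved to {ip} and port {port} is reachable"
    except OSError as exc:
        return f"{host} resolved to {ip}, but port {port} is not reachable: {exc}"
```

A "resolved, but not reachable" result usually points at firewall or security-group rules; a DNS failure points at the name-resolution configuration discussed above.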

Querying Data from a MySQL Database

This guide provides instructions and best practices for querying data from MySQL, especially when integrating or migrating data to another platform such as Snowflake. Examples include SQL `SELECT` statements, handling datatype conversions, and performing initial and incremental loads using the Database Query component.

Example MySQL SELECT Statements

Here are basic examples of retrieving data from a MySQL database:

```sql
-- Select all columns from a table
SELECT * FROM employees;

-- Select specific columns
SELECT first_name, last_name, hire_date FROM employees;

-- Apply a filter for incremental loads (see section below)
SELECT employee_id, first_name, hire_date
FROM employees
WHERE last_updated > '2024-06-01';
```

Datatype Conversion to Snowflake

When migrating data between MySQL and Snowflake, be aware that datatype conversions may occur. For instance:

  • MySQL `DATETIME` can be mapped to Snowflake `TIMESTAMP_NTZ`
  • MySQL `VARCHAR` maps to Snowflake `VARCHAR`
  • Numeric types may need precision adjustments
It's important to review the datatype mapping documentation for Snowflake to ensure fidelity of your data during migration.
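In pipeline code, this kind of mapping is often expressed as a simple lookup table. The sketch below shows an illustrative subset only; it is not the authoritative mapping, and the `VARCHAR` fallback for unknown types is a simplifying assumption.

```python
# Illustrative subset of MySQL -> Snowflake type mappings; not exhaustive.
MYSQL_TO_SNOWFLAKE = {
    "DATETIME": "TIMESTAMP_NTZ",
    "DATE":     "DATE",
    "VARCHAR":  "VARCHAR",
    "TEXT":     "VARCHAR",
    "INT":      "NUMBER(10,0)",
    "BIGINT":   "NUMBER(19,0)",
    "DECIMAL":  "NUMBER",   # carry precision/scale over explicitly
    "DOUBLE":   "FLOAT",
}

def map_type(mysql_type: str) -> str:
    """Look up the Snowflake equivalent of a MySQL column type."""
    base = mysql_type.split("(")[0].strip().upper()  # "varchar(255)" -> "VARCHAR"
    return MYSQL_TO_SNOWFLAKE.get(base, "VARCHAR")   # simplistic fallback
```

For example, `map_type("varchar(255)")` returns `VARCHAR`, while `map_type("datetime")` returns `TIMESTAMP_NTZ`. Note that the stripped precision (the `(255)`) would need to be carried over separately in a real migration.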

Initial and Incremental Load Patterns

The recommended pattern for loading data is:

  1. Initial Load: Copy the full dataset once.
  2. Incremental Load: Subsequently load only new or modified records, often determined by a timestamp or incrementing key.

Both steps use the same Database Query component, differing only in whether a filter (such as a `WHERE` clause) is present.

Initial Load (No Filter Clause)

In an initial load, the SQL statement typically has no filter clause, thus selecting all data:

SELECT * FROM employees;

Incremental Load (With Filter Clause)

For incremental loads, include a filter to select only records that are new or changed since the last load. For example:

SELECT * FROM employees
WHERE last_updated > '2024-06-01 00:00:00';

Tip: The filter condition is usually parameterized with the high-water mark (for example, the maximum `last_updated` value) captured during the previous successful load.
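This pattern can be sketched as a small query builder; the table and column names are hypothetical, and in Matillion the equivalent would typically be a pipeline variable holding the last run's high-water mark.

```python
from typing import Optional

def build_extract_query(table: str, watermark_column: str,
                        last_watermark: Optional[str]) -> str:
    """Build the extraction SQL: full load if no watermark, incremental otherwise."""
    base = f"SELECT * FROM {table}"
    if last_watermark is None:
        return base  # initial load: no filter, copy everything
    # Incremental load: only rows changed since the last successful run.
    return f"{base} WHERE {watermark_column} > '{last_watermark}'"
```

So `build_extract_query("employees", "last_updated", None)` produces the initial full-load query, and passing the stored watermark produces the incremental one. In production, prefer bind parameters over string interpolation to avoid SQL injection.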

Read more: Incremental Load and Data Replication Strategy


By structuring your loads this way, you ensure efficient, performant, and consistent data movement between MySQL and your target system.

Data Integration Architecture

Loading data into Snowflake in advance of integration is a classic way to "divide and conquer" the data integration challenge, as it separates the process into two manageable steps: first, data is ingested or loaded into the database (Extract and Load), and then transformed as needed (Transform). This separation is a key advantage of the ELT (Extract, Load, Transform) architecture, as opposed to the traditional ETL approach. Data integration in Snowflake nearly always requires some level of data transformation, such as restructuring, cleansing, or joining data from multiple sources, and the most effective way to implement these transformations is through data transformation pipelines.

Another significant advantage of the ELT approach is that all transformation and integration operations occur within the Snowflake platform itself. This means the process benefits from Snowflake's native scalability, performance, and on-demand data processing capabilities. Furthermore, since all computing resources are provisioned dynamically within the database environment, there is no need to invest in or maintain separate data processing infrastructure, resulting in both operational simplicity and cost savings.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.