Integrate data from Microsoft SQL Server to Snowflake using Matillion

Our Microsoft SQL Server to Snowflake connector seamlessly migrates your data to Snowflake within minutes, ensuring it remains current without requiring manual coding or intricate ETL scripting.


Extracting data from Microsoft SQL Server to Snowflake

Extracting data from Microsoft SQL Server is a common requirement for organizations seeking to leverage cloud-based analytics platforms such as Snowflake. Efficiently migrating both initial and ongoing (incremental) data is critical to ensuring timely and reliable availability of enterprise data where it is needed most.

This article will guide you through the key steps involved in this process. First, we will explain how to create and configure a suitable identity in Microsoft SQL Server, ensuring secure access to the required data. For users employing Matillion ETL, we will outline how to check for—or acquire—the appropriate JDBC driver needed to facilitate the database connection. Next, we’ll review best practices for ensuring reliable network connectivity between your SQL Server source and your Snowflake target environment. Finally, we will distinguish between methods used for initial full data extraction and ongoing incremental data loading, providing a clear framework for efficient data migration. By following this guide, you will be well equipped to establish a robust, repeatable process for transferring data from Microsoft SQL Server to Snowflake.


What is Microsoft SQL Server?

Microsoft SQL Server is a relational database management system (RDBMS) by Microsoft, supporting on-premises and cloud applications. It manages structured data using T-SQL, and features in-memory processing, high availability, and robust security (encryption, auditing). SQL Server integrates with development tools, reporting services, and analytics, enabling enterprise-level data warehousing, business intelligence, and transactional workloads. Its flexible deployment options—including containers and managed services on Azure—allow scalable, secure database solutions for diverse needs.


What is Snowflake?

Snowflake is a cloud-based data warehouse designed for high performance, scalability, and ease of use in data storage, processing, and analytics. Its architecture decouples compute and storage, enabling independent scaling and supporting multiple concurrent workloads. Leveraging multi-cluster, cloud-native architecture, Snowflake offers automatic scaling, on-demand provisioning, and near-instant elasticity. It natively supports structured and semi-structured data (e.g., JSON, Avro, Parquet), with SQL compatibility and advanced features like secure data sharing, time travel, and zero-copy cloning. As a fully managed service, Snowflake reduces administrative overhead, letting organizations focus on insights instead of infrastructure management.

Why Move Data from Microsoft SQL Server into Snowflake

The Benefits of Transferring Data from Microsoft SQL Server to Snowflake for Advanced Analytics

A data engineer or architect may wish to copy data from Microsoft SQL Server into Snowflake for several reasons. Firstly, Microsoft SQL Server often contains business-critical and potentially valuable data that, when used effectively, can drive strategic insights and decision-making. However, the true value of this data is often realized only when it is integrated with information from other systems, such as customer relationship management platforms, financial databases, or external data sources. By bringing these diverse datasets together in a unified environment, organizations can perform comprehensive analyses that are not possible when data remains siloed. Leveraging Snowflake as the integration platform offers distinct advantages. It enables powerful, scalable analytics without imposing additional processing overhead on the SQL Server environment. This separation ensures that operational workloads on SQL Server remain unaffected, while Snowflake efficiently handles complex data transformations, integrations, and analyses.

Creating an Identity in Microsoft SQL Server: Adding a User to a Database

To grant individuals or applications access to a Microsoft SQL Server database, you must create a login at the server level and a user within the target database. Below are instructions and example scripts demonstrating this process.

1. Create a SQL Server Login

A login grants access to the SQL Server instance. There are two common types: SQL Authentication logins and Windows Authentication logins.

a. SQL Authentication Example

```
-- Replace with appropriate login name and strong password
CREATE LOGIN [example_login] WITH PASSWORD = 'Str0ngP@ssw0rd!';
```

b. Windows Authentication Example

```
-- Replace with appropriate Windows domain and username
CREATE LOGIN [DOMAIN\username] FROM WINDOWS;
```


2. Create a User in a Specific Database

A database user links the login to permissions within a particular database.

```
USE [your_database]; -- Replace with your database name
GO

-- For SQL Authentication login:
CREATE USER [example_user] FOR LOGIN [example_login];

-- For Windows Authentication login:
-- CREATE USER [DOMAIN\username] FOR LOGIN [DOMAIN\username];
```
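Once the login and database user exist, it can be useful to verify them from client code. The sketch below is a hypothetical helper, assuming the `pyodbc` package and Microsoft's "ODBC Driver 18 for SQL Server" are installed; the server name and credentials are placeholders matching the examples above, not values from your environment:

```python
def build_conn_str(server, database, login, password):
    """Assemble a SQL Server ODBC connection string for a SQL auth login."""
    return (
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={server};DATABASE={database};"
        f"UID={login};PWD={password};TrustServerCertificate=yes;"
    )

conn_str = build_conn_str("sqlserver.example.com", "your_database",
                          "example_login", "Str0ngP@ssw0rd!")

# Uncomment to smoke-test the connection once the login and user exist:
# import pyodbc
# with pyodbc.connect(conn_str, timeout=10) as conn:
#     print(conn.execute("SELECT SUSER_SNAME();").fetchone()[0])
```

A successful `SUSER_SNAME()` round trip confirms both that the login authenticates and that the database user maps to it.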


3. Grant Roles or Permissions (Optional)

Often, you must assign database roles to the new user. For example, to grant `db_datareader` and `db_datawriter`:

```
ALTER ROLE db_datareader ADD MEMBER [example_user];
ALTER ROLE db_datawriter ADD MEMBER [example_user];
```


4. Verify the New User

List the users in the current database:

```
SELECT name, type_desc FROM sys.database_principals WHERE type IN ('S', 'U', 'G');
```


Note:
Only users with sufficient privileges (such as members of the `sysadmin` or `securityadmin` server roles) can perform these actions. Adapt the examples to match your security policies and naming conventions.

Checking the JDBC driver

The Microsoft SQL Server JDBC driver comes pre-installed with Matillion Data Productivity Cloud—there’s no need to download or install it yourself. If you want to check which version of the JDBC driver is currently available, detailed instructions are provided here: Database driver versions documentation.

To confirm that the JDBC driver for Microsoft SQL Server is present and available for use, follow these steps:

  1. In Matillion, create a new Orchestration pipeline in your workspace.
  2. Add a Database Query component to the pipeline from the component panel.
  3. Open the properties of the Database Query component.
  4. Locate the Database Type dropdown menu.
  5. Check that Microsoft SQL Server appears as an option in this list.

If you see Microsoft SQL Server listed, the JDBC driver is already installed and ready to use in your data workflows.

Checking network connectivity

To ensure successful connectivity between Matillion Data Productivity Cloud and your Microsoft SQL Server database, you must configure the database to allow incoming connections from the appropriate sources. The configuration depends on your deployment type:

DNS Name Resolution:
If your Microsoft SQL Server database is referenced using its DNS hostname (rather than just an IP address), ensure that the Matillion Full SaaS or Hybrid SaaS agent has the required network permissions to resolve this DNS address. The agent must be able to successfully query and resolve the hostname to connect to your SQL Server instance.
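To sanity-check name resolution and reachability from the machine running the agent, a small script like the following can help. It is a sketch using only Python's standard library; the hostname is a placeholder, and 1433 is SQL Server's default port:

```python
import socket

def can_resolve(hostname):
    """Return True if the hostname resolves via the local DNS configuration."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False

def can_reach(hostname, port=1433, timeout=5):
    """Return True if a TCP connection to the given port succeeds."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example with a placeholder hostname; both checks must pass for the
# agent to connect to your SQL Server instance:
# print(can_resolve("sqlserver.example.com"),
#       can_reach("sqlserver.example.com"))
```

If `can_resolve` fails, fix DNS configuration for the agent's network first; if only `can_reach` fails, the issue is a firewall or routing rule rather than name resolution.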

Querying Data from Microsoft SQL Server

This document outlines how to query data from a Microsoft SQL Server database, with practical examples of SELECT statements, guidance on handling datatype conversion when targeting Snowflake, and best practices for initial and incremental data loads.


Example SQL SELECT Statements

Below are common examples of SQL Server queries you might use with a Database Query component:

Simple SELECT

```
SELECT *
FROM Sales.Orders;
```

SELECT with Specific Columns

```
SELECT OrderID, CustomerName, OrderDate, TotalAmount
FROM Sales.Orders;
```

SELECT with WHERE Filter (for Incremental Load)

```
SELECT *
FROM Sales.Orders
WHERE LastModifiedDate > '2024-06-01 00:00:00';
```

SELECT Using Aggregate Functions

```
SELECT CustomerID, COUNT(*) AS OrderCount
FROM Sales.Orders
GROUP BY CustomerID;
```


Handling Datatype Conversion: SQL Server to Snowflake

When querying a Microsoft SQL Server database with the intention to load the result set into Snowflake, be mindful of datatype differences. For example:

| SQL Server | Snowflake | Notes |
| --- | --- | --- |
| DATETIME | TIMESTAMP_NTZ | Timezone considerations may apply |
| BIT | BOOLEAN | |
| UNIQUEIDENTIFIER | VARCHAR | Requires explicit casting |
| NVARCHAR | VARCHAR | Unicode strings |
| MONEY | NUMBER(19,4) | Numeric precision differences |
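When a pipeline touches many tables, it can help to encode the type mapping once and generate target column definitions from it. The snippet below is an illustrative sketch following the mapping above; a real migration would also handle lengths, precision/scale, and the many types not listed here:

```python
# Illustrative mapping of common SQL Server types to Snowflake types.
SQLSERVER_TO_SNOWFLAKE = {
    "DATETIME": "TIMESTAMP_NTZ",
    "BIT": "BOOLEAN",
    "UNIQUEIDENTIFIER": "VARCHAR",
    "NVARCHAR": "VARCHAR",
    "MONEY": "NUMBER(19,4)",
}

def snowflake_column(name, sqlserver_type):
    """Translate one column definition, defaulting to VARCHAR for unknown types."""
    target = SQLSERVER_TO_SNOWFLAKE.get(sqlserver_type.upper(), "VARCHAR")
    return f"{name} {target}"

# Hypothetical columns, for illustration only:
ddl = ", ".join(snowflake_column(n, t) for n, t in [
    ("OrderID", "UNIQUEIDENTIFIER"),
    ("OrderDate", "DATETIME"),
    ("TotalAmount", "MONEY"),
])
# ddl == "OrderID VARCHAR, OrderDate TIMESTAMP_NTZ, TotalAmount NUMBER(19,4)"
```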

Tip: Use explicit type casts in your SQL to ensure compatibility:

```
SELECT
  CAST(OrderID AS VARCHAR(50)) AS OrderID_Str,
  CONVERT(VARCHAR(36), UniqueIdentifierColumn) AS UID_Str
FROM Sales.Orders;
```


Load Patterns: Initial vs. Incremental

A recommended best practice for data replication is to use a one-off initial load, followed by ongoing incremental loads. Refer to the Incremental Load Data Replication Strategy for details.

Database Query Component Usage Patterns:

| Load Type | SQL WHERE Filter? | Description |
| --- | --- | --- |
| Initial Load | No filter clause | Loads the entire dataset, e.g. `SELECT * FROM dbo.TableName;` |
| Incremental Load | Filtered by key column | Loads only new/updated records, e.g. `SELECT * FROM dbo.TableName WHERE ModifiedDate > ?;` |

Example:

  • Initial Load:

```
SELECT *
FROM dbo.TableName;
```

  • Incremental Load:

```
SELECT *
FROM dbo.TableName
WHERE ModifiedDate > '2024-06-01 00:00:00';
```

Use the same Database Query component for both patterns; adjust only the filter condition to differentiate between a full (initial) and incremental data load.
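The logic behind this pattern is a high-watermark check: with no stored watermark, issue the full query; afterwards, filter on the last value seen. The sketch below illustrates it with hypothetical table and column names; a real pipeline would persist the watermark between runs (for example in a state table) rather than in memory:

```python
def build_extract_query(table, watermark_column, last_watermark=None):
    """Return a full-load query on the first run, a filtered one afterwards."""
    query = f"SELECT * FROM {table}"
    if last_watermark is not None:
        query += f" WHERE {watermark_column} > '{last_watermark}'"
    return query

initial = build_extract_query("dbo.TableName", "ModifiedDate")
incremental = build_extract_query("dbo.TableName", "ModifiedDate",
                                  "2024-06-01 00:00:00")
# initial     == "SELECT * FROM dbo.TableName"
# incremental == "SELECT * FROM dbo.TableName
#                 WHERE ModifiedDate > '2024-06-01 00:00:00'"
```

After each incremental run, the new watermark is simply the maximum `ModifiedDate` among the rows just extracted.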

Data Integration Architecture

Loading data in advance of integration exemplifies the "divide and conquer" strategy by splitting the data processing workflow into two distinct steps: first, loading raw data into the Snowflake environment, and then performing integration and transformation tasks directly within the database. This sequencing is a core advantage of the Extract, Load, and Transform (ELT) architecture, as it enables organizations to readily adapt to changing business requirements while optimizing resource utilization.

Data integration inherently involves transformation, and employing dedicated data transformation pipelines is the most effective way to standardize, cleanse, and enrich disparate data sources before unifying them. With ELT, these transformations and integrations happen within the target Snowflake database itself, delivering significant benefits: transformations are executed quickly and can be triggered on demand, thanks to Snowflake’s inherent scalability. Moreover, this in-database processing model eliminates the need for additional data processing infrastructure, reducing operational complexity and costs.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.