Connecting Matillion ETL to an Azure Synapse Analytics Dedicated SQL Pool: A Guide
This guide is a walk-through of how to connect Matillion ETL to an Azure Synapse Analytics dedicated SQL pool (formerly known as “SQL DW”).
In Matillion ETL, the metadata for connecting to Azure Synapse Analytics is held in an artifact known as an Environment. Matillion ETL Environments can also hold additional information that is used during data extraction and loading.
Prerequisites for connecting to an Azure Synapse Analytics Dedicated SQL Pool
The prerequisites for connecting Matillion ETL to an Azure Synapse Analytics Dedicated SQL Pool are:
- Access to Matillion ETL for Azure Synapse
- Permission to create and edit Environments in Matillion ETL
- Your Matillion ETL instance has network access to the fully qualified SQL server name of your dedicated SQL pool
- Your dedicated SQL pool is using SQL authentication
- Your dedicated SQL pool is online (not paused)
- Azure Synapse Analytics credentials:
- The cluster server name
- TCP/IP port number (usually 1433)
- Azure Synapse Analytics username and password
- Database and Schema name
Azure Synapse Analytics identifiers
Microsoft Azure data warehouse dedicated SQL pool instances are created within an Azure resource group, in a logical SQL server, and associated with a Microsoft Azure Subscription.
Every dedicated SQL pool instance has a network address known as a Server name – or sometimes an “Endpoint” – that is used for all communication. Server names look like this:
To find your server name, log into the Azure portal and click on Azure Synapse Analytics. Find your SQL pool from the list and click into it. The server name is shown in the overview panel like this:
Dedicated SQL pool instances accept connections on a specific TCP/IP port number, usually 1433. To find your TCP/IP port number, go to the Connection Strings panel of your SQL pool, switch to the JDBC view, and look for the port number part of the URL:
In the screenshot above, the TCP/IP port number is 1433.
How to connect a Matillion ETL Environment to an Azure Synapse Analytics dedicated SQL pool
To create or edit a Matillion ETL Environment, start from the main Project menu, or from the context menu in the Environments panel.
This action will pop up a Create or an Edit Environment window containing three dialogs.
In the first dialog, choose a name for the Environment, and optionally set any AWS, Azure, or GCP credentials that you intend to use.
In the second dialog, set the fields as follows:
- Endpoint – the Server name, as described in the previous section
- Port – the TCP/IP port number, as described in the previous section
- Username – your dedicated SQL pool username
- Password – the user’s password, using the Matillion password manager
You may also choose to add Advanced Connection Settings to set individual JDBC driver connection parameters. In most cases you will not need to use Advanced Connection Settings.
Press Next to enter the third dialog. While you are completing the details, make certain of two things:
- The dropdown lists for the default database and schema both show values that you can select.
- After completing the defaults, you get a success message when you press the Test button
If any of the dropdown lists are empty, you will need to go back to the previous screen and review the settings.
- If the Default Database dropdown shows a “Loading…” message for many seconds before eventually going blank, it is likely that your Endpoint and Port settings are not correct, or that your Matillion ETL instance does not have network access to the fully qualified SQL server name of your dedicated SQL pool.
- If the dropdown lists go blank immediately, it is likely that your login credentials are not correct.
Press the Test button and make sure you see a success message before finishing the Environment Setup dialog
Connectivity metamodel for Matillion ETL and Azure Synapse Analytics
Matillion ETL stores Azure Synapse Analytics connection details inside an Environment object. The Environment contains the cluster server name, its TCP/IP port number, and the login credentials as a username and a managed password.
Environments can optionally also hold additional cloud provider information, in the form of AWS, Azure, and GCP credentials. This information is used by various orchestration components such as the Data Transfer.
Matillion ETL Environments exist within a Project and a Group. Orchestration and Transformation Jobs are versioned within Matillion ETL projects. Jobs always execute within the context of exactly one Environment.
For Azure, the core administrative entity is the Account, also known as a Subscription. You create Resource Groups within an Account, and all further services – such as dedicated SQL pools – exist within a Resource Group.
Every Azure Account is associated with a cloud hosted version of Active Directory, known as a “tenant” or a “directory”. Your Azure Active Directory instance is uniquely identified by a Tenant ID GUID. For authentication purposes, you can create App Registrations within a Tenant, and have Secret Keys allocated to the Apps.
Within dedicated SQL pools, you can create Users, Databases, and Schemas. Note that Users created within a Dedicated SQL Pool are unrelated to Users that exist in your Azure Active Directory Tenant.
Once you’re connected Matillion ETL and your dedicated SQL pool
You’re ready to start building workflows in Matillion ETL.
Learn more about modern data analytics platforms
To learn more about building and optimizing a modern enterprise data stack, download our ebook, Building a Data Analytics Platform.