Use the Metadata API to Track SQL Executed Against Snowflake (Shared Job)

 

The recent release of Matillion ETL version 1.54 introduced a new API, the Metadata API, to our Snowflake and Redshift enterprise offerings. What can you do with this new API? For any number of reasons (including data governance, audit logging, or simply wanting to see what keeps changing the data in a table) you might be looking to answer questions such as:

 

  • What SQL statements has Matillion executed against Snowflake today?
  • Which Matillion job(s) have appended data to my table?
  • What data is being pulled from my Salesforce system into Snowflake?

 

Using the steps below and a handy Shared Job, you can create Matillion workflows and queries that not only store the SQL you execute in a table (in this example, within Snowflake) but also link it to the Matillion job that generated it.

 

How does the Metadata API make this happen?

 

The new API logs the majority of SQL statements generated by Matillion ETL and executed against the cloud data platform (Snowflake or Redshift), as well as against the main source systems the data comes from. The API returns a JSON document for each set of SQL executed for a single operation.

 

For example, for a typical data ingestion from Salesforce into Snowflake, querying the API would return a document like this:
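To give a feel for the shape of that document, here is a purely illustrative sketch; every field name below is an invented placeholder rather than the API's actual schema (see the documentation linked later in this post for the real structure):

{
  "jobName": "Salesforce to Snowflake",
  "startTime": "2021-06-01 09:00:00",
  "environment": "Production",
  "instance": "my_account.snowflakecomputing.com",
  "operations": [
    {
      "type": "QUERY",
      "dialect": "SALESFORCE",
      "sql": "SELECT Id, Name FROM Account"
    },
    {
      "type": "LOAD",
      "dialect": "SNOWFLAKE",
      "sql": "COPY INTO \"ACCOUNT_STAGE\" FROM ..."
    }
  ]
}

(Again, the field names above are invented for illustration only; the real payload differs.)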

 

 

From this metadata you can see:

 

  1. Which job executed, and when
  2. Details of which Snowflake instance this was executed against
  3. The query run against Salesforce
  4. The COPY to Snowflake

 

In other words, it describes the lineage of how the data moves from Salesforce to Snowflake.

 

For a Transformation Job you would see a similar JSON document, containing the push-down ELT SQL that Matillion ETL for Snowflake executes to transform your data.
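To make that concrete, the logged statement for a simple aggregation step might look something like the sketch below; this is a minimal, hypothetical example with invented table and column names, not output from a real job:

-- Hypothetical push-down ELT statement for an aggregation step;
-- the table and column names are invented for illustration.
CREATE OR REPLACE TABLE "ACCOUNT_SUMMARY" AS
SELECT
    "INDUSTRY"
  , COUNT(*) AS "ACCOUNT_COUNT"
FROM
    "ACCOUNT_STAGE"
GROUP BY
    "INDUSTRY";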

 

You can find further technical details about the Metadata API in our documentation (https://documentation.matillion.com/docs/9907241). But to save you time: if you’re a Matillion ETL for Snowflake user, I’ve created two artefacts and shared them via Matillion Exchange to help you create Matillion processes that can be amended and scheduled for your own needs:

 

  1. An API Profile (https://exchange.matillion.com/s/job-page?id=a074G00001ElT9UQAV) – this allows the API to be called by the Matillion ETL API Extract component
  2. A Shared Job (https://exchange.matillion.com/s/job-page?id=a074G00001ElT9eQAF) to extract, flatten and write the data to a Snowflake table in a tabular format

 

The API Profile can be used by the API Query component within a Matillion ETL Orchestration Job to quickly and easily connect to the Matillion instance’s own API and bring the data into Snowflake as a VARIANT, ready for further processing or storage in a Data Lake, without the need for hand coding. You simply specify the required API parameters graphically in the component’s properties.
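As a minimal sketch of that further processing, once the response has landed in a VARIANT column you can reach into it directly with Snowflake’s path syntax. The table name, column name, and JSON path below are assumptions for illustration:

-- Pull a single field out of the raw VARIANT response.
-- "api_query_output", "data_value" and the jobName path are
-- assumed names for illustration only.
SELECT
    "data_value":jobName::string AS "job_name"
FROM
    "api_query_output";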

 

 

Using the Shared Job

 

The example Shared Job I’ve created takes this capability one step further, making use of the API Profile to:

 

  1. Connect to the Matillion Metadata API running on the same instance as the job
  2. Extract any new lineage information since the last time the job was executed (using a high-tide, or high-water-mark, method)
  3. Store this in a Snowflake temporary table in its raw JSON form
  4. Flatten this JSON into a tabular form ready for querying with SQL (sketched just after this list)
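In Snowflake terms, steps 2 to 4 boil down to something like the sketch below. The actual Shared Job handles the high-tide bookkeeping and the full column mappings itself; the temporary table name, column list, and JSON paths here are assumptions for illustration:

-- Step 2: the high-tide mark is the latest timestamp already loaded.
SELECT MAX("log_end_datetime") FROM "mtln_lineage_log";

-- Steps 3 and 4: flatten each operation in the raw JSON into one row
-- per SQL statement. "mtln_lineage_raw", "raw_json" and the JSON
-- paths are assumed names for illustration.
INSERT INTO "mtln_lineage_log" ("log_start_datetime", "log_job_path", "log_raw_sql")
SELECT
    r."raw_json":startTime::timestamp AS "log_start_datetime"
  , r."raw_json":jobPath::string      AS "log_job_path"
  , o.value:sql::string               AS "log_raw_sql"
FROM
    "mtln_lineage_raw" r
  , LATERAL FLATTEN(input => r."raw_json":operations) o;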

 

 

 

The result is a table in your Snowflake schema logging each SQL statement executed by Matillion for a given Project and Environment, along with which Matillion job executed the statement, when it ran, and which system it was executed against.
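This turns the questions from the start of this post into straightforward SQL. For example, to see which jobs have executed SQL touching a particular table today, you could run something like the query below; the ILIKE pattern and MY_TABLE are illustrative placeholders, while the column names match the Shared Job’s output table queried later in this post:

-- Which Matillion jobs have executed SQL touching MY_TABLE today?
-- The ILIKE pattern and "MY_TABLE" are illustrative placeholders.
SELECT
    "log_job_path", "log_start_datetime", "log_raw_sql"
FROM
    "mtln_lineage_log"
WHERE
    "log_raw_sql" ILIKE '%MY_TABLE%'
AND "log_start_datetime" >= CURRENT_DATE
ORDER BY
    "log_start_datetime" DESC;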

 

As an example, take a Matillion job that stages data from Salesforce and then transforms it.

 

 

After this job has executed, you can run the Shared Job to load the metadata into Snowflake.

 

 

Then you can check the new table in Snowflake:

 

SELECT
    "log_start_datetime", "log_run_type", "log_job_path", "log_operation_type", "log_sql_dialect", "log_sql_order", "log_raw_sql"
FROM 
    "mtln_lineage_log"
ORDER BY 
    "log_end_datetime" DESC
  , "log_sql_order" DESC

 

Try the Metadata API for yourself

 

This is just the start of what can be achieved with the Matillion Metadata API. Try it for yourself. And if you build your own Shared Jobs that use the Metadata API, remember to tell us about them and to upload and share them on Matillion Exchange.
