Now Available On-Demand | AI. The Future of Data Engineering - Today. Not Just Tomorrow.

Watch now

Using Stored Procedures with Matillion

Stored procedures have been a part of database technology for a long time. In a typical modern data fabric, you are likely to find stored procedures both in source databases and in your target cloud data warehouse

This article will explain exactly what stored procedures are, with examples showing how to take best advantage of them as a Matillion user.  

Prerequisites

The prerequisites for using stored procedures with Matillion are:

Plus either:

  • Privilege to run stored procedures in your source database, or
  • Privilege to create and use stored procedures in your Cloud Data Warehouse

 

What is a Stored Procedure?

Stored procedures are a high-code orchestration technology, designed to help with code re-use. Declare once: run repeatedly. 

The code inside stored procedures mostly consists of SQL statements. In addition, different database vendors offer a variety of extension features such as variables, loops, conditional branching and exception handling. 

In other words, stored procedures combine the convenience of bundling lots of related logic into a single named wrapper, plus encapsulation by allowing you to parameterize it. 

Just like SQL itself, stored procedures exist entirely inside their host database. For data processing this brings two unique advantages:

  1. Proximity to the data - which means top performance. As far as the SQL engine inside the database is concerned, it makes no difference whether a statement came from a stored procedure or not. The SQL runs just the same.
  2. Tight integration with the host database - which means taking best advantage of all the specialized features provided by the database.
     

For Matillion users there are two main cases for stored procedures:

  1. Inside a source database (such as SQL Server or Oracle), stored procedures offer the ability to get hold of data in a more sophisticated way than just running a SELECT query. When used for this purpose, stored procedures are conceptually similar to database views.
  2. Inside a cloud data warehouse, stored procedures offer ways to encapsulate orchestration and administrative tasks. This especially applies where the task is frequently repeated and is parameterized.


In both the above cases, stored procedures also provide ways to govern access to database objects. A stored procedure may be declared to run either with the privileges of its owner or with the privileges of the caller. This distinction is also sometimes called definer vs invoker rights. There will be an example later in the article.  

Stored Procedures in Source Databases

When your stored procedure exists in a source database, use a Database Query component

From a DataOps perspective, it can be helpful to verify in advance that the network route to the source database is open.

Use a Check Network Access component immediately before a Database Query

SQL Server Stored Procedures

From SQL Server you can call a stored procedure using a Database Query component in Advanced Mode. Just type the name of the stored procedure into the SQL Query property, optionally with its schema name. 

Depending on your Matillion ETL version, the sample tab may report a syntax error, but the component should nevertheless run successfully. 

SQL Server stored procedures can contain more than one SELECT statement. If there is a choice, Matillion will use the first one. 

Some versions of SQL Server permit OLEDB access to stored procedures using OPENROWSET, with a SELECT statement like this:

SELECT * FROM OPENROWSET(
'SQLOLEDB',
'Trusted_Connection=Yes;
Server=(local);
 Database=MyDatabase',
  'exec my_stored_proc) 

If OPENROWSET is enabled in your SQL Server source database, you can use the above as the SQL Query of your Matillion Database Query component.  

Oracle Stored Procedures

Oracle differentiates strongly between two kinds of stored subprograms:

  • Stored Procedures - these are statements, and they do not return data.
  • Stored Functions - these are expressions that return a value once the function has finished running. Stored functions that can return multiple records are known as pipelined functions.

Pipelined functions are the best way to read Oracle data defined by procedural logic. Switch your Matillion Database Query component into Advanced Mode, and use a SELECT with the following syntax:

SELECT * FROM TABLE(subprogram_name)

Stored Procedures in the Cloud Data Warehouse

Within a Cloud Data Warehouse, stored procedures can be useful for administrative operations in exactly the same way as Matillion's inbuilt low-code option: shared jobs.

Look for cases that involve several related SQL statements, and where the task needs to be repeated often with different parameters.

In data processing, a common example is a job audit table. Matillion requires the privilege to write to the audit table. But from a security perspective, interaction with the audit table must be carefully governed. A good solution is:

  • Create the audit table in a private database location, and grant no access to it
  • Create a stored procedure to manage the audit table, defined with Owner's rights
  • Grant Matillion users the privilege to use the stored procedure

 

Snowflake Scripting example

For an audit table with columns named run_history_id, start_ts, end_ts, status and message, this is a simple stored procedure for managing the table:

CREATE OR REPLACE PROCEDURE "${environment_database}"."${examples_schema}"."audit"(
  p_action VARCHAR,
  p_run_history_id INTEGER,
  p_message VARCHAR)
RETURNS INTEGER
LANGUAGE SQL
EXECUTE AS OWNER
AS
$$
BEGIN
    IF (p_action = 'START') THEN
        INSERT INTO "${environment_database}"."${examples_schema}"."audit_log"
            ("run_history_id", "start_ts", "status")
        VALUES (:p_run_history_id, CURRENT_TIMESTAMP(), 'STARTED');
    ELSE
        UPDATE "${environment_database}"."${examples_schema}"."audit_log"
        SET "end_ts" = CURRENT_TIMESTAMP(),
            "status" = :p_action,
            "message" = :p_message
        WHERE "run_history_id" = :p_run_history_id;
    END IF;
END;
$$
;
  

The EXECUTE AS OWNER clause declares that the SQL will run with the privileges of the database user that owns the stored procedure - rather than whichever Matillion user happens to be running it. This is a great way of giving tightly controlled access to the protected audit table. 

The START action creates a new record. When the job has finished, SUCCESS or FAILURE actions update the record and set the timing and optional error message. An implementation taking advantage of Matillion ETL's default error handling pattern would look like this:

For illustration purposes, all the audit tasks above are just SQL Script components. In production, it would be better to implement them as a Shared Job.

Audit start

As the very first task of the job, the SQL uses the run_history_id automatic variable like this to create a new audit record:

CALL "${environment_database}"."${examples_schema}"."audit"
('START', ${run_history_id}, NULL)
Audit success

On the success branch, the same run_history_id is used to close the open audit record.

Audit failure

The failure branch is almost identical, but includes the detailed_error automatic variable to record the stack trace while the audit record is closed.

It is advisable to add an End Failure component afterwards because the audit task is on the failure branch. The job should still fail overall even after the audit task succeeds.  

Next Steps

As a Matillion user, you are already taking advantage of the performance and functionality benefits of SQL executing inside the database (this is known as "ELT"). However, there may still be cases - like this article has suggested - where you could use stored procedures in your Matillion ETL orchestration and transformation jobs. 

If you are considering creating your own job auditing framework, take the Shared Jobs course on the Matillion Academy

There is more information about the Check Network Access shared job in this guide to Public and Private IP addresses.

Andreu Pintado
Andreu Pintado