Blog

Matillion Change Data Capture (CDC): How to Handle Source DDL Changes

source DDL changes: green puzzle piece

 

Change Data Capture (CDC) in Matillion is a great feature that will help keep your cloud data warehouse up to date with a source database (MySQL, PostgreSQL, Oracle, or Microsoft SQL Server). Setting it up is a fairly straightforward process. (This support article and our recent blog post about setting up CDC with an RDS MySQL source database are a few resources that can help.) But after you set up CDC, some questions may arise. A common question: How do we handle source data definition language (DDL) changes in CDC? Let’s talk about how to handle source DDL changes and apply them to the target. In addition, we can dive into modifying the CDC job to alert for failures.

DDL changes: A common cause of CDC failure

One reason that a Matillion CDC job or task can fail is because of source DDL changes. CDC in Matillion will not make those DDL changes from the source to the target. For example, if a new column is added to the source table, it will not be added to the target table in the cloud data warehouse. Once data gets loaded into that newly created column(s), the CDC task will fail.

 

The table(s) that failed can be queried in the >mat_cdc_log table (select * from mat_cdc_log where cdc_table_status <> 0) that is created in the cloud data warehouse, or by viewing and expanding the Task History in the Matillion UI. As noted in the fourth step of “Update the CDC Orchestration Job” in the next section, “Alerting When CDC Fails”, the SNS Message Component can mention the table(s) that failed by referencing the “cdc_src_table_name” variable name in the message.

 

DMS Console – Reload Table Data

 

One option to handle DDL changes in the source table is to reload the table(s) that caused the failure within the DMS console in AWS.

 

  • In the AWS DMS console, select “Data Migration Tasks”. 
  • Select the replication (CDC) task that is running on the Matillion instance.
  • Under the “Table Statistics” menu, select the table you want to reload. Note: You can see what tables had DDL changes by seeing a non-zero value in the “DDLs” column.
  • Select “Reload table data”.

CDC DDL changes: reload table data

 

  • A confirmation screen will appear; select “Confirm”

CDC DDL changes: confirm reload table data

 

Restart CDC Task in Matillion UI

If there are other errors that arise due to a DDL change in one of the source tables, it may be best to “restart” the entire CDC task. This method may not be ideal. But at this point, restarting the task will ensure that data is current in the cloud data warehouse.

 

  • From the “Project” menu in Matillion, select “Manage Change Data Capture”
  • Select the CDC task to restart 
  • Select the “Play” icon

CDC DDL changes: select play icon

 

  • A “Run CDC Task” menu will open
  • Select “Restart” to restart the task

run CDC task restart

 

To avoid restarting a task for your entire replicated table set, you may want to break up the CDC task in Matillion into multiple tasks. For example, if there are tables that will probably not have DDL changes, they can be in the same task. Another task can house tables that will likely have DDL changes. Keep in mind that each task will create a new queue/s in addition to the queues for another task. When replicating several tables on an ongoing basis, it is best to have a dedicated Matillion instance for CDC.

 

Alerting When CDC Fails

When configuring CDC and creating a new CDC task, an Orchestration Job will be created that references the CDC shared job. This Orchestration Job will have the same name as the new CDC task. When you initially set up CDC, it creates a log table in your cloud data warehouse. You can query that table to check for any failures. You can also modify the CDC job so that it emails you if a failure occurs. Use the SNS Message Component to set up the email alert, following the steps below.

 

Create an SNS Topic in AWS

 

  • In the AWS console, select “Services” and search for “SNS” (Simple Notification Service) 
  • Select “Topics” from the menu in the upper right hand corner of the console.
  • Select “Create topic”
  • Name the topic appropriately. Other configuration settings can be applied based on your requirements otherwise leave the initial settings
  • Hit the “Create topic” button

create topic in AWS

 

Create an SNS Subscription in AWS

 

  • In the AWS SNS Console, select “Subscriptions”
  • Select “Create subscription”
  • Choose your newly created Topic from the “Topic ARN” dropdown menu
  • Designate “Email” as the “Protocol”
  • Enter the appropriate email address for the “Endpoint”
  • Hit the “Create subscription” button. Note that after the subscription is created, it must be confirmed. This can be done either from the email or through the AWS SNS Subscriptions menu by selecting “Confirm subscription”

create subscription

Update the CDC Orchestration Job

 

  • Open the CDC job (same name as the CDC task) 
  • Drag an “SNS Message” component to the canvas 
  • Attach the component to the “CDC Sync To Target” component. There are 3 options when attaching. Use the red failure option so it can send a message if there is a failure 
  • Select the newly created Topic and edit the “Subject” and “Message” properties with appropriate messaging. The parameter values for the failed table name can be referenced in the message. For example: ${cdc_src_db_database}.${cdc_src_table_name} failed to load
  • Note that this can be done similarly for a successful run (green success option), but the “SNS Message” component will need to be updated with the appropriate message

 

send failure alert

 

Learn more about automation

 

To learn more about CDC and other ways that Matillion helps you automate data transformation, download our ebook, Accelerate Time to Insight.