Identifying a rogue schedule in Matillion ETL – Part 2
Following on from our earlier blog article on identifying a rogue schedule in Matillion ETL, a few customers have reached out to us. One raised a particularly interesting use case, which we have recreated here.
Identifying a Rogue Schedule
We had some important data stored in an Amazon S3 bucket that we regularly use in our demonstrations. However, this data kept being deleted from the bucket. We checked the schedules but could not find one responsible for the deletions. Because logging was not enabled on the bucket, there was no record of when the data was being deleted.
Investigating the Rogue Schedule
To determine the cause, we set up a Lambda function to monitor the Amazon S3 bucket so that, whenever a file was deleted, a message was put on an SQS queue for Matillion to pick up and repopulate the data:
This would also give us the time when the file was deleted.
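A minimal sketch of such a handler, assuming Python; the queue URL is hypothetical, and the `sqs_client` parameter is injectable so the logic can be exercised without AWS credentials:

```python
import json

# Hypothetical queue URL -- replace with your own.
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/deleted-demo-files"

def lambda_handler(event, context, sqs_client=None):
    """Triggered by S3 ObjectRemoved events; forwards each deleted key,
    plus the time of deletion, to an SQS queue for Matillion to pick up."""
    if sqs_client is None:
        import boto3  # real client at runtime; injectable for testing
        sqs_client = boto3.client("sqs")
    forwarded = []
    for record in event.get("Records", []):
        body = {
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
            "deleted_at": record["eventTime"],  # when the delete happened
        }
        sqs_client.send_message(QueueUrl=QUEUE_URL,
                                MessageBody=json.dumps(body))
        forwarded.append(body)
    return forwarded
```

The `eventTime` field on each record is what gave us the deletion timestamps used later in the analysis.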
We left this to run for a couple of weeks and came back to review the results.
The first step was to get the details of when the jobs ran. We did this using the API Query component to pull the Run History Summary from the Matillion REST API:
This wrote the results to an Amazon Redshift table which we could analyse in a Transformation job:
In the job history we identified a job called ‘S3 Copy’ that ran shortly before each run of our ‘Copy Data Back’ job. Thus, we had identified the job causing the issue!
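The join we ran in the Transformation job boils down to finding which jobs were active just before each deletion. In pure Python the same check looks like this (timestamps as epoch seconds; the ten-minute window is an arbitrary choice for illustration):

```python
def jobs_running_before(runs, deleted_at, window=600):
    """Return the names of jobs that were active in the `window` seconds
    leading up to a deletion timestamp. `runs` is a list of
    (name, start, end) tuples; all times are epoch seconds."""
    return [name for name, start, end in runs
            if start <= deleted_at and end >= deleted_at - window]
```

Fed with the run-history summary and the deletion times from the SQS messages, this is the check that singled out the ‘S3 Copy’ job.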
Looking at this job, we could see it was running a file iterator over a Bash Script component.
The file iterator listed every file in an Amazon S3 bucket, putting each filename into a variable. The Bash Script component then ran inside the iterator, calling an AWS CLI command to move the data:
Because a move removes the source object once it has been copied, the data disappeared from the source bucket. Hence the files appeared to have been deleted.
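An S3 move is really a copy followed by a delete of the source object, which explains the disappearing files. A sketch of that per-file behaviour with boto3-style client calls (bucket names are placeholders; `s3` is injectable so the logic can be checked without AWS):

```python
def move_object(s3, src_bucket, dst_bucket, key):
    """Replicate what `aws s3 mv` does per file: copy the object, then
    delete the source. The delete step is what made our data vanish."""
    s3.copy_object(Bucket=dst_bucket, Key=key,
                   CopySource={"Bucket": src_bucket, "Key": key})
    s3.delete_object(Bucket=src_bucket, Key=key)
```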
Resolving the Rogue Schedule
To resolve the issue, we changed the ‘move’ command to a ‘copy’ so that the existing files were retained. The files were then loaded into Amazon Redshift by the scheduled Matillion job:
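Switching from ‘move’ to ‘copy’ simply drops the delete step. A sketch of the corrected loop over the bucket, again with an injectable boto3-style client and placeholder bucket names:

```python
def copy_all(s3, src_bucket, dst_bucket, prefix=""):
    """Fixed behaviour for the 'S3 Copy' job: copy every object under
    `prefix` and leave the originals in place (no delete_object call)."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            s3.copy_object(Bucket=dst_bucket, Key=obj["Key"],
                           CopySource={"Bucket": src_bucket,
                                       "Key": obj["Key"]})
```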
This resolved the issue for our customer. We reminded the user who wrote the job to make copies of the data rather than move it. We also suggested to the customer that they set up logging for key Amazon S3 buckets!