Matillion BI Key Concepts – Cloud ETL – Extract, Transform and Load

  • Matthew Scullion
  • February 6, 2013

There are two types of Business Intelligence solution.  Those that use a proper Data Warehouse, and those that don’t.

In our opinion, a Data Warehouse is key to a successful Business Intelligence implementation.  It allows you to join data together from different sources and put all it into one place – one version of the truth.  You can organise the data to make it easy to work with.  You can apply rules and calculations to the data before it enters the Data Warehouse, meaning when you come to report over it, the answers are already there.  Finally, you can use business rules and other funky techniques, to “clean up and enrich” data – plugging gaps, fixing problems and adding value.

The data warehouse then, becomes a useful, up-to-date, accurate repository of information, rather than silos of disjointed data that requires processing to make it useful.

But actually, whilst the Data Warehouse takes a lot of the glory, it’s something else that does all of the hard work – ETL.

ETL – or “Extract, Transform and Load” – is the software layer that takes raw business data and turns it into useful information (which it then stores in the data warehouse).

Matillion BI Key Concepts - ETL

Matillion BI’s ETL functionality is used by our Implementation team (and for live customers, by our Customer Success team) to map your business data from your various source systems into the Matillion Data Warehouse we create for you.

It has some really clever tricks up its sleeve to allow us to deliver powerful, useful Business Intelligence and self-serve reporting, quickly and cost effectively.

It can get data from almost anywhere

Matillion’s ETL layer can talk to virtually any data source.  We support:

  • 27 different relational database technologies including MS SQL, Oracle, Informix, DB2, Sybase, Progress, MySQL and many other more exotic databases
  • Access to file formats such as CSV, Microsoft Excel, XML and flat files
  • API access to a range of major systems and Java/.Net APIs to use when we’re struggling to get at your data any other way
  • An ODBC connector, allowing us to talk to smaller or older systems which support this type of driver

We regularly integrate with all the modern systems like MS Dynamics, Epicor, SAP, Infor, Sage and IFS.  But we also integrate with bespoke and in-house written systems or older, proprietary systems built in now obsolete technologies.

Our ETL layer allows us to do this.

It can join data together from multiple sources and systems

Group of companies?  Have 9 different ERP systems in 6 different countries?  No problem.  Our ETL technology can talk to all of them and join the data together for you.

Not all of your data is in your ERP?  Have budgets, forecasts, product/customer information or rebate/retro rules in spreadsheets.  That’s fine.  We can read from the core systems and from the spreadsheets and join the data together.

It can turn scary data into easy-to-use data

ERP systems store data in what is called a “normalised” form.  This means that the data is spread across dozens, or hundreds of tables – each with many columns.  This is great for running a transactional system (and if you’d like to know why – drop us an email and we’ll explain 3rd normal form to you), but it’s a real pain when it comes to reporting.

Matillion BI ETL - Diagram of typical complex core system database schema
A typical core system database schema – yuck!

Just to make life interesting, ERP system developers tend to give all these tables and columns really un-useful names.  So CSTMSTR might be the name of the ‘customer’ table and it’s columns might be called things like CSTNM01.

So if you’d like to ask a perfectly sensible business question, like, “what was the total Net Value generated by sales of product group x, to customers in region y, in Q1 2012… compared to the same period in the prior year, and broken down by sales person”, you’re going to need to write a pretty complicated query to join together all the tables and columns that hold that information together.  Basically, you’re going to have to be a rocket scientist…

ETL helps us here.  It takes all the complicated data spread across hundreds of columns and tables, and “de-normalises” it.  It turns it into a simple database structure, which is already joined together and which has simple table and column names that are meaningful.

The tecchie name for this is a “star-schema” – but that doesn’t matter really.  The key thing is, it’s real simple.

Matillion BI ETL - Diagram of Matillion Star Schema
A Matillion BI “Star-Schema”. Simple and pre-joined together.

By the time we’ve put the Matillion BI front-end over this simple “shape” of data, you’re left with a list of meaningful business terms that you can drag-and-drop onto reports, without ever having to worry that the data came from 20 different tables and 3 different systems originally.

It can apply rules and calculations

Or, to say it another way – it can do the work of 1000 accountancy whizzes with big-buttoned calculators, in seconds.

Here are some examples of the sorts of rules we apply with ETL, every day here at Matillion:

  • Currency conversions – converting local currency into spot or budget reporting currencies
  • Applying rebate/discount rules – giving “true” gross margin or cost calculations
  • Turning product codes into useful terms – so if your SKU is ‘AT01657389X’ and the ‘A’ means it’s a widget, the ‘T’ means it’s made of titanium and the ‘X’ means it’s part of your “eXtreme” range, we can add all that information into the data warehouse for you
  • Cleaning up messy data – do you have a customer called ‘Bloggs and Co’, one called ‘Bloggs & Co’ and one called ‘Bloggs and Company’?  All the same customer?  ETL can sort that out
  • Working out cool stuff – want to know the average price you’re paying for a ‘size 3 gasket’?  Want that information stamped onto the product file, so you can get at it easily?  ETL can add up all the purchase orders for a year, get the average price, then append this onto the product.  Whilst it’s at it, it can stamp all the purchase orders with that average price, meaning you can compare what you’re paying against average, on an order-by-order/supplier-by-supplier basis.

What this all means is two things:

  • Lots of time saved
  • Easy, fast access to accurate, up-to-date information

So ETL is pretty cool.  It’s what turns your data into information.

You can learn more about the key concepts of Matillion BI, such as: Reporting, Analysis, Dashboards, Report Scheduling/Scheduled Reports, Mobile Access, Microsoft Excel Integration and the Matillion Data Accelerator, in other blog posts in this series.