How Matillion Built a Framework for Extracting Data from Any API

In this blog post, I’ll walk you through how we created a framework to solve a specific problem: extracting data from many different APIs. I’ll start by defining the problem, then describe the framework we built to solve it, and finally step back to look at the approach we took, which you can apply when creating your own frameworks.

The problem: Extracting data from myriad APIs

The first step in Matillion’s mission to make data useful is getting that data into your cloud. The challenge is that there are countless data sources, and each of their APIs can have its own intricacies. Matillion needs a framework that can extract data from any of these sources. That framework must be flexible, easy to edit, and usable by customers themselves.

The framework: Programmatically build API calls

To extract data from any data source, we first had to ask: what is a data source? Is it a single API, or many? Through research, we found that a social media site might have one API for ads data, another for user data, and potentially more APIs for other purposes. This led us to our starting point: a connector can have multiple APIs.

We define a data connector as the overall component that fetches your data using the framework.

The next question this led us to ask was: what is an API made up of? An API can have different environments (Sandbox, Production), each environment can have different versions (v1, v2), and each version has its own endpoints (/invoices, /accounts) that users can call to get data. These relationships gave us the core structure on which to build our framework: connector, API, environment, version, and endpoint, with each level able to contain many of the next.
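
To make that hierarchy concrete, here is a minimal Python sketch of it as nested dataclasses. The class, field, and function names (`Connector`, `Api`, `Environment`, `Version`, `Endpoint`, `endpoint_paths`) are illustrative assumptions, not Matillion's actual implementation; the sample values reuse the names mentioned in this post.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    path: str  # e.g. "/invoices"


@dataclass
class Version:
    name: str  # e.g. "v1"
    endpoints: list[Endpoint] = field(default_factory=list)


@dataclass
class Environment:
    name: str  # e.g. "Sandbox" or "Production"
    versions: list[Version] = field(default_factory=list)


@dataclass
class Api:
    name: str  # e.g. "Ads"
    environments: list[Environment] = field(default_factory=list)


@dataclass
class Connector:
    # The overall component: one connector can hold many APIs.
    name: str
    apis: list[Api] = field(default_factory=list)


def endpoint_paths(connector: Connector) -> list[str]:
    """List every endpoint in the connector as API/environment/version/path."""
    return [
        f"{api.name}/{env.name}/{ver.name}{ep.path}"
        for api in connector.apis
        for env in api.environments
        for ver in env.versions
        for ep in ver.endpoints
    ]


example = Connector(
    name="Example",
    apis=[
        Api(
            name="Ads",
            environments=[
                Environment(
                    name="Production",
                    versions=[Version("v1", [Endpoint("/invoices"), Endpoint("/accounts")])],
                )
            ],
        )
    ],
)

print(endpoint_paths(example))
# ['Ads/Production/v1/invoices', 'Ads/Production/v1/accounts']
```

Using one-to-many lists at every level is what lets a connector grow new APIs, environments, or versions without changing the shape of the model.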

With that core structure defined, the next thing we thought about was what we needed to make an API call and page through the data: a URI, authentication, paging instructions, and so on. Each of these pieces should be definable at any level of the framework, and a definition at a more specific level should override one inherited from a less specific level.
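
One way to sketch this override rule, assuming each level simply declares a dictionary of settings, is Python's `collections.ChainMap`, which looks a key up in each of its mappings in turn and returns the first match. The level names and setting keys (`uri`, `auth_type`, `page_size`) are hypothetical choices for illustration.

```python
from collections import ChainMap

# Levels ordered from least to most specific.
LEVELS = ("connector", "api", "environment", "version", "endpoint")


def resolve_settings(defined: dict[str, dict]) -> dict:
    """Return the effective settings for one endpoint.

    `defined` maps a level name to the settings declared at that level.
    ChainMap searches its mappings left to right, so the most specific
    level goes first and its values shadow those declared higher up.
    """
    chain = ChainMap(*(defined.get(level, {}) for level in reversed(LEVELS)))
    return dict(chain)


effective = resolve_settings(
    {
        "connector": {"uri": "https://api.example.com", "page_size": 100},
        "environment": {"auth_type": "Auth Type 1"},
        "endpoint": {"page_size": 500},
    }
)
print(effective["uri"], effective["auth_type"], effective["page_size"])
# https://api.example.com Auth Type 1 500
```

Levels that declare nothing cost nothing: they contribute an empty mapping, and lookups fall through to the next level up.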

The famous company “Example” has an Ads API with a production environment that has two different versions and two different endpoints. One of those endpoints requires a special type of authentication. Here’s how our framework allows for that.

The key detail is that an Auth Type is defined at the environment level, and the only other place one is defined is on the Ad Accounts endpoint in V2 of the API. As a result, every endpoint in the production environment uses Auth Type 1, except the V2 Ad Accounts endpoint, which has its own type of authentication. This flexibility allows us to handle atypical situations, such as individual endpoints having their own authentication or their own way to page.
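
The scenario above can be written as a small configuration and resolved per endpoint. This is a sketch under stated assumptions: the dictionary layout, the second endpoint name (`Campaigns`), and the label `Auth Type 2` are hypothetical; only Auth Type 1, V2, and Ad Accounts come from the example.

```python
from typing import Optional

# Hypothetical config for the "Example" Ads API. Only the Production
# environment and the V2 "Ad Accounts" endpoint declare an auth_type.
ADS_API = {
    "environments": {
        "Production": {
            "auth_type": "Auth Type 1",
            "versions": {
                "V1": {"endpoints": {"Ad Accounts": {}, "Campaigns": {}}},
                "V2": {
                    "endpoints": {
                        "Ad Accounts": {"auth_type": "Auth Type 2"},  # hypothetical special auth
                        "Campaigns": {},
                    }
                },
            },
        }
    }
}


def auth_per_endpoint(api: dict) -> dict[tuple[str, str, str], Optional[str]]:
    """Resolve the auth type for every endpoint; a lower level overrides a higher one."""
    resolved = {}
    for env_name, env in api["environments"].items():
        env_auth = env.get("auth_type", api.get("auth_type"))
        for ver_name, ver in env["versions"].items():
            ver_auth = ver.get("auth_type", env_auth)
            for ep_name, ep in ver["endpoints"].items():
                resolved[(env_name, ver_name, ep_name)] = ep.get("auth_type", ver_auth)
    return resolved


for key, auth in auth_per_endpoint(ADS_API).items():
    print(" / ".join(key), "->", auth)
```

Only Production / V2 / Ad Accounts resolves to `Auth Type 2`; the other three endpoints inherit `Auth Type 1` from the environment without declaring anything themselves.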

With that we have our framework, which we can programmatically run through to build API calls.
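
A minimal sketch of that traversal, under stated assumptions: the tree is built from plain dictionaries by a hypothetical `node` helper, settings are merged top-down so the most specific value wins, and each leaf (an endpoint) becomes an `ApiCall` whose URL is `base_uri + prefix + path`. None of these names come from Matillion's product.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class ApiCall:
    name: str
    url: str
    auth_type: str


def node(name: str, *children: dict, **settings: str) -> dict:
    """Hypothetical helper: a tree node with its own settings and child nodes."""
    return {"name": name, "settings": settings, "children": list(children)}


def build_calls(current: dict, inherited: dict, trail: tuple[str, ...]) -> Iterator[ApiCall]:
    """Depth-first walk; each node's settings override those inherited from its parent."""
    settings = {**inherited, **current["settings"]}
    trail = trail + (current["name"],)
    if not current["children"]:  # a leaf is an endpoint: it has everything needed for a call
        yield ApiCall(
            name=" / ".join(trail),
            url=settings["base_uri"] + settings.get("prefix", "") + settings["path"],
            auth_type=settings["auth_type"],
        )
    for child in current["children"]:
        yield from build_calls(child, settings, trail)


connector = node(
    "Example",
    node(
        "Ads",
        node(
            "Production",
            node("V1", node("Ad Accounts", path="/adaccounts"), prefix="/v1"),
            node(
                "V2",
                node("Ad Accounts", path="/adaccounts", auth_type="Auth Type 2"),
                prefix="/v2",
            ),
            auth_type="Auth Type 1",
        ),
    ),
    base_uri="https://api.example.com",
)

for call in build_calls(connector, {}, ()):
    print(call.url, call.auth_type)
```

Because the walk is generic, the same function handles any depth of nesting, and adding a new endpoint is just adding a leaf node.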

Building your own data connector framework

Hopefully the above example gives you an understanding of the framework we built here at Matillion. Here are some key steps to consider when building your own:

  • Understand the domain – We researched APIs and drew on our team’s own experience with them to understand what an API consists of, the various scenarios for connecting to APIs, and what customers wanted to get from them.
  • Build out test resources – Using the knowledge we gained through this research, we curated systems and mock systems to test against, covering a variety of API intricacies.
  • Define the entities of your framework – We established that a data connector was made up of APIs, environments, versions, etc.
  • Establish the relationships between these entities – We chose the relationships that gave us the most flexibility: a connector can have many APIs, an API can have many environments, and so on.
  • Find the specifics you need to achieve your goal – Having established the core parts of your framework, you will likely need to add more details to achieve your goal. In our case, we had to work out what an API call is made up of and how that fits into the abstraction. We decided that settings such as a URI could be defined at multiple levels and overridden when defined lower down the chain. Mechanics like these define how your framework operates.
  • Repeat all of the above – A framework is never really finished. You will continually refine and improve it. Just be wary of breaking changes, and consider how any change affects the intended design and purpose of your framework.
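
As an example of a test resource, the sketch below pairs a hypothetical in-memory mock of a cursor-paged endpoint with a client loop that follows the cursor until it runs out. The `MockPagedApi` class, the `next_cursor` field, and the `fetch_all` function are assumptions for illustration; real APIs page in many different ways.

```python
from typing import Optional


class MockPagedApi:
    """In-memory stand-in for an endpoint that returns results one page at a time."""

    def __init__(self, records: list[dict], page_size: int) -> None:
        self.records = records
        self.page_size = page_size
        self.requests_made = 0

    def get(self, cursor: Optional[int] = None) -> dict:
        self.requests_made += 1
        start = cursor or 0
        end = start + self.page_size
        # No cursor on the final page signals that there is nothing left to fetch.
        next_cursor = end if end < len(self.records) else None
        return {"data": self.records[start:end], "next_cursor": next_cursor}


def fetch_all(api: MockPagedApi) -> list[dict]:
    """Follow next_cursor until the mock reports there are no more pages."""
    results: list[dict] = []
    cursor: Optional[int] = None
    while True:
        page = api.get(cursor)
        results.extend(page["data"])
        cursor = page["next_cursor"]
        if cursor is None:
            return results


mock = MockPagedApi([{"id": i} for i in range(7)], page_size=3)
rows = fetch_all(mock)
print(len(rows), mock.requests_made)  # 7 3
```

Because the mock counts requests, a test can check not only that every record arrived but also that the client stopped after the last page.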

More to come…

There are more interesting technical concepts to dive into around how Matillion enables customers to work with APIs: image handling alternatives, guessing the schema of API responses, supporting various paging mechanisms, and more. We will cover these topics in future blog posts leading up to the release of the project that ties it all together: our Create Your Own Connector Project.

Read more from our engineers

This article is part of our series from Matillion engineers about our products, our work, and life as an engineer here. Learn more about engineering culture and the value of personal resilience from another member of our team, Kristian Epps.