
Customer Data Platform Showdown: Centralized vs. Federated Data Management

Matillion Data Builder Series

Welcome to our new Data Builder Series. This series takes a comprehensive look at data productivity from the perspective of data management, which is what we know best at Matillion. Each installment focuses on a particular topic, such as customer data platforms, stateless data planes, or healthcare data. Our goal with the Data Builder Series is to help you explore the opportunities and overcome the challenges of using data, and to equip you with the knowledge and insights you need to maximize data productivity and unlock the potential of your data.

Why every business needs a customer data platform

The bottom line for every business is to make money by increasing revenue and profit. The only way to do that is to have more customers spending more on your products and services while keeping your costs down.

A customer data platform (CDP) is essential to achieving these results. Because it collects all customer information, such as customer identity and transaction history, in one place, a CDP helps you measure your success in acquiring and keeping customers. You can analyze a wide variety of metrics, such as customer acquisition costs (CAC), net dollar retention (NDR), the percentage of revenue retained from existing customers, and the overall customer life cycle (CLC). While it’s valuable to understand these metrics, they are ultimately lagging indicators that show what has happened in the past. An excellent way to lose customers is to only look at past behavior and treat them like walking wallets, assuming their behavior or preferences will remain the same.
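Lagging metrics such as CAC and NDR are simple ratios once the underlying data sits in one place. The sketch below assumes the common definitions: CAC is acquisition spend divided by the number of new customers, and NDR is starting recurring revenue from existing customers plus expansion, minus contraction and churn, divided by that starting revenue. The `PeriodFigures` fields and the sample numbers are hypothetical, not a schema from Matillion or any particular CDP.

```python
from dataclasses import dataclass


@dataclass
class PeriodFigures:
    """Hypothetical figures for one reporting period (illustrative only)."""
    sales_and_marketing_spend: float  # total acquisition spend in the period
    new_customers: int                # customers acquired in the period
    starting_revenue: float           # recurring revenue from existing customers at period start
    expansion_revenue: float          # upsells and cross-sells from those customers
    contraction_revenue: float        # downgrades from those customers
    churned_revenue: float            # revenue lost from customers who left


def customer_acquisition_cost(p: PeriodFigures) -> float:
    """CAC: acquisition spend divided by the number of new customers."""
    if p.new_customers == 0:
        raise ValueError("CAC is undefined when no customers were acquired")
    return p.sales_and_marketing_spend / p.new_customers


def net_dollar_retention(p: PeriodFigures) -> float:
    """NDR: percentage of starting revenue retained from existing customers."""
    if p.starting_revenue == 0:
        raise ValueError("NDR is undefined without starting revenue")
    retained = (p.starting_revenue + p.expansion_revenue
                - p.contraction_revenue - p.churned_revenue)
    return 100.0 * retained / p.starting_revenue


if __name__ == "__main__":
    quarter = PeriodFigures(
        sales_and_marketing_spend=50_000.0,
        new_customers=25,
        starting_revenue=200_000.0,
        expansion_revenue=30_000.0,
        contraction_revenue=5_000.0,
        churned_revenue=15_000.0,
    )
    print(f"CAC: {customer_acquisition_cost(quarter):.2f}")  # CAC: 2000.00
    print(f"NDR: {net_dollar_retention(quarter):.1f}%")       # NDR: 105.0%
```

An NDR above 100% means expansion from existing customers more than offset what was lost to downgrades and churn in the period.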

The greater value of a CDP lies in collecting and analyzing leading indicators that measure customer satisfaction and how much value each customer is getting. These customer-focused metrics capture behavior and preferences through sources such as web browsing, social media activity, demographics (industry vertical, location, company size, etc.), and interactions with the business (support cases, chatbot conversations, community activity, etc.).

With this information, a CDP provides a more objective way to measure the value each customer obtains relative to their spend. This lets you be more proactive with customers and respond better to changing behavior, for example to:

  • Make sure customers are gaining all the value they can
  • Fine-tune marketing and advertising campaigns and create more targeted recommendations 
  • Improve product development and offering strategies

Ultimately, a CDP can help businesses achieve their primary goal of increasing revenue through greater customer loyalty and better customer service.

Organizing data in a CDP

A common way to organize and store data efficiently in a CDP is a star schema. In a star schema, a central fact table records measurable events, such as purchases or product reviews, and references the surrounding dimension tables, e.g., products, customers, and time. Because each dimension's descriptive attributes are stored once instead of being repeated on every event row, duplication is reduced. This makes it easier to filter data in a data platform such as a data warehouse, database, or data mart, and more efficient to query large data sets.

The figure below shows an example of the star schema model in a CDP and how it can be used to analyze customer opinions of different products.

Figure 1. Star schema to measure customer opinions of particular products.
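The star schema described above can be sketched with Python's built-in `sqlite3` module standing in for a cloud data platform. The table names, columns, rating scale, and sample rows are hypothetical and are not taken from Figure 1; the point is the shape: one fact table of reviews keyed to customer, product, and date dimensions, queried by joining and grouping on dimension attributes.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Dimension tables: descriptive attributes stored once per entity.
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT,
    industry_vertical TEXT
);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    calendar_date TEXT
);
-- Fact table: one row per review, pointing at the dimensions by key.
CREATE TABLE fact_review (
    review_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    rating INTEGER  -- assumed scale: 1 (poor) to 5 (excellent)
);
""")

cur.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                [(1, "Acme", "Retail"), (2, "Globex", "Healthcare")])
cur.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(10, "Widget"), (20, "Gadget")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?)",
                [(1, "2024-01-15"), (2, "2024-02-20")])
cur.executemany("INSERT INTO fact_review VALUES (?, ?, ?, ?, ?)",
                [(100, 1, 10, 1, 4), (101, 2, 10, 2, 2), (102, 1, 20, 2, 5)])

# Average rating per product and industry vertical: join the fact table to
# its dimensions, then group on the dimension attributes.
query = """
SELECT p.product_name, c.industry_vertical, AVG(f.rating) AS avg_rating
FROM fact_review AS f
JOIN dim_product AS p ON f.product_id = p.product_id
JOIN dim_customer AS c ON f.customer_id = c.customer_id
GROUP BY p.product_name, c.industry_vertical
ORDER BY p.product_name, c.industry_vertical
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)
```

Adding a new angle of analysis, such as ratings by month, only means joining one more dimension; the fact rows themselves do not change.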

Centralized vs. Federated approaches to managing a CDP

Today, most modern enterprises rely on cloud data platforms such as Snowflake, Databricks, or Amazon Redshift as the storage layer and analytics backbone for their customer data platforms. These cloud-native platforms offer greater scalability, performance, and compatibility with BI, analytics, and governance solutions while also reducing infrastructure costs and maintenance.

A critical decision in creating your CDP is how to manage all of the data flowing into your cloud data platform of choice. This is usually done using one of two main models:

  1. A centralized model, or 
  2. A federated model

Centralized model - One team to rule them all

In the centralized model, a single team is responsible for co-locating and integrating all relevant data sources in a central location, i.e., your chosen cloud data platform. The team is also responsible for ensuring that the correct customer identity is used in every case and that the resulting data is easy to consume.
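Making sure the correct customer identity is used everywhere is often called identity resolution. The sketch below is deliberately simple and assumes that exact matching on a normalized email address is an acceptable key; production identity resolution usually combines several identifiers and looser matching rules. The record fields, source names, and the `CUST-` ID format are hypothetical.

```python
from collections import defaultdict

# Hypothetical records describing customers in different source systems.
records = [
    {"source": "crm", "source_id": "C-001", "email": "Jane.Doe@Example.com"},
    {"source": "web", "source_id": "W-778", "email": " jane.doe@example.com "},
    {"source": "support", "source_id": "S-42", "email": "bob@example.org"},
]


def normalize_email(email: str) -> str:
    """Simplistic matching key: trimmed, lower-cased email address."""
    return email.strip().lower()


def resolve_identities(rows):
    """Group source records that share a normalized email under one customer ID."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize_email(row["email"])].append((row["source"], row["source_id"]))
    # Assign a unified ID per matching key; sorting keeps the output deterministic.
    return {
        f"CUST-{i:04d}": {"email": key, "source_records": members}
        for i, (key, members) in enumerate(sorted(groups.items()), start=1)
    }


for customer_id, profile in resolve_identities(records).items():
    print(customer_id, profile)
```

Here the CRM and web records collapse into one customer while the support record remains separate, which is exactly the decision the central team has to get right before any downstream metric can be trusted.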

To make the CDP actionable, this team will usually need to push the enriched output data to the systems where business users view and act on it, such as a contact center database or the source business applications. Accomplishing this may require sync-back (reverse ETL) capabilities in your data environment.
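A sync-back step typically sends only what has changed since the last push, which keeps the number of calls to the destination system down. The sketch below shows just that diffing step, assuming an in-memory snapshot of the last synced state; the attribute names are hypothetical, and the actual delivery to a contact center or business application is left out.

```python
from typing import Dict

# Hypothetical enriched attributes computed in the cloud data platform,
# keyed by unified customer ID.
enriched = {
    "CUST-0001": {"segment": "at_risk", "lifetime_value": 1200},
    "CUST-0002": {"segment": "loyal", "lifetime_value": 5400},
}

# Hypothetical snapshot of what was last pushed to the destination app.
last_synced = {
    "CUST-0001": {"segment": "loyal", "lifetime_value": 1200},
    "CUST-0002": {"segment": "loyal", "lifetime_value": 5400},
}


def build_sync_payloads(current: Dict[str, dict], previous: Dict[str, dict]) -> Dict[str, dict]:
    """Return only the fields that changed since the last sync, per customer."""
    payloads = {}
    for customer_id, attrs in current.items():
        old = previous.get(customer_id, {})
        changed = {k: v for k, v in attrs.items() if old.get(k) != v}
        if changed:
            payloads[customer_id] = changed
    return payloads


# A real sync-back job would send each payload to the destination's API;
# this sketch only prints them.
for customer_id, fields in build_sync_payloads(enriched, last_synced).items():
    print(customer_id, fields)
```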

Figure 2. Centralized model with one team managing all data pipelines for the customer data platform.


A centralized model requires all of the expertise and knowledge to reside in this one team. For example, the team owns the definitions of "customer," "industry vertical," and "products." The same team also owns the technology stack and enforces governance.

Federated model - Every team responsible for its own data

The federated model assigns responsibility for managing the data to different teams with domain expertise. Instead of centralized data ownership and governance, each domain is responsible for a particular set of data based on its knowledge and proficiency. So, while customer data in a federated model still ultimately resides in a central cloud data platform, as it does in the centralized approach, ownership and governance of different types of data rest with different teams.

In practice, this means that the marketing team might be responsible for customer data gathered from web browsing history or social media activity, the sales team for customer transaction data, and the support team for customer interaction data. A data mesh architecture, in which each domain is responsible for the data pipelines for the data it owns, can be used to support a federated model.

An example of how the federated model works within a CDP can be seen below:

  • Domain Team 1 owns the definition of "customer." Every other domain team must use this single definition. There can be no argument about how to uniquely identify a "customer" and no disagreement about counts.
  • Domain Team 2 owns the definition of "industry vertical." Every other domain team must use this single definition. There can be no argument about what industry verticals we are considering. Historical reporting must use the same definitions as AI/ML predictive modeling.
  • Domain Team 1 must use Domain Team 2's definition of "industry vertical" and assign it to all the customers.
  • Domain Team 3 can now combine the customer data with the industry verticals and reliably use the result for reporting or AI/ML modeling.
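The ownership chain in the example above can be sketched in Python, with each domain team exposing its definition as a function that the others call instead of reimplementing it. The billing-account rule for "customer," the industry codes, and all function names are hypothetical.

```python
from collections import Counter

# --- Domain Team 2: owns the single definition of "industry vertical". ---
# Hypothetical mapping from raw industry codes to the agreed verticals.
INDUSTRY_VERTICALS = {
    "R-100": "Retail",
    "H-200": "Healthcare",
    "S-300": "Software",
}


def industry_vertical(raw_code: str) -> str:
    """Shared, domain-owned rule: unknown codes map to 'Other'."""
    return INDUSTRY_VERTICALS.get(raw_code, "Other")


# --- Domain Team 1: owns the single definition of "customer". ---
def customer_key(account: dict) -> str:
    """Hypothetical rule: a customer is uniquely identified by billing account ID."""
    return account["billing_account_id"]


def publish_customers(accounts: list) -> dict:
    """Deduplicate accounts into customers and tag each with Team 2's vertical."""
    customers = {}
    for acct in accounts:
        customers[customer_key(acct)] = {
            "name": acct["name"],
            "industry_vertical": industry_vertical(acct["industry_code"]),
        }
    return customers


# --- Domain Team 3: consumes the published data product. ---
def customers_per_vertical(customers: dict) -> Counter:
    """Count customers per vertical using only the published definitions."""
    return Counter(c["industry_vertical"] for c in customers.values())


accounts = [
    {"billing_account_id": "B1", "name": "Acme", "industry_code": "R-100"},
    {"billing_account_id": "B1", "name": "Acme", "industry_code": "R-100"},  # duplicate row
    {"billing_account_id": "B2", "name": "Globex", "industry_code": "H-200"},
    {"billing_account_id": "B3", "name": "Initech", "industry_code": "X-999"},
]
print(customers_per_vertical(publish_customers(accounts)))
```

Because Team 3 never redefines "customer" or "industry vertical," its counts agree with every other consumer of the same data product.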

A central hub team still plays a vital role in the federated model: it provides the tech stack and enforces governance. Gold/Silver/Bronze tiering is a sound governance methodology for this approach. Domain teams are encouraged and empowered to publish data, but if they want it treated as a "gold" standard, it must meet specific requirements.
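Gold/Silver/Bronze tiering is often enforced by the hub team as a set of checks a data product must pass before promotion. The sketch below assumes four illustrative gold requirements (an accountable owner, documentation, use of the shared definitions, and passing quality tests); these criteria, and the rule for silver, are hypothetical and would come from your own governance policy.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DataProduct:
    """Hypothetical metadata a domain team submits when publishing."""
    name: str
    owner: str = ""
    documented: bool = False
    uses_shared_definitions: bool = False  # e.g. the agreed "customer" definition
    quality_tests_passed: bool = False


def assess_tier(product: DataProduct) -> Tuple[str, List[str]]:
    """Return (tier, unmet requirements) under illustrative hub-team rules."""
    issues = []
    if not product.owner:
        issues.append("no accountable owner")
    if not product.documented:
        issues.append("missing documentation")
    if not product.uses_shared_definitions:
        issues.append("does not use shared definitions")
    if not product.quality_tests_passed:
        issues.append("quality tests failing or missing")

    if not issues:
        return "gold", issues
    # Hypothetical rule: owned and documented products are at least silver.
    if product.owner and product.documented:
        return "silver", issues
    return "bronze", issues


tier, issues = assess_tier(DataProduct("web_engagement", owner="marketing", documented=True))
print(tier, issues)
```

Returning the unmet requirements alongside the tier tells a domain team exactly what it must fix to reach gold, which keeps publishing open while the standard stays fixed.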

Centralized vs. Federated - Who wins?

Centralized models of working with customer data offer tighter coordination and control thanks to simple internal lines of communication. As a result, decision-making can be fast. For example, if the definition of "industry vertical" needs to change, the centralized team can make the change efficiently. Governance and accountability remain clear because everything is handled in one place. Similarly, a centralized team can better take advantage of economies of scale in the technology stack that underpins its work.

However, concentrating all the expertise and knowledge in a single team also creates resource constraints. This often leads to bottlenecks, making it harder to scale a business's data analytics efforts. This is where the federated model, with its distributed accountability for data products, has the advantage.

With a federated model, there is no need to wait for a centralized team to complete its processes before making a minor change, or for a massive backlog of data projects to clear before the data team gets to your report request. Accountability for every data product rests clearly with the assigned domain team. In addition to the obvious scalability advantages, this can motivate and empower the teams concerned and lead to quicker deployment of creative solutions.

Ultimately, the decision on which model to choose depends on the makeup of your organization and its priorities. What is more important to you? Centralized accountability and control or a more democratized approach to managing your customer data? Economies of scale with respect to your technology stack or scaling your customer analytics? Only you can decide which model best fits your organization's needs and goals. 

Getting Data Into Your CDP 

No matter which approach you choose, the first step to better understanding your customers and making the right decisions to improve customer loyalty and engagement is ensuring you have all the right data.  Any successful CDP must have access to data from a wide variety of sources to enable a business to acquire a comprehensive view of its customers and their behaviors.  The most commonly used data sources for customer data are customer relationship management (CRM) systems, email campaign solutions, website analytics tools, social media platforms, online surveys, loyalty programs, and third-party data providers. 

Matillion makes it easy

The Matillion Data Productivity Cloud makes it easy to extract and load data from all your customer data sources into the chosen cloud data platform for your CDP – whether it is a centralized approach with a single team loading the data or a federated model with various domain owners handling the task.

Matillion has built-in connectors to the most commonly used data sources and applications for customer data. 

With Matillion’s Low-Code/No-Code interface, you can connect to a data source in just a few seconds. The video below shows how to quickly connect Salesforce data to the cloud with Matillion.

Matillion also offers Change Data Capture (CDC) loading, which captures incremental changes to source data, and the ability to connect to any REST API with the Create Your Own Connector framework.
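Change Data Capture loads only the rows that were inserted, updated, or deleted at the source since the last run, instead of re-copying whole tables. The sketch below illustrates the general idea of replaying such change events against a target table in commit order. It is a generic, hypothetical illustration, not Matillion's implementation, and the event format (`op`, `id`, `seq`, `row`) is assumed.

```python
from typing import Dict, List


def apply_changes(target: Dict[int, dict], events: List[dict]) -> Dict[int, dict]:
    """Apply insert/update/delete change events to a copy of the target table.

    Each event is a hypothetical dict with an 'op' ('insert', 'update', or
    'delete'), a primary key 'id', an increasing sequence number 'seq', and,
    for inserts and updates, the new column values in 'row'.
    """
    result = {k: dict(v) for k, v in target.items()}  # leave the input untouched
    # Replay events in source commit order so later changes win.
    for event in sorted(events, key=lambda e: e["seq"]):
        op, key = event["op"], event["id"]
        if op in ("insert", "update"):
            # Upsert: merge new values into the existing row, or create it.
            result.setdefault(key, {}).update(event["row"])
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


customers = {1: {"name": "Acme", "tier": "standard"}, 2: {"name": "Globex", "tier": "premium"}}
changes = [
    {"seq": 2, "op": "delete", "id": 2},
    {"seq": 1, "op": "update", "id": 1, "row": {"tier": "premium"}},
    {"seq": 3, "op": "insert", "id": 3, "row": {"name": "Initech", "tier": "standard"}},
]
print(apply_changes(customers, changes))
```

Sorting by sequence number matters: events can arrive out of order, and applying them as received could let an older update overwrite a newer one.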

What’s Next – Enriching and Transforming Data in your CDP

The next blog in this series will focus on enriching the data in your CDP through data transformation. It will be a more ‘how-to’ focused blog discussing best practices for getting your customer data business-ready.

Schedule a demo to find out more about Matillion.

Andreu Pintado