Private, unified, and clean – Best practices for transforming customer data
Matillion Data Builder Series
This is the second instalment of our Data Builder Series on Customer Data Platforms. Our goal is to help you explore customer data opportunities, overcome data usability challenges, and equip you with the knowledge to maximize data productivity. Here, you can read the first blog in this series, focusing on data management models and loading data into a customer data platform.
You’ve got all your customer data in the cloud - Now what?
You have seen the light and decided to build a customer data platform (CDP) in the cloud. You’ve picked your cloud data platform/warehouse, identified all relevant data sources, and used a state-of-the-art data integration solution like Matillion Data Productivity Cloud to get all your raw data there. Now what?
The next step, likely the most critical component of any customer data strategy, is to transform your customer data – coming from many sources – into a unified format for reporting and analysis. You must consider three critical aspects: ensuring personally identifiable information (PII) privacy, creating a unified customer profile, and cleansing the data to ensure accuracy and relevance. This will, in turn, enable you to utilize customer information better and get value from it — understand how your business is performing and make the right decisions to improve customer engagement, experience, and returns.
Protect your data – Personal customer data is private
Personally Identifiable Information (PII) is data used to identify a person, such as a name, date of birth, social security number, driver's license number, credit card information, or other similar information. Organizations that collect PII data must keep the information private and limit access to that data. A large chunk of customer data falls under PII, and your CDP must protect such data adequately by restricting access and masking.
Limit access - The fewer, the better
Access to PII should be limited, with full access granted only to those that need it. In addition, all persons with access to PII data should attend compliance training and receive applicable certifications – HIPAA, PCI, etc., depending on the type of business. Finally, an organization should have proper documentation to ramp up new people accessing PII.
Matillion helps organizations get proper access control to PII data. You can set up granular access to data based on virtually anything a user can do with Matillion, including permissions based on projects, data pipelines, or tables. Matillion can also mimic the data permissions of the target, i.e., your cloud data platform.
Hide your PII Data – Masking or tokenization
In addition to access control, you must mask or hide PII data within your CDP in cases where limited access is needed. The simplest option, applicable when a person needs anonymized customer information – for example, to analyze buying habits of particular customer demographics, is to run a transformation job that strips (PII) data from a table stored in your cloud data platform. You can efficiently hide your PII data with Matillion, as shown below.
Figure 1 - Removing columns with PII information from a table with Matillion.
When greater access to PII is needed, you can add a quarantine schema to the data in your cloud data platform. A quarantine schema can restrict access or mask the data based on rules in the schema — by data type, data user role, or any other rules you set up — while the PII data is at rest. Limiting access to the quarantine schema is especially important as it is for data at rest.
Another option is to leave the tables containing PII data unchanged but mask the data in transit. Under this option, the original tables with the PII data are not masked. When the data is queried by a user who should not have access to PII data, the columns with PII data are masked in transit so the user is only given access to non-PII data on their end. Snowflake’s Dynamic Data Masking capability is an example of such data masking in transit when PII data is queried directly from Snowflake by a data user. Figure 2 - Example of Dynamic Data Masking with Snowflake.
Under either circumstance, the data can be partially hidden so that PII is not revealed — but the user can still do their job. For example, all but the last four digits of a customer’s social security number or credit card info are on file and masked — enabling a customer service rep to verify identity or payment without having access to PII data. Matillion supports both quarantine schemas and dynamic data masking.
Another option to protect PII data is to mask it in flight from the data source before it reaches your cloud data platform. Several software vendors can tokenize or mask the data in flight, and only the person(s) with the token can then unmask the data at its destination. Matillion integrates and supports several such solutions, including ALTR.
Connect your data – Keys can open more than doors
Once you have ensured proper protection of PII data in your CDP, the next step is to create a unified customer profile by consolidating customer data from multiple sources into a single master customer record. Now you can understand your customer base better, identify customer segments, track customer journeys, and improve customer experiences and returns.
Key structuring is an effective method of connecting your data. For example, keys join data from multiple tables in relational databases like the cloud data platform where your CDP data is stored. Combining customer data from different sources creates a single master customer record for each customer.
A key that joins columns from multiple tables can be in the form of a single column that you can create yourself by adding a column to a table and assigning values to that new column for each row of the table – usually sequentially. Alternatively, you can rely on natural keys, a combination of columns of existing data, for example, a customer’s name, address, and phone number, that uniquely identify a record or row when combined.
Figure 3 - Example of how keys join customer info with purchases of particular products
Matillion can help you create and map keys across different tables — making joining data from those tables easy. Then, you can transform that data to create unique customer profiles that can be consumed by the business – e.g., in a customer service portal or your business intelligence and analytics applications. Figure 4 - Example of Matillion’s join function to connect customer demographics with orders.
Once you create a unified profile for each customer, use the profiles to improve your understanding of past customer behavior, needs, and preferences — delivering increased customer loyalty and satisfaction.
Cleanse your data – Good data leads to good results
Data cleansing is crucial to maintaining quality data in a customer data platform. We’ve all heard the saying “garbage in, garbage out.” Your CDP will have minimal value if the data going into it is not good. Proper cleansing will make your CDP accurate, consistent, and complete.
Follow these tips to ensure that the data in your CDP is accurate, consistent, and complete. Some of the most common ones include:
- Check integrity to ensure the data is in the proper format for its intended purpose.
- Example 1: For a healthcare provider, check and validate that the codes for procedures performed on a patient adhere to a list of acceptable codes, and if not, reject the records with invalid codes.
- Example 2: For a retail business, check that the product codes for purchases made are valid from the company’s inventory of products.
- Use Filtering to sift out unwanted data, such as duplicate records or records with missing fields.
- Format dates, numbers, etc., or standardize text to ensure consistency in data gathered from different sources.
Matillion automates the process
Matillion makes it easy to cleanse your data and quickly move from bronze (raw data) to silver (cleansed data) to gold (business and analytics-ready data). With Matillion’s Dynamic Data Standardization capabilities, you can quickly set up rules to automatically standardize data formats and identify and eliminate any outliers – missing or incomplete data and values significantly different from the majority of values in a dataset. You can also set up business rules to sift out data irrelevant to a particular analysis. Matillion also has Assert components where you can confirm the characteristics or validity of the data before moving the data along the pipeline.
The following example shows how easy it is to automate the standardization and validation of data with Matillion.
Step 1. Raw data table showing the location of customers and dates of last purchase.
Step 2. Matillion pipeline queries raw data and applies rules to the table to standardize State Codes, State Names, and Dates. Step 3. Matillion pipeline filters the data and labels any purchases before 2020 as Null so that customer activities before 2020 will not be part of the ultimate analysis.
Result. Table with standardized State Codes, State Names, and Dates with older purchases identified as Null.
What’s next – Getting the most from your customer data
A data integration platform like Matillion can be invaluable for transforming customer data and maintaining a CDP that delivers insights into customer behavior and preferences. Matillion makes integrating data from multiple sources easy, so that you can build a unified profile for each customer with accurate, consistent, and complete data while ensuring that PII data remains private. And Matillion provides you the scalability and flexibility to grow your CDP and adapt to changing customer demographics, needs, and preferences.
The following blog in this series will focus on getting value from your customer data. It will provide a more in-depth discussion on getting your data business and analytics-ready and making that data available to the business, and also detail some use cases.
Matillioners using Matillion: Alice Tilles' Journey with Matillion & ThoughtSpot
In the constantly evolving landscape of data analytics, ...Blog
What’s New to Data Productivity Cloud?
In July of this year, Matillion introduced the Data Productivity ...Blog
Data Mesh vs. Data Fabric: Which Approach Is Right for Your Organization? Part 3
In our recent exploration, we've thoroughly analyzed two key ...