Single Version of the Truth: Holy Grail or dangerous misnomer?
When discussing the benefits of a centralized data warehouse in a business, the hackneyed phrase “a single version of the truth” is often rolled out as one of the expected benefits. But what is meant by this, is it achievable, and is it even desirable?
“Versions of the truth”
Let us deal with the pedantry first. Can one have “versions” of “the truth”? Well, at any one time, if it is accurately recorded, the truth is the truth. Singular. “Versions” suggests changes over time, and of course the truth (accurate data) at one instant will likely differ from the truth at another. In terms of data and data analysis, what we are really saying is that whilst it may be perfectly legitimate to interpret data in different ways and draw varying conclusions, we should be able to eliminate the time wasted on arguing about the data itself and whether different views arose because we have slightly different data. Too often time is wasted in meetings because departments maintain their own “shadow” systems (often in MS Excel) and draw the data they use from them. Let’s eliminate the duplication of effort in both maintaining these shadow systems and then reconciling the differences.
Having one version makes it easier to make decisions, but requires accurate data. Data about one transaction may come from different systems (e.g. EPOS and Inventory). You don’t want to eliminate one; rather, make sure that they both accurately reflect that transaction, i.e. identify and eliminate the errors that cause differences. What we really want is consistent, accurate data, free of the differences that arise from inefficiencies in the way the data was collected, collated, and managed. That is our “single version of the truth”.
Where business groups have evolved through an M&A roll-up, or the opening of overseas branches or subsidiaries, it is not uncommon to find that different entities, as well as using different systems, use slightly different calculation methods to arrive at a result for which they all use a common term such as “gross margin”. Is it calculated before or after delivery costs, allocation of rebates and marketing contributions?
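The kind of check described above, confirming that two systems agree about the same transaction, can be sketched in a few lines of Python. This is a minimal illustration only: the transaction IDs, the unit counts, and the function name `reconcile` are hypothetical, and a real reconciliation would compare more than a single quantity field.

```python
def reconcile(epos, inventory):
    """Compare units recorded per transaction ID in two systems.

    Both arguments are dicts of {transaction_id: units}. Returns a list of
    (transaction_id, problem) tuples, sorted by transaction ID.
    """
    issues = []
    for txn_id in sorted(set(epos) | set(inventory)):
        sold = epos.get(txn_id)
        moved = inventory.get(txn_id)
        if sold is None:
            issues.append((txn_id, "missing in EPOS"))
        elif moved is None:
            issues.append((txn_id, "missing in inventory"))
        elif sold != moved:
            issues.append((txn_id, f"quantity mismatch: EPOS={sold}, inventory={moved}"))
    return issues


# Hypothetical sample data: units per transaction as seen by each system.
epos_units = {"T1001": 3, "T1002": 1, "T1003": 2}
inventory_units = {"T1001": 3, "T1002": 2, "T1004": 5}

discrepancies = reconcile(epos_units, inventory_units)
for txn_id, problem in discrepancies:
    print(txn_id, problem)
```

The point of the sketch is that neither source is discarded: both are kept, and only the disagreements are surfaced for correction.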
Developing user trust
What will be crucial to adoption is that the users and report recipients have faith in the source, quality and accuracy of the data. Only then will they abandon their “shadow” reports.
One of the easiest ways to achieve this is to let them replicate, at the UAT stage, the reports they have previously “hand cranked”. Once they see this, trust grows. So, if the French country manager wants his report in euros with gross margin before rebates but including delivery, give it to him (it might be labelled “Gross Margin (Fr)”), even if the group wants reports in U.S. dollars with gross margin after rebates and before delivery costs. Creating an extra line, calculated automatically at the transform stage of the ETL process, and having it available for him to select is a step worth taking if it speeds adoption and eliminates the shadow system. He can be weaned over to the group’s desired data definition once he is on board with the data, the system and its use.
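To make the “extra line at the transform stage” concrete, here is a rough Python sketch that derives both margin figures from the same source row. Several things here are assumptions rather than anything the article specifies: rebates are treated as supplier rebates that increase margin, delivery is treated as a cost that reduces it, the source amounts are in euros, and the exchange rate and all field names are invented for illustration.

```python
from decimal import Decimal

# Hypothetical EUR-to-USD rate, for illustration only.
EUR_TO_USD = Decimal("1.10")


def transform(row):
    """Add both gross margin definitions to one source row (amounts in EUR)."""
    base_margin = row["revenue"] - row["cost_of_goods"]
    out = dict(row)
    # "Gross Margin (Fr)": before rebates, including delivery costs, in euros.
    out["gross_margin_fr_eur"] = base_margin - row["delivery_cost"]
    # Group definition: after rebates, before delivery costs, in U.S. dollars.
    out["gross_margin_group_usd"] = (base_margin + row["rebates"]) * EUR_TO_USD
    return out


# Hypothetical source row.
source_row = {
    "revenue": Decimal("1000"),
    "cost_of_goods": Decimal("600"),
    "delivery_cost": Decimal("50"),
    "rebates": Decimal("30"),
}

result = transform(source_row)
print(result["gross_margin_fr_eur"], result["gross_margin_group_usd"])
```

Both figures come from the same underlying row, so the country manager and the group can disagree about the definition without disagreeing about the data.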
Benefits of a centralised data warehouse
Besides the “single version of the truth” there are other potential benefits to deploying a centralized data warehouse across a group. It may be difficult for the individual business units to justify the cost of their own data warehouse and BI solution and the IT staff to support it. The group can reduce expenditure on hardware and software, especially storage, by using cloud services (IaaS and SaaS). Individual businesses will typically have over-allocated resources (CPU and memory) in order to meet sporadic peaks in demand. Adding acquired businesses and getting their core data available in a standardised way is also simplified by the central data warehouse. Consolidation facilitates efficient and effective decision making based on reliable and timely information.
The implementation of a single data warehouse can be seen as a competitive advantage and part of a growth strategy, enabling customer analysis, cross selling, up selling, taking market share from competitors and increasing supplier leverage.
Getting it right
Providing self-service reporting tools over this common data pool often receives much of the focus, but it is, in truth, the easy part and there are plenty of excellent solutions, both on premise and in the cloud. Getting adoption can be a politicised process. More attention should be paid to content than delivery. There will be a need to agree on some standards and naming conventions. The data warehouse can be created with its “supermarket shelves” of data that the user can select from to report over. The aim should be to use the company’s standard terminology so users do not need to worry about table names etc. Unfortunately, even within a single company, users from different departments may have differing views on what the company’s standard terminology is (product, SKU, item, category, brand etc.), so a representative group should be formed to make those decisions, and example tips based on the company’s actual products should be available to users in the database to remove any doubt that the right data is being selected.
Identify the business events which you want to report over: hits on the website, enquiries, quotes, sales orders, sales invoices, purchase orders, inventory, etc. Prioritise the order in which you would tackle these areas. (Get a quick win!) Identify the sources of data. Agree your terminology. Decide all the ways in which you might like to be able to slice and dice this data or compare data from different business events, e.g. orders vs inventory.
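One way to picture the agreed terminology is as a small glossary that maps each standard business term to where the data lives, together with the synonyms different departments use and an example tip. The Python sketch below is illustrative only: the terms, synonyms, column names and examples are all hypothetical, and a real BI tool would normally hold this in its own semantic layer.

```python
# Hypothetical glossary agreed by the representative group.
GLOSSARY = {
    "product": {
        "column": "dim_product.product_name",
        "synonyms": {"item", "article"},
        "example": "e.g. 'Blue Widget 500ml'",
    },
    "sku": {
        "column": "dim_product.sku_code",
        "synonyms": {"stock code", "part number"},
        "example": "e.g. 'BW-500-01'",
    },
}


def resolve(term):
    """Return the standard term for whatever word a user types."""
    wanted = term.strip().lower()
    for standard, entry in GLOSSARY.items():
        if wanted == standard or wanted in entry["synonyms"]:
            return standard
    raise KeyError(f"'{term}' is not in the agreed glossary")


standard_term = resolve("Item")
print(standard_term, GLOSSARY[standard_term]["column"], GLOSSARY[standard_term]["example"])
```

Users only ever see the agreed term and its example; the table and column names stay behind the scenes.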
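As a rough illustration of slicing business events by different dimensions and comparing one event type with another, consider the Python sketch below. The event records, the dimension names and the helper `total_by` are hypothetical; in practice this work would be done by the warehouse and the reporting tool rather than hand-written code.

```python
from collections import defaultdict


def total_by(events, event_type, dimension, measure="quantity"):
    """Sum a measure for one event type, grouped by the chosen dimension."""
    totals = defaultdict(int)
    for event in events:
        if event["event"] == event_type:
            totals[event[dimension]] += event[measure]
    return dict(totals)


# Hypothetical events drawn from two business areas.
events = [
    {"event": "sales_order", "product": "Widget", "region": "FR", "quantity": 5},
    {"event": "sales_order", "product": "Widget", "region": "UK", "quantity": 4},
    {"event": "sales_order", "product": "Gadget", "region": "FR", "quantity": 2},
    {"event": "inventory", "product": "Widget", "region": "FR", "quantity": 6},
    {"event": "inventory", "product": "Gadget", "region": "UK", "quantity": 10},
]

# Slice the same events two different ways.
orders_by_product = total_by(events, "sales_order", "product")
orders_by_region = total_by(events, "sales_order", "region")

# Compare two business events: where do orders exceed stock?
stock_by_product = total_by(events, "inventory", "product")
shortfall = {
    product: ordered - stock_by_product.get(product, 0)
    for product, ordered in orders_by_product.items()
    if ordered > stock_by_product.get(product, 0)
}
print(orders_by_product, orders_by_region, shortfall)
```

Deciding up front which dimensions (product, region, and so on) are shared across events is what makes a comparison such as orders vs inventory possible later.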
Use available cloud technologies and solution providers to achieve this quickly and inexpensively without significant capital expenditure or a major IT project. Give your users the data they need to make better decisions and free up expensive and skilled IT, Finance and Business Analysis team members from the grind of report creation for others. Tweaks can be made. There is no end to the process. Look for a partner where ongoing changes are managed as part of the service.
A word of warning
A final word of warning: as outlined above, significant benefits can accrue from the deployment of a centralized data warehouse with accurate data from diverse sources providing the single version of the truth. In this Big Data era, however, predictive analytics are increasingly run across diverse sets of data which may not lend themselves to the sort of sanitization and standardization necessary to fit in the centralized data warehouse. Indeed, to do so might defeat the object of looking at this diversity of data, which we may be hoping will uncover the unexpected.
So we are back to another universal truth: One size never fits all!