Is Data Modeling Still Important in Modern Data Architecture?

Data modeling is often equated with entity diagrams and tables, but it goes beyond these visuals. It is where business needs meet technical feasibility. From conceptual models that bridge business and technology to detailed database structures, each layer adds depth. The practice involves more than defining data types; it also covers predicted data volumes and decisions about where data should be placed. It demands technical expertise and architectural skill to turn complex problems into precise design solutions. This article looks at how data modeling and modern data architecture work together and, from traditional databases to cloud-based solutions, how data modeling shapes adaptable, scalable architectures.

What is Data Modeling?

Data Modeling should be viewed as an integral part of the architecture within a comprehensive data solution. It outlines the process of designing data layers within this solution. While many might associate data modeling solely with creating an Entity Relationship Diagram (ERD) for a physical database, the remit of data modeling can be broader, and there are parts of the process that precede this.

A Data Modeler's scope often includes designing a data architecture based on the initial set of business requirements - understanding the 'why' behind creating a solution in the first place. Additionally, this process includes considering non-functional requirements, such as the need to store system logs or audit trails and whether changes need to be recorded in the model.

Data models often include multiple layers; the higher levels are typically conceptual or business models. These models are used to validate requirements with non-technical stakeholders, serving as a bridge between technical solutions and business needs. Later, a physical data model might be produced, encompassing all object definitions required for database development. Even this can include more than one might initially expect.

Beyond the obvious elements like Table Names, Column Names, and Data Types, a comprehensive data model might also incorporate indexes or partition schemes, predicted data volume, and considerations about whether data should be located externally or internally in the database.
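
As a rough illustration of how these elements can sit together, the sketch below records a physical model as a plain Python dictionary and generates DDL from it for an in-memory SQLite database. The table, column, and index names, the predicted volume, and the helper functions build_ddl and apply_model are all hypothetical and chosen only for this example. SQLite has no native table partitioning, so the partition scheme and the storage location are kept purely as documentation here.

```python
import sqlite3

# Hypothetical physical model; every name and number below is an
# illustrative assumption, not something taken from a real system.
PHYSICAL_MODEL = {
    "table": "orders",
    "columns": [
        ("order_id", "INTEGER PRIMARY KEY"),
        ("customer_id", "INTEGER NOT NULL"),
        ("order_date", "TEXT NOT NULL"),
        ("total_amount", "NUMERIC NOT NULL"),
    ],
    "indexes": {"ix_orders_customer": ["customer_id"]},
    # Documentation-only attributes: not applied by SQLite in this sketch.
    "partition_by": "order_date",
    "predicted_rows_per_year": 5_000_000,
    "storage": "internal",
}


def build_ddl(model):
    """Return CREATE TABLE / CREATE INDEX statements for the model."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in model["columns"])
    statements = [f"CREATE TABLE {model['table']} (\n  {cols}\n)"]
    for index_name, index_cols in model["indexes"].items():
        statements.append(
            f"CREATE INDEX {index_name} ON {model['table']} ({', '.join(index_cols)})"
        )
    return statements


def apply_model(conn, model):
    """Execute the generated DDL against an open connection."""
    for statement in build_ddl(model):
        conn.execute(statement)


conn = sqlite3.connect(":memory:")
apply_model(conn, PHYSICAL_MODEL)

# Read back what the database actually created.
index_names = [row[1] for row in conn.execute("PRAGMA index_list(orders)")]
column_names = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
```

Keeping volume and placement notes next to the column definitions means the model documents design intent as well as structure, even where the target engine cannot enforce it.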

The process of Data Modeling is highly technical, but it also requires someone who can translate problem statements into technical design solutions. It is very much an architectural role.

What is Modern Data Architecture?

Data Architectures have been accelerating in their development for some time. In the early 2000s, databases were regarded as the most reliable, secure, and performant place to locate persisted data. At that time, there were primarily two general models and technologies: OLTP for transactional processing and OLAP for analytical workloads.

However, as businesses increasingly embraced digitization, their data assets surged, and so did the demand for more comprehensive reporting and analysis of that data. Consequently, technologies and architectures tailored for optimizing query performance, such as columnar-based storage, partitioning, and engines handling unstructured data, gained traction. Yet these solutions were often expensive, complex to maintain, and difficult to scale.

The landscape shifted as we ventured into the 2010s with the emergence of cloud providers like AWS, Azure, and GCP. Suddenly, a multitude of options for data architectures became available. Operational databases, running everything from websites to ERPs and back-office applications, found a home in the cloud alongside analytical stores. This transition offered scalability, reduced maintenance overheads, and shifted from significant capital expenditure to a consumption-based model.

Fast forward to the present, and the cloud offers an expanding array of services for data storage. Dedicated cloud data warehouses and the widespread use of SaaS applications have transformed data architectures beyond the mere design of databases. Modern data architecture requires contemplating whether data should reside in a traditional database or whether another technology better suits the use case. It considers the data's source, how the data is loaded, how it is used once persisted, and the specific requirements of analytics, reporting, and ML platforms.

Data movement has been essential since the early stages, and ETL (Extract, Transform, Load) tools have existed for a considerable time. However, because these storage and technology choices result in diverse architectures and intricate webs of data movement, such tools are increasingly central to modern data architecture.

What is the importance of data modeling to modern data architecture?

  • Design: A well-crafted data model with a clear, consistent, and well-documented design is crucial for widespread usability across diverse business teams. A comprehensible design ensures effective utilization of the data model by individuals across the organization.
  • Location: The multitude of storage options available today necessitates careful consideration of the optimal storage location for different data types. Storage decisions should not solely focus on convenience or cost-effectiveness but also align with the specific requirements and suitability of the type of data.
  • Compatibility: Increasingly, users seek to amalgamate various data sources to enrich their business insights. Data models and storage decisions must align to ensure compatibility, mitigating potential complications that may arise during data integration processes.
  • Normalization/Single View: A robust data model plays a pivotal role in an environment where different data sources often yield different answers to the same business questions. A well-constructed model consolidates disparate sources, applies weighting and decision logic, and eliminates duplicates, culminating in a unified, coherent view of the data.
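
The single-view idea in the last point can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the source names, their weights, the CustomerRecord type, and the build_single_view function are all hypothetical, and matching on a normalized email address stands in for whatever matching rule a real project would agree on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CustomerRecord:
    source: str
    email: str
    name: Optional[str]
    phone: Optional[str]


# Hypothetical weighting: a higher number means a more trusted source.
SOURCE_WEIGHT = {"crm": 2, "web_signup": 1}


def build_single_view(records):
    """Merge records into one row per customer, keyed by normalized email.

    Decision logic: the most trusted source wins each field; less trusted
    sources only fill fields that are still missing.
    """
    single_view = {}
    ordered = sorted(records, key=lambda r: SOURCE_WEIGHT[r.source], reverse=True)
    for record in ordered:
        key = record.email.strip().lower()  # duplicates collapse onto one key
        if key not in single_view:
            single_view[key] = {"email": key, "name": record.name, "phone": record.phone}
            continue
        merged = single_view[key]
        for field in ("name", "phone"):
            if merged[field] is None:
                merged[field] = getattr(record, field)
    return single_view


records = [
    CustomerRecord("web_signup", "Ann@Example.com", "Ann S.", "555-0100"),
    CustomerRecord("crm", "ann@example.com", "Ann Smith", None),
    CustomerRecord("crm", "bob@example.com", "Bob Jones", "555-0199"),
]
single_view = build_single_view(records)
```

Here the two records for Ann collapse into one row: the name comes from the more trusted source, while the phone number is filled in from the less trusted one because the trusted source had none.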

Types of Data Models: 

  • Conceptual Data Model: In the initial phase of comprehensive data design, this model revolves around business objects, processes, and their relationships (for instance, a customer placing an order with multiple products). Its purpose is to create foundational elements that both business and technical users can understand and challenge. It plays a crucial role in validating that all the business objects a project needs have been included. When projects are segmented into phases, the conceptual model often helps the delivery team visualize these divisions. Multiple conceptual models may exist to represent different business domains.
  • Logical Data Model: Building upon the conceptual model, the logical model transforms project requirements into a comprehensive relationship diagram. This model typically encompasses entity attributes (e.g., customer name, address, order number), their cardinality, uniqueness, and relationships between entities. Data types are usually determined in basic forms (e.g., text, number, currency) at this stage.
  • Physical Data Model: Designed with intricate technical detail, the physical data model factors in the specific technology used to implement the data, utilizing technology-specific data types. It may outline Data Definition Language features such as Primary and Foreign Keys. The style of the physical data model depends on data requirements. Operational data models, foundational for applications or websites, might adopt techniques like third normal form (3NF), while Dimensional data models for reporting and analytics could use star or snowflake designs. Additionally, the physical model should consider non-functional requirements like audit specifications, performance, and sizing.
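
To make the step from logical to physical more concrete, the sketch below takes the customer and order entities mentioned above and implements them as SQLite tables with primary and foreign keys. The table and column names and the try_insert_orphan_order helper are assumptions made for this example, and SQLite is used only because it ships with Python; a real physical model would use the data types and features of the chosen platform.

```python
import sqlite3

# Physical implementation of two logical entities (customer, order).
# Logical types "text" and "currency" become SQLite TEXT and NUMERIC.
DDL = """
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_number INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer (customer_id),
    order_total  NUMERIC NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(DDL)

conn.execute("INSERT INTO customer VALUES (1, 'Ann Smith')")
conn.execute("INSERT INTO customer_order VALUES (100, 1, 25.50)")


def try_insert_orphan_order(connection):
    """Try to insert an order for a customer that does not exist."""
    try:
        connection.execute("INSERT INTO customer_order VALUES (101, 999, 10.00)")
    except sqlite3.IntegrityError:
        return False  # rejected by the foreign key constraint
    return True


orphan_accepted = try_insert_orphan_order(conn)
order_count = conn.execute("SELECT COUNT(*) FROM customer_order").fetchone()[0]
```

The one-to-many relationship from the logical model is enforced by the database itself here: an order that points at a missing customer is rejected rather than silently stored.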

Thriving in Modern Data Landscapes: The Vital Role of Data Modeling

Data modeling goes beyond conventional diagrams and is pivotal in modern data architecture. It shapes holistic solutions, ensuring that all necessary business objects are covered and adapting flexibly to evolving data landscapes. From initial business concepts to technical implementation, data modeling fosters adaptable and unified solutions across various storage options. Its reach extends across organizational layers, fostering effective use and compatibility among diverse business teams. Data modeling remains steadfast in the dynamic data landscape, guiding the path toward agile and efficient modern data architectures.