Data modeling
Key Takeaways:
- Data modeling is a process where an organization transforms data from its raw form into structured tables so the data can be used across business systems.
- There are three primary data model types, which can also be considered data modeling stages. Each model builds on the other to turn a general data strategy concept into a real-world representation.
- Data analysts, data engineers, data scientists, and database administrators can all drive a data modeling process and will play a key role in developing or implementing the model.
Table of Contents
What is data modeling?
Data modeling is a process where an organization transforms data from its raw form into structured tables so the data can be used across business systems.
Data tables are similar to spreadsheets. Table columns are known as keys, which can link to another table. A primary key is a unique identifier for the table and cannot be duplicated. Foreign keys are used to join to another table’s primary key. Consider this example:
- If an organization is creating a customer table, a “Customer ID” column will be the primary key.
- Another column, “Address,” could contain duplicate addresses for different customers if they all live in the same household. This is a foreign key.
Data transformation is typically performed with structured query language (SQL) or tools that generate SQL. Business logic is also expressed in the SQL code, which adds columns to the data tables so the tables can link. The modeled tables are cleaned, aggregated, and joined with other data.
Data modeling involves several steps, which can include creating a visual representation of a system, its data, and how that data is structured and used. The visualizations use symbols, text, and diagrams to show the relationships between data.
There are several data modeling techniques that address specific business needs or are tailored for systems of varying complexity.
Teams can use data modeling for existing systems or to plan a new system from scratch.
Why do companies use data modeling?
As organizations ingest data from different sources, those source systems often contain unclean data or so much data that the organization needs to aggregate to improve performance.
Data modeling helps an organization transform data from its raw, ingested form into a state where it can be used for downstream systems or AI models.
This process makes it easier for teams to audit their data and identify how to better use customer data as it flows through the organization. The process can also improve business reporting and analytics capabilities, like in this example:
- An e-commerce store uses Shopify and needs to ingest the data into the data warehouse
- The store also needs its call center data to merge with the Shopify data (and all other data sources)
- With data modeling, the store can combine the call center and Shopify data to align metrics from both systems
- This allows the organization to enable proper business intelligence and marketing data activation, and power machine learning models based on complete customer data
Data modeling is especially helpful when a team is:
- Implementing a new database
- Updating or rebuilding an existing application
- Considering a new technology investment
- Implementing predictive modeling in marketing
- Auditing its data governance practices
- Optimizing its customer data activation strategy
Data modeling and the modern customer data stack
Marketing teams need a solid data management strategy so they can use data to deliver the personalized experiences that customers expect. Holistic and accurate business data is also critical for helping organizational leaders make informed business decisions.
To accomplish this, marketers rely on a modern customer data stack, which includes data modeling as a primary component. This data stack features:
- Data storage - A data warehouse or data lake is the center of the modern customer data stack, collecting and storing data from across an organization.
- Data ingestion - Data ingestion layers or tools move customer data into the warehouse from its various sources.
- Data modeling - Once data is moved into an organization’s data warehouse, the team can model the data.
- Visualization layer - Once you centralize data in the data warehouse, you can unlock insights through visualizations and the ability to run predictive analytics. For visualization, you can use Looker, PowerBI, DataStudio, or any of the other business intelligence tools.
- Activation layer - This typically consists of a composable CDP, technology that sits on top of the clean, modeled data in the cloud data warehouse. The composable CDP creates audiences from the customer data and activates it across various marketing, sales, and advertising channels.
What are the types of data models?
There are three primary data model types, which can also be considered data modeling stages. Each model builds on the other to turn a general data strategy concept into a real-world representation:
Conceptual data model
A conceptual data model, sometimes called a domain model, views a system from a big-picture business standpoint. Conceptual models are fairly abstract and should be designed to solve the business need a system should fulfill and the required entities to accomplish that goal.
Marketers, for example, could create a conceptual model of how a customer relationship management (CRM) tool could help the organization achieve its business goals. The conceptual model will identify the different entities that rely on the CRM or inform its data, including marketing, sales, and customer support or success teams. Other entities can include customers and the channels that marketers use to engage them. Arrows will connect each entity to show all connections, like how data flows from customers to their engagement channels to the CRM.
Logical data model
Logical data models build on conceptual models to specify the type, format, or structure of data within each entity and how different datasets interact.
Continuing on the CRM example, a logical data model will identify the specific data within each entity or the types of data the entity needs to deliver on its goals. The model will focus on the CRM, the data it will need to provide to support the teams that rely on it, and how it will source that data.
An organization’s point of sale, for example, will provide transaction data like the total purchase amount, number of items in the order, items or services sold, and the customer’s membership number (if provided). Specific marketing channels will require customer data, like how email marketing requires a customer’s email address and location to deliver relevant messages.
The team will work through all entities to identify what data they can provide or require and use arrows or other graphics to visualize how the entities rely on each other.
Physical data model
Physical data models draw from logical data models so data teams can build a database. A physical data model provides the structure for where and how data will be stored and the database management system elements.
By this stage, each entity will include tables with rows and columns of data types. The organization’s data team will now establish identifiers for records in each table. A primary key, such as ‘CustomerID’, will provide an identifier for specific data in a table. A foreign key is a column or set of columns that links data in different tables. These keys complete the picture and explain how data across entities will interact and connect.
Once the data team completes the physical data model, they have a blueprint to build the CRM (or customize a paid solution).
What are data modeling techniques?
Organizations can take a few different approaches to data modeling depending on the complexity of the system and whether data is dependent on other types of data.
Six data modeling techniques to consider include:
- Dimensional data modeling - Dimensional data models often look like a star or snowflake. There is a central entity, a facts table with measurable events, that connects to other entities, which are dimensions that are the context for those facts. For example, marketers could create a dimensional data model to help them run a report on a recent campaign. The facts table will identify the key success metrics, and each data type will include a foreign key to link to a separate dimension table, which could include customer dimensions or campaign dimensions and the specific data types.
- Entity-relationship (ER) data modeling - Entity relationship data modeling is widely used for creating the conceptional design of a database. This model represents entities in a database as tables and defines how each entity interacts with each other. A marketing team could create an ER model to map a CRM, with tables for customer, contact, activity, and product. Each table will identify the data within the entity.
- Hierarchical data modeling - Hierarchical data modeling connects one parent record to a child record, and each child can have only one parent. These models can be helpful when working with a few data points — like mapping an organization’s leadership hierarchy — but are limiting for datasets with multiple connections.
- Network data modeling - Network data modeling follows a hierarchical data model but allows child records to connect to multiple parent records, which represents more complex data relationships. A network data model could visualize an organization’s referral program, showing how individual customers are connected to each other. This could identify highly influential customers who could be an asset to the marketing team.
- Object-oriented data modeling - Object-oriented data models organize data as objects, with each object containing data and behaviors. These models are helpful for especially complex datasets, like if a marketing team is measuring the success of its social media content. An object-oriented data model could map objects like users, posts, and ads, with data within each. The model could show how a campaign performed, including data on individual posts, notable influencers or target customers who engaged with the campaign, and performance metrics like conversions or click-throughs.
- Relational data modeling - Relational data models are tailored for relational database management systems like MySQL, PostgreSQL, and Oracle Database. These models use tables to organize data, with each table representing an entity. Rows and columns within each table contain unique identifiers, known as primary keys, and the relationships between tables are built with foreign keys. These are a common tool for implementing a CRM.
The data tables can be structured or optimized in several ways, such as:
- First normal form (1NF) - A table is in 1NF when it contains only unique values in each row and column. A primary key will identify each set of related data.
- Second normal form (2NF) - A table is in 2NF table when it meets the requirements of 1NF and removes subsets of data that appear in multiple rows and creates separate tables for them, which use foreign keys to establish relationships.
- Third normal form (3NF) - A table is in 3NF when it meets the requirements of 2NF and removes columns that are not fully dependent on the primary key.
- One big table (OBT) - All data is stored in a single, large table.
- Normalization - Organizes a database to reduce redundancies and ensure data is stored in a logical and structured way.
- Denormalization - Intentionally introduces data redundancies by combining tables or adding redundant data. Denormalization can improve read operations but may slow down write operations.
- Star schema - A central facts table is surrounded by related dimensional tables. The diagram of the schema resembles a star.
- Snowflake schema - A complex version of a star schema, which involves normalizing dimension tables into multiple related tables. The diagram of the schema resembles a snowflake.
Data modeling process
Data modeling can require a significant investment of time and resources; however, it is a necessary process for using an organization’s data to its fullest potential.
The data modeling process typically includes the following steps:
- Gather requirements - Interview business executives, stakeholders, and end users to understand what a system, like a CRM or customer data platform (CDP), should accomplish. In the case of a CDP, a requirement would be to accurately track customer engagement across channels.
- Audit existing system data - If editing an existing system, audit the system to identify its existing data and structure.
- Develop the data model - Create the data model, either using SQL or a data modeling tool. This can include conceptual data modeling, logical data modeling, or physical data modeling.
- Implementation - Use the data model to create the actual database and establish security measures. This step may include populating the database with initial data.
- Validation - Review the data model to confirm it meets the established requirements, and audit processes to ensure data integrity is preserved as it moves through the system. Schedule automated data quality check-ins.
- Documentation - Thoroughly document the data model, including its schema, tables, columns, keys, relationships, and an explanation of the business rules and data definitions.
- Ongoing refinement - Regularly revisit the data model and adjust it to reflect new business goals or technologies.
Data modeling challenges
Data modeling can be a difficult process, especially when modeling complex systems with many data entities, formats, and dependencies or if stakeholders have competing priorities for the business systems.
In general, marketing and data teams should prepare to address these common data modeling challenges:
- Uncovering business requirements - The first data modeling step can be tricky when it is difficult to understand and capture the business requirements from various stakeholders. If the business requirements are misunderstood or incomplete, the data model will likely fail to fulfill its intended goal.
- Balancing clarity with detail - Data models should be easily interpreted by the necessary stakeholders or data professionals. It can be challenging to balance simplicity with the appropriate amount of detail.
- Standardizing data - Data is often stored in inconsistent formats across systems, which can make it difficult to integrate data from multiple sources. If data quality issues persist, then the business systems will act on inaccurate or incomplete data.
- Technology selection - Many tools exist to help organizations with data modeling. However, teams often do not understand the features and capabilities they need until they have experience with data modeling. Further, budget restraints may require a team to rely on mostly manual data modeling methods, or make calculated decisions about which tool will deliver the best return on their investment.
Data modeling examples
Organizations can use data modeling to help them achieve various business goals, such as the following examples:
Retail data modeling example
A retailer wants to increase customer purchase frequency by using customer data to deliver personalized product recommendations and offers. To do this, its team must develop a data model with a clear relationship between the customer, product, and transaction data.
The organization first models its existing architecture and then creates a conceptual data model of how the architecture should function. Subsequent models will provide the blueprint for the organization to integrate customer data from all sources and create a single, trusted customer profile that includes their product preferences to inform all outreach.
Telecom data modeling example
An internet provider aims to improve customer retention. To support this goal, its team can conduct data modeling and develop a plan to provide insights into customers who might churn based on their activity and interactions.
Data modeling will identify what data exists in the existing customer profiles and the data needed to predict customer churn, such as recent survey scores, service disruptions, or decreased internet use. Models will offer a roadmap for building systems that deliver predictive insights on customer churn.
Finance data modeling example
A national bank seeks to decrease the time it takes to pull reports for regulatory compliance. Its data team can build a data model to optimize how it structures customer and business data so the organization can quickly run reports that deliver exactly what it needs to prove compliance.
What teams and roles are involved with data modeling?
Data modeling requires input and expertise from several departments in an organization.
- Data analysts, data engineers, data scientists, and database administrators can all drive a data modeling process and will play a key role in developing or implementing the model.
- Executives and business analysts offer key insights to inform the business requirements for a system, which is the first step in data modeling.
- Marketers can work alongside data teams to understand how customer data flows across the various marketing tools and identify ways to optimize their use of customer data.
Data modeling tools and software
Organizations can select from a range of data modeling tools and software. When assessing data modeling solutions, explore which platforms or databases the solution supports, how it enables collaboration, and which types of data modeling it can complete.
Outside of solutions from large brands like IBM’s InfoSphere Data Architect and Oracle’s SQL Developer Data Modeler, teams can consider adopting data modeling tools like the following:
- Archi is a popular open source modeling toolkit with robust help guides and tutorials that walk users through the data modeling process. Archi is fully cross-platform and offers a visualizer, sketch view, and Canvas Modelling Toolkit.
- DbSchema is a leading database design and management tool that can reverse engineer an existing database schema and generate SQL scripts to create or modify the database.
- dbt is one of the most popular tools for data modeling, helping teams accelerate their data development processes and make data transformation easier.
- ER/Studio is a data modeling tool from Idera that works with a number of databases and offers several integrations to enable logical and physical data modeling, conceptual modeling, and a repository.
- erwin Data Modeler is a data modeling tool from Quest that can find, visualize, design, and standardize enterprise data assets.
- Lucidchart is an intelligent diagramming application that offers templates and simplified design interfaces to visualize the data modeling.
- SqlDBM is a cloud-based data modeling solution built for enterprises with transformational modeling, data governance, and Snowflake schema monitoring capabilities.
More from the University
Looking for guidance on your Data Warehouse?
Supercharge your favorite marketing and sales tools with intelligent customer audiences built in BigQuery, Snowflake, or Redshift.