Customer data infrastructure
Key Takeaways:
- Data infrastructure is a framework of tools and integrations that gather, process, and store data.
- CDI is a type of data infrastructure focused on customer data and tools that activate it.
- While CDI is not a marketing tool by itself, it is a technical foundation that allows other marketing tools to work.
- You should have concrete goals for what you want your CDI to accomplish before trying to set up or adjust one.
What is customer data infrastructure?
Customer data infrastructure (CDI) is a specialized data infrastructure focused on gathering, processing, and storing data about customers. Data infrastructure is a framework of tools and functionalities that work with each other, rather than a single solution. In the same way, the phrase “transportation infrastructure” refers to streets and highways, railroads, gas stations, buses, bike lanes, and all the other systems that help people get places.
Your CDI is the foundation for all the data collection and data management tools that your organization uses, and acts as a “single source of truth” for all applications that use that data. A single source of truth ensures all your data comes from one place that has the most up-to-date version of that data. This means your marketing team is always working from the latest information, which is crucial for campaign success.
CDI is not a marketing tool. It’s a group of tools that takes events (alerts, actions, interactions, form inputs, etc.) from various applications and ingests them into your data storage solution. Your CDI does not handle data activation, but it does prepare your data so that it can be activated by other tools, like a composable customer data platform (CDP).
Types of customer data infrastructure
Data infrastructure is composed of different tools, mechanisms, and processes. These various elements will work together to make up your CDI.
Core CDI
The infrastructure part of a CDI is made up of several elements, some of which your organization may already have, including:
- Website - This covers assets your customers will see and interact with, front-end systems that make up your website, and any tools that collect customer behavioral data or forms filled out by website users.
- Back end - This includes the systems that take the data from your front end, and any applications, application programming interfaces (APIs), or other tools that accept that data to store in other locations.
- Databases - A database is a collection of organized and stored information. Most websites that collect customer data, or any kind of data, have a database to store it.
- Webhooks - A webhook is a unique URL that an application exposes so that another service can send it relevant information automatically when a specific event occurs.
- Event pipelines - The data pipelines that ingest, or take in, the data and route it into the rest of your data infrastructure.
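As a rough illustration of how a webhook delivery might enter an event pipeline, the sketch below parses an incoming JSON body and places it on a queue. The payload fields (`event`, `user_id`, `timestamp`), the `handle_webhook` function, and the in-memory queue are all assumptions made for this example, not any vendor's format.

```python
import json
from collections import deque

# In-memory stand-in for an event pipeline; a real CDI would use a queue or stream.
event_queue = deque()

# Hypothetical fields every incoming event must carry.
REQUIRED_FIELDS = {"event", "user_id", "timestamp"}


def handle_webhook(raw_body: str) -> bool:
    """Parse a webhook body and enqueue it; return False if it is unusable."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return False
    # Reject anything that is not a JSON object with all required fields.
    if not isinstance(payload, dict) or not REQUIRED_FIELDS.issubset(payload):
        return False
    event_queue.append(payload)
    return True
```

Rejecting malformed payloads at the edge keeps bad events out of the rest of the infrastructure, which makes the later cleanup steps simpler.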
Enhanced CDI
To be an effective foundation for your marketing team, and to protect your organization from future problems around customer data management, your CDI can also include the following:
- Schema definition and governance - These will be a list of rules for processing your data, and ensuring that it all comes into your CDI in the right formats, and according to your organization’s needs and local laws, so that data can be used by your other tools.
- Identity resolution - Tools that consolidate all of the data about a single user or customer into a cohesive data point. Note that this data is not activated during this phase.
- Personally identifiable information (PII) handling - As customers provide personal details about themselves and their lives, some of that information may fall into categories protected by various state, federal, or national laws.
Note that your CDI’s data governance should be configured to securely handle that information, or you may be at risk of violating privacy or other data laws.
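To make schema governance and PII handling more concrete, here is a minimal sketch that checks a record against a small schema and replaces PII values with a hash. The schema, the field names, and the choice of unsalted SHA-256 are assumptions for illustration only. Hashing alone may not satisfy the laws your organization is subject to, so treat this as a starting point for a conversation with your legal and data teams.

```python
import hashlib

# Hypothetical schema: field name -> required Python type.
SCHEMA = {"email": str, "first_name": str, "age": int}
# Assumed list of fields that count as PII in this example.
PII_FIELDS = {"email"}


def validate(record: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the record passes."""
    problems = []
    for field, expected in SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}")
    return problems


def pseudonymize(record: dict) -> dict:
    """Return a copy of the record with PII values replaced by SHA-256 digests."""
    cleaned = dict(record)
    for field in PII_FIELDS:
        if field in cleaned:
            value = str(cleaned[field]).encode("utf-8")
            cleaned[field] = hashlib.sha256(value).hexdigest()
    return cleaned
```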
What components make up customer data infrastructure?
An effective CDI setup needs at least three components: It needs sources for data, rules on how to structure that data, and ways to make that data useful. The elements that make up a CDI may be part of different tools, or can come from a single tool, but they fall into three main component categories:
Data integration
Having an omnichannel marketing strategy means you are interacting with customers on as many platforms as possible, and each of those platforms is a possible source of customer data. Data integration is the process by which that data goes from those sources into your other data processing tools.
You want your CDI to integrate with all of the social media tools that you gather data from, as well as any website tracking tools that you use, and any other sources of usable customer data you can access.
Data governance
Data governance includes a set of rules and processes that your tools and teams need to follow. These include rules about the data and its formatting, so that all teams are using the same terminology and formats for data. With data from different sources, each piece may come in a different format. One source records first names and then last names, while another does last then first. Or, one date comes in as January 31, 2025, while another is 1/31/2025. While a human can figure out these discrepancies, software struggles to take in multiple formats or handle other discrepancies.
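The name and date discrepancies described above can be handled with small normalization rules. The sketch below is one possible approach: it assumes dates arrive in one of two known formats (with month-first numeric dates and English month names), and that reversed names are written as "Last, First" with a comma. Both function names are hypothetical.

```python
from datetime import datetime

# Date formats this sketch knows about; extend the tuple for your own sources.
# "%B" matches full month names and assumes an English locale.
DATE_FORMATS = ("%B %d, %Y", "%m/%d/%Y")


def normalize_date(value: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), trying each known format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


def normalize_name(value: str) -> str:
    """Turn 'Last, First' into 'First Last'; otherwise just trim whitespace."""
    if "," in value:
        last, first = (part.strip() for part in value.split(",", 1))
        return f"{first} {last}"
    return value.strip()
```

With these rules, both "January 31, 2025" and "1/31/2025" are stored as the same value, so downstream tools only ever see one format.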
Data governance also includes rules about security, legal compliance, and availability of the stored data. You need your teams to be able to access the data when they need it, while preventing hackers and malicious actors from being able to access it. For some types of data, including healthcare information or PII, you may also be subject to local laws that protect the customers whose data you have. Solutions need to be built with those laws, rules, and processes in mind.
Identity resolution
After ingesting the data and cleaning it up, your tools can begin to work with it. Identity resolution is a process of taking all of the data about a customer and turning it into a cohesive data point about that customer, which other tools can use to activate the data. Imagine all of the data we have collected so far as individual pieces of paper. Identity resolution is where you collect all of the sheets about the same person and put everything into a folder with their name on it.
Identity resolution is a different process than data activation. At the end of identity resolution, we have a loose collection of facts about a person. Data activation is where those facts turn into actionable insights about customer behavior. Your identity resolution may reveal that a specific person bought cat litter, a cat toy, and cat food. Your activated data may take those pieces of information to infer that the person probably has a cat.
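The "folder" idea can be sketched in a few lines. This example matches records on a normalized email address, which is an assumption made to keep the sketch short; real identity resolution tools typically combine several identifiers. The record fields and the `resolve_identities` name are hypothetical.

```python
from collections import defaultdict


def resolve_identities(records: list[dict]) -> dict[str, dict]:
    """Group records by normalized email and merge them into one profile each."""
    profiles: dict[str, dict] = defaultdict(lambda: {"purchases": [], "sources": set()})
    for record in records:
        key = record["email"].strip().lower()  # assumed matching key
        profile = profiles[key]
        profile["purchases"].extend(record.get("purchases", []))
        profile["sources"].add(record["source"])
    return dict(profiles)


# Two records from different sources that describe the same person.
records = [
    {"email": "Pat@example.com", "source": "web", "purchases": ["cat litter"]},
    {"email": "pat@example.com ", "source": "store", "purchases": ["cat toy", "cat food"]},
]
profiles = resolve_identities(records)
```

The result is a single profile listing all three purchases. Inferring that the person probably has a cat is left to the activation tools, in keeping with the distinction above.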
CDI vs. CDP vs. CRM
Although all three work with customer data, there are several key differences between customer data infrastructure, a customer data platform (CDP), and a customer relationship management (CRM) tool.
The CDP is a centralized location that stores and organizes customer data from multiple sources. Where a CDI is the framework for collecting raw data and processing it into a usable, but inactive format, a CDP tool will work with activated data to create standardized customer profiles and insights.
CRM is a process of managing interactions with customers. A CRM platform can use tracking data from the CDI about customer actions, such as purchases and customer communications, and it can generate data about customer contact with the marketing, customer success, and sales teams.
Why should I care about customer data infrastructure?
CDI provides an important, technical foundation for the rest of your marketing technology stack — it’s the system that takes your data and puts it somewhere your marketing tools can use it. Your customer data infrastructure should handle your data integration, data governance, and audience management for you.
Benefits of a customer data infrastructure
Implementing a strong CDI framework can give your entire organization several benefits that all come from well-organized and robust data, including:
- Automation and scaling - Some data infrastructure tools can automate data collection, ingestion, and processing. Set up properly, this can also scale as the business grows. You could manually gather data from your various tools, then manually enter it in your databases, but this is time-consuming and error-prone. While it can work for a dozen customers, it would be challenging with a hundred or a thousand.
- Deduplication - Your data tools could gather duplicates of the same data for any number of reasons. For example, a customer could enter the same info into a form twice, or interact with two different tools in the same way. Many CDIs include mechanisms that consolidate duplicates, or deduplicate the data.
- Single source of truth - When you’re working with different data from multiple sources, it is also easy for data to become outdated, such as when a customer changes their address. If your CDI can work as a single source of truth, all your teams are working from the latest data, so mailers won’t go to the wrong place, wasting money. Everyone is on the same page, making it easier to align customer-first efforts across all the departments that use that data and improving the overall customer experience.
- Audience management - With identity resolution providing you a single view of the customer (also called customer 360), your other tools can get the holistic view of your customer data needed to generate customer insights. If a customer reaches out via social media, the data captured from those tools can be combined with other data from their profile, like order history, which gives the marketing team insights into their preferences, and the customer service team insights into the products they own.
All of these work to save your teams time and money by reducing the time spent searching for the latest data because it’s been consolidated and updated. With the right automation tools, the data should import automatically, so nobody has to fix or collect that data. Better data also leads to better decisions, because you know that the information you’re basing those decisions on is accurate and timely.
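Deduplication and the single source of truth can be illustrated together: when several records describe the same customer, keep only the most recent one. The sketch below assumes each record carries an `updated_at` value in ISO 8601 format, so that plain string comparison orders dates correctly. The field names and the `deduplicate` function are hypothetical.

```python
def deduplicate(records: list[dict], key_fields: tuple[str, ...]) -> list[dict]:
    """Keep the most recently updated record for each key, in first-seen key order."""
    latest: dict[tuple, dict] = {}
    for record in records:
        # Normalize the key so "PAT@example.com" and "pat@example.com" match.
        key = tuple(str(record[f]).strip().lower() for f in key_fields)
        current = latest.get(key)
        if current is None or record["updated_at"] > current["updated_at"]:
            latest[key] = record
    return list(latest.values())


rows = [
    {"email": "pat@example.com", "address": "1 Old Rd", "updated_at": "2024-05-01"},
    {"email": "PAT@example.com", "address": "9 New St", "updated_at": "2025-01-31"},
    {"email": "sam@example.com", "address": "2 Elm St", "updated_at": "2024-07-04"},
]
unique_rows = deduplicate(rows, ("email",))
```

Here the older address for the first customer is discarded, so a mailer built from `unique_rows` would go to the newer address only.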
Why marketers should care about CDI
Most of the components of any data infrastructure are tools and processes that fall under other teams, such as the data or product infrastructure team. However, the parts of that infrastructure that work for CDI applications can enhance your marketing automation tools, giving you a lot to work with. Some marketing use cases for a robust CDI include:
- Personalizing marketing campaigns - Your CDP, CRM, and customer journey orchestration tools all rely on the data you gather from first-party data sources to generate the customer profiles and insights that your team can use to make better marketing campaigns.
- Optimizing customer acquisition - The data your CDI gathers can also provide insights into how you can market to look-alike prospects, allowing you to grow a customer base by marketing to people similar to your existing customers.
- Improving customer retention - Robust data can give you insights into how satisfied your existing customers are. For example, if your CDI can collect data about user behavior during the onboarding process for a service, you can examine where people churn. This gives you insights into pain points the company can address, as well as who to target with marketing when those fixes are in place.
- Making data-driven marketing decisions - It’s more difficult to make a decision based on incomplete data. Investing in your CDI gives you good data to work from, so you can make better marketing decisions. For example, if you can only collect a customer’s name, age, and geographical area, there is a lot you won’t know about why they make buying decisions. Connecting other tools that can get more data, like social media habits or purchase history, allows you to target them more effectively.
How to implement customer data infrastructure at your organization
The data infrastructure for every organization will be different because all businesses function differently. When implementing customer data infrastructure, your organization will go through the following steps:
Data strategy
Before building an infrastructure, you need to know where it’s going. Begin by deciding goals for what your team needs or wants to accomplish with the data, such as integrating some specific martech solutions, or gathering additional data for customer profiles. If you begin with a lofty goal, prepare to ground it with a deeper understanding of what’s possible; if you begin with simpler goals, you may need to define the next logical step. Either way, you will want to talk to your data and infrastructure teams to understand what you can accomplish.
Data model
The data model defines how you will organize the data and metadata, such as all of the data your tools will collect for each customer, and how it will all be cataloged so your other tools can work with the data. You will need a data model that is common across all the tools in your CDI, and flexible enough should your needs change.
Your data and infrastructure teams will likely lead this process, as they may already be working with a data model and the information your current tools are collecting. However, they may need some additional resources to help meet goals or new tools to collect certain kinds of data.
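A data model can be as simple as an agreed-upon record shape. The sketch below shows one hypothetical shape using Python dataclasses. Every field name here is an assumption for illustration, and your data team's actual model will likely differ.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    """One customer action, such as a page view or a purchase."""
    name: str
    occurred_at: datetime
    properties: dict = field(default_factory=dict)


@dataclass
class CustomerProfile:
    """Shared shape for a customer record across tools (hypothetical fields)."""
    customer_id: str
    email: str
    first_name: str = ""
    last_name: str = ""
    events: list[Event] = field(default_factory=list)
    # Catch-all for attributes the model does not cover yet, to keep it flexible.
    traits: dict = field(default_factory=dict)
```

The open-ended `traits` field is one way to leave room for change: new attributes can be captured right away and promoted to named fields once they prove useful.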
Data storage
As you gather this data, you need somewhere to put it. There are three main options for storage solutions: a data lake, a data warehouse, or a hybrid of the two. Most marketing purposes will likely require a data warehouse or a hybrid approach.
- Data lake - A large, unstructured pool of data. The data may or may not be complete. This is generally a less expensive option, used for deep analysis of a lot of data. You must clean up the data (see the data hygiene section below) as you retrieve and use it.
- Data warehouse - A structured series of databases and tables. The data is usually mostly complete. Data warehouses are generally more expensive, and used for business operations. Data is usually cleaned before it is stored.
The exact design of these storage options will depend on your goals, existing tools, and the data models you will use.
Data hygiene
The data you already have may not be complete or organized in the most useful way. You may have incomplete entries for some customers, outdated or duplicated entries, or even old records from a legacy system. Before you enter that data into storage, you will need to create a process for cleaning the data you have so that it fits into your data model.
This begins with an audit of the data you already have. Fortunately, performing this audit also helps with your data governance. While performing your audit, pay attention to the kinds of data you have, what edits you need to make, and how things are stored.
Extract, transform, load (ETL) pipeline
If you think of the previous steps as building shelves to store and catalog your data, the ETL stage is where you fill and organize those shelves.
- Extract - Integrations between your marketing tools and your CDI solutions begin to gather the data about your customers.
- Transform - Tools and rules for converting and cataloging that data according to your data model.
- Load - The mechanisms that place your data into your storage solutions.
Your company’s technical teams will likely perform these tasks and then fine-tune the ETL pipeline to ensure that data is ingested properly.
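The three ETL stages can be sketched end to end with Python's standard library. In this example, an inline CSV string stands in for an export from a marketing tool, and an in-memory SQLite database stands in for a data warehouse. The column names and the `customers` table are assumptions for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical export from a marketing tool, inlined so the sketch is self-contained.
RAW_CSV = """email,first_name,signup_date
 Pat@Example.com ,pat,2025-01-31
sam@example.com,Sam,2024-07-04
"""


def extract(text: str) -> list[dict]:
    """Extract: read rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: apply the data model's formatting rules."""
    return [
        (row["email"].strip().lower(), row["first_name"].strip().title(), row["signup_date"])
        for row in rows
    ]


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into storage (SQLite stands in for a warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(email TEXT PRIMARY KEY, first_name TEXT, signup_date TEXT)"
    )
    # INSERT OR REPLACE keeps one row per email if the same customer is loaded twice.
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
```

Keeping each stage as its own function makes it easier to fine-tune one step, such as adding a new formatting rule, without touching the others.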
Data governance program
At this point, the data storage solutions should start to fill with robust, usable data. However, the work does not stop there. You need to establish data handling processes, policies, and procedures around the data you collect. Data governance is a set of internal policies around various topics, including:
- Data integrity - Is the data coming in still accurate, valid, timely, and complete?
- Data availability - Can the tools and teams that need that data still access it?
- Data usability - Can users get meaningful information from the data?
- Data quality - How complete is the data, and is it still usable?
- Data compliance - Are we following all of the laws we’re subject to, related to the data we have?
- Data security - Is the data safe from any potential malicious actors?
If the answer to any of these questions is “no,” make sure to have a plan to address the problems.
For example, if a company discovers a security vulnerability in its database software, it should have a policy in place for updating or replacing that software, notifying affected users, and public disclosure if necessary. In that instance, those policies may overlap with data compliance, as local laws may require certain actions if PII or other protected data is ever breached.
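Some of the governance questions above, such as data quality and integrity, can be checked automatically on a schedule. The sketch below measures how complete a set of records is and flags records that have not been updated recently. The field names, the thresholds, and both function names are assumptions for illustration.

```python
from datetime import date


def completeness(records: list[dict], required: tuple[str, ...]) -> float:
    """Share of records in which every required field has a non-empty value."""
    if not records:
        return 0.0
    complete = sum(
        all(r.get(f) not in (None, "") for f in required) for r in records
    )
    return complete / len(records)


def stale_records(records: list[dict], today: date, max_age_days: int) -> list[dict]:
    """Records whose `updated_at` date is older than the allowed age."""
    return [r for r in records if (today - r["updated_at"]).days > max_age_days]
```

A team might run checks like these daily and alert someone when completeness drops below an agreed threshold, which turns a governance policy into something that can be monitored.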
Which departments should be involved with customer data infrastructure?
In short, any team that creates and captures data should be part of the conversation about data infrastructure. This includes your data and analytics teams, website or web application teams, information technology (IT), product and infrastructure, customer service, finance, and marketing. Each team should come with details about the data that their customer interactions generate, and how other systems handle, store, and access that data.
When implementing a CDI, much of the work will likely fall on the technical teams. Standing up servers, installing solutions, and fine-tuning API integrations are tasks best handled by dedicated technical experts. However, customer-facing teams should still have strong goals for what they want the infrastructure to do, and the understanding that some features may have to come later in the project timeline.
What tools do I need to implement customer data infrastructure?
Because a customer data infrastructure is a framework of tools, and some of those will be specific to the services or industries your organization operates in, the tools you need to implement a CDI will vary. The types of tools largely fall into one of three groups:
Tools to gather and ingest data
Tools that collect the raw data your CDI uses. This includes:
- Third-party tools, such as those for social media (Instagram, TikTok, Meta, Snapchat), analytics, ad placement, and other integrations
- Customer data tools, such as your event tracking, web or mobile apps, and web servers
Tools to process and integrate data
The solutions that make up your ETL pipeline and storage solutions, including:
- Data storage solutions, such as your data lake or data warehouse
- Cloud storage solutions, such as Snowflake, Amazon Web Services, Microsoft Azure, or Google Cloud BigQuery
- Tools like Fivetran, which move data from a source into the data warehouse
- Identity resolution and data activation tools, such as your customer data platform (CDP)
Tools that use data
Any components that benefit from your CDI, which may include:
- Customer relationship management platforms
- Customer journey orchestration platforms
- Internal dashboards
- Reporting platforms
- Artificial intelligence or machine learning tools
- Data governance tools
- Future data needs or projects