With customer data spread across multiple silos and complex marketing technology stacks, it can be very difficult to get a complete picture of who your customers are, what they do, and how they feel. By combining data from multiple sources, a customer data platform (CDP) can bridge this customer data gap and give you a more holistic view of the customer. But before investing in a new technology solution, it’s essential to have a solid plan for customer data management.
There are three common data management foibles that marketers need to consider when preparing and maintaining a customer database. (A quick caveat: If you’re a database administrator or otherwise technically oriented, you may want to flip tables over how I generalize and, in places, align concepts in ways that don’t make “technical” sense. I get it, but this post is geared toward business folks.)
Foible #1: matching users across systems, stacks & silos
The core function of a CDP is simply to find customers in your data and match them across sources. And some platforms, like Lytics, tout user matching as one of their “secret sauce” differentiators. Whether you already have an off-the-shelf customer data platform, or simply want to get a better handle on your customer data management, here are three key considerations for matching users:
Synchronized unique identifiers
The most common way to match and track users consistently over time and across platforms is by using a unique identifier. This is simply a number or other string. For example, an email address, which by its nature is unique, works well when a person has one email address associated with them across multiple touchpoints or platforms. But an email address is not always a good unique identifier. Not everyone limits their identity to a single email address. And with some systems you don’t necessarily know the user’s email address.
So, while using an email address may help match users, it does not replace good master data management (MDM).
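To make the matching idea concrete, here is a minimal Python sketch. It assumes two hypothetical exports, one from a CRM and one from an email platform, that both carry an email field. The field names, the sample records, and the `normalize_email` helper are all made up for illustration and not taken from any particular product.

```python
def normalize_email(raw):
    """Trim whitespace and lowercase so trivially different spellings match."""
    return raw.strip().lower()


def match_by_email(crm_records, email_records):
    """Pair records from two sources that share a normalized email address."""
    crm_by_email = {
        normalize_email(r["email"]): r for r in crm_records if r.get("email")
    }
    matched, unmatched = [], []
    for record in email_records:
        key = normalize_email(record.get("email") or "")
        if key in crm_by_email:
            matched.append((crm_by_email[key], record))
        else:
            unmatched.append(record)
    return matched, unmatched


# Hypothetical sample data: the same person appears under two addresses.
crm = [{"crm_id": "C-1", "email": "Ana@Example.com"}]
esp = [
    {"esp_id": "E-9", "email": " ana@example.com "},
    {"esp_id": "E-10", "email": "ana.personal@example.org"},
]
matched, unmatched = match_by_email(crm, esp)
```

In this sample the first email-platform record lines up with the CRM record once the address is cleaned, while the second, a personal address for the same person, is left unmatched. That leftover record is exactly the gap an email-only approach cannot close.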
Master data management
Customer data management is essentially an offshoot of MDM. The big idea behind MDM is that any connected data system should have a set of rules about how different data sources and systems relate to each other. Where things get weird is that every customer platform generates unique ID numbers for customers, which act as keys to prevent data attributes from getting misappropriated internally to another function. That means you then have multiple unique keys coming from multiple platforms, and any one of them could be used as a master ID for linking customer data.
One way to handle this is to establish a “source of truth” or SoT for customer records. This is often referred to as a database of record (DBoR) as well. Often the SoT/DBoR is a custom database that contains an account number. These types of numbers are also unique by nature, and thus a great unique identifier. In a pure marketing stack, this is typically a CRM, because they are omnipresent and often contain the most known information about customers. When linking data platforms, the DBoR should always be syncing its ID with other platforms. Then, when data gets pulled back from other platforms, you can link records back to specific customers.
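For the technically curious, the syncing idea can be sketched as a simple lookup table (sometimes called a crosswalk) that ties each platform's own ID back to the master ID from the database of record. This is only an illustration in Python: the `IdCrosswalk` class, the platform names, and the ID formats are hypothetical.

```python
class IdCrosswalk:
    """Maps (platform, platform_id) pairs to one master ID from the database of record."""

    def __init__(self):
        self._to_master = {}

    def link(self, master_id, platform, platform_id):
        """Record that a platform-specific ID belongs to a master ID."""
        self._to_master[(platform, platform_id)] = master_id

    def resolve(self, platform, platform_id):
        """Return the master ID, or None when the platform ID was never synced."""
        return self._to_master.get((platform, platform_id))


# Hypothetical IDs: one account number known to two other platforms.
crosswalk = IdCrosswalk()
crosswalk.link("ACCT-1001", "email_platform", "E-9")
crosswalk.link("ACCT-1001", "analytics", "visitor-77f2")

incoming = [
    {"platform": "email_platform", "id": "E-9", "event": "open"},
    {"platform": "analytics", "id": "visitor-77f2", "event": "pageview"},
    {"platform": "analytics", "id": "visitor-0000", "event": "pageview"},
]

# Group incoming events under the master ID; unknown IDs land under None.
by_customer = {}
for row in incoming:
    master_id = crosswalk.resolve(row["platform"], row["id"])
    by_customer.setdefault(master_id, []).append(row["event"])
```

Events that resolve to a master ID get attached to the right customer. Anything that lands under `None` came from an ID the database of record never synced, which is a useful signal that a link is missing.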
Cookie matching and behavior mapping
Another method for matching customers and data involves matching cookies for users across multiple devices and browsers. Google is obviously very good at this, since it is a cornerstone of their DoubleClick targeting stack. They have a wonderfully detailed write-up about how it works, but the basic idea is that once you have a cookie on someone’s browser for an independent network, you can tie their network user ID together with other platform IDs. Most commercial CDPs have this capability in some capacity, whether they license it from a data management provider or build their own.
Another, similar method is simply matching a user’s behavior across click-tracking platforms. Say, for example, you are tracking clicks using both an analytics tool like Adobe Analytics and a marketing automation platform like Marketo. Both are tracking clicks and pageviews, but in different ways. Wise old digital analysts will always tell you that two platforms will give you slightly different analytics reports. But on a user-by-user basis, the recorded use patterns are close enough that you can likely match their patterns using some predictive modeling.
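A real implementation would use a proper predictive model, but a much simpler stand-in shows the shape of the idea. The Python sketch below reduces each user's clicks to a "footprint" of pages visited per minute and compares footprints with a plain overlap score (Jaccard similarity). The user IDs, the one-minute bucket, and the 0.5 threshold are assumptions chosen for the example, not recommendations.

```python
def footprint(events, bucket_seconds=60):
    """Reduce a click stream to a set of (page, time bucket) pairs."""
    return {(page, int(ts // bucket_seconds)) for page, ts in events}


def jaccard(a, b):
    """Share of footprint entries two users have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_matches(source_a, source_b, threshold=0.5):
    """For each user in source A, pick the most similar user in source B."""
    prints_b = {uid: footprint(ev) for uid, ev in source_b.items()}
    matches = {}
    for uid_a, events_a in source_a.items():
        print_a = footprint(events_a)
        scored = [(jaccard(print_a, pb), uid_b) for uid_b, pb in prints_b.items()]
        if not scored:
            continue
        score, uid_b = max(scored)
        if score >= threshold:
            matches[uid_a] = (uid_b, score)
    return matches


# Hypothetical click streams: (page, seconds since the session started).
analytics = {"a-1": [("/home", 5), ("/pricing", 70), ("/signup", 130)]}
automation = {
    "m-7": [("/home", 8), ("/pricing", 75), ("/signup", 185)],
    "m-8": [("/blog", 9), ("/home", 400)],
}
matches = best_matches(analytics, automation)
```

Here the two tools disagree about when the signup page was hit, so the footprints are not identical, yet the overlap is still strong enough to pair the two IDs. Fixed time buckets are crude (two clicks a second apart can fall on either side of a boundary), which is one reason a trained model tends to do better in practice.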
Foible #2: scheming the schema
When assembling a connected data system, you should have a strategic vision for how it will technically work to answer business questions. While the intent here is not to give you all you need to run out and build a new database, there are some core technical considerations you should be aware of.
Are you normalizing your data?
Being normal is boring, but normalizing your data... well that’s boring too, but absolutely required. That’s because as you start combining user records, there’s a very good chance of dirtying up your data. Again, what you’re attempting to do is have one source of truth for a customer or user. This means having as little redundancy in the database of record as possible. An essential part of this process is overcoming the garbage in/garbage out paradigm. If you import “dirty” data, you will have dirty data in your system, and then you will have misleading output.
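As a rough illustration of cleaning records before they reach the database of record, here is a small Python sketch. It assumes records keyed by email with hypothetical `name` and `phone` fields. Real cleanup rules are usually far more involved, so treat this as a sketch of the principle rather than a finished routine.

```python
def clean_record(record):
    """Standardize formatting so equivalent values compare as equal."""
    return {
        "email": record.get("email", "").strip().lower(),
        "name": " ".join(record.get("name", "").split()),
        "phone": "".join(ch for ch in record.get("phone", "") if ch.isdigit()),
    }


def deduplicate(records):
    """Keep one record per email, filling blank fields from later duplicates."""
    merged = {}
    for raw in records:
        rec = clean_record(raw)
        if not rec["email"]:
            continue  # no key to match on; in practice this would go to review
        existing = merged.setdefault(rec["email"], rec)
        for field, value in rec.items():
            if not existing[field]:
                existing[field] = value
    return list(merged.values())


# Hypothetical import: two spellings of one customer plus an unusable row.
raw_import = [
    {"email": "Ana@Example.com ", "name": "Ana  Lopez", "phone": ""},
    {"email": "ana@example.com", "name": "Ana Lopez", "phone": "(503) 555-0100"},
    {"email": "", "name": "Unknown", "phone": ""},
]
customers = deduplicate(raw_import)
```

The three incoming rows collapse to a single customer record with the phone number filled in from the duplicate, and the row with no email is kept out of the system instead of becoming garbage in.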
Are you capturing the right fields?
First, you need to consider whether or not you are actually collecting the right customer data. Again, this originates with your overall strategy. Do you have a set of documented requirements or goals? How do they align with overall business strategy? It’s tempting to jump right in and start developing a new platform and importing data to play with, but we always recommend spending time to develop a business Key Performance Indicator (KPI) framework that establishes not only what success looks like, but specifically how it’s going to be measured.
This includes the metrics that comprise KPIs and where data will come from to generate those metrics. This can then be directly translated into a data dictionary that acts as a set of requirements that database developers use to make the magic happen.
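A data dictionary does not have to be fancy. As one possible shape, the Python sketch below lists each field with its type, its source system, and whether it is required, then checks a record against that list. The fields, source names, and the `validate` helper are hypothetical examples, not a standard format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One entry in the data dictionary."""
    name: str
    dtype: type
    source: str
    required: bool
    description: str


# Hypothetical dictionary derived from a KPI framework.
DATA_DICTIONARY = [
    FieldSpec("customer_id", str, "CRM", True, "Master ID from the database of record"),
    FieldSpec("email", str, "CRM", True, "Primary contact email"),
    FieldSpec("lifetime_orders", int, "ecommerce", False, "Order count for a repeat-purchase KPI"),
]


def validate(record, dictionary=DATA_DICTIONARY):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for spec in dictionary:
        value = record.get(spec.name)
        if value is None:
            if spec.required:
                problems.append(f"missing required field: {spec.name}")
        elif not isinstance(value, spec.dtype):
            problems.append(
                f"{spec.name} should be {spec.dtype.__name__}, got {type(value).__name__}"
            )
    return problems


good = validate({"customer_id": "ACCT-1001", "email": "ana@example.com", "lifetime_orders": 3})
bad = validate({"customer_id": "ACCT-1001", "lifetime_orders": "3"})
```

The first record passes cleanly. The second comes back with two problems, a missing email and an order count stored as text, which is the kind of feedback that keeps requirements and data honest with each other.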
Is there room for growth?
It is also very tempting to consider only current business requirements. As business leaders, we are responsible for meeting a series of goals that tend to be short term. This means approaching development efforts from the perspective of what we need now and ignoring future considerations. What happens if you’re way more successful than you planned for, and the demands on your platform outgrow its technical specifications?
Foible #3: ongoing data integrity
Assuming everything is clean, there is no garbage to worry about, and you are scaled for the future, everything is great forever, right? Wrong, of course. Incoming data naturally changes over time—half the battle is ongoing maintenance.
Girish Pancha has a wonderful post over on CMSWire that covers a term he coined: data drift. This can be summarized as the lessening of data integrity caused by changes to foundational elements of a data system. Or, as Girish puts it, “The operation, maintenance and modernization of these systems causes unpredictable, unannounced and unending mutations of data characteristics.” (I really love that sentence.)
Drift occurs when there are changes in the structure (database schema), semantics (what the data means) or infrastructure (technology and platforms). It all comes down to one problem: the way data is stored and processed is changing, and thus inconsistent. This means that any calculations or models that rely on your current data structure will likely break eventually.
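Structural drift in particular is easy to check for once you have written down what you expect. The Python sketch below compares an incoming record with an expected set of fields and types and reports what changed. The schema, the field names, and the `detect_drift` helper are invented for the example.

```python
# Hypothetical expected structure for an incoming customer record.
EXPECTED_SCHEMA = {"customer_id": str, "email": str, "order_total": float}


def detect_drift(record, expected=EXPECTED_SCHEMA):
    """Compare one incoming record with the expected fields and types."""
    report = {"missing": [], "unexpected": [], "type_changed": []}
    for field, dtype in expected.items():
        if field not in record:
            report["missing"].append(field)
        elif not isinstance(record[field], dtype):
            report["type_changed"].append(field)
    report["unexpected"] = sorted(set(record) - set(expected))
    return report


# A source renamed one field and started sending totals as text.
incoming = {
    "customer_id": "ACCT-1001",
    "email_address": "ana@example.com",
    "order_total": "19.99",
}
report = detect_drift(incoming)
```

For this record the report flags `email` as missing, `email_address` as unexpected, and `order_total` as having changed type. A check like this only catches structural drift. Semantic drift, where a field keeps its name but changes its meaning, still needs a human who knows the data.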
One very common problem relating to drift is something I call API jitter. This is not a little dance APIs are doing when you’re not looking. Rather, this describes a series of changes platform developers implement in how their APIs function that can result in infrastructural drift. This is way more common than many think. Often it’s a series of little changes meant to fix little problems, hence jitter. But add up a set of little changes and you get big problems. Many publishers are kind enough to version APIs, but even within major releases, there can be jitter.
I had an issue like this come up with the YouTube API several years ago. Google kept tweaking the API slightly over time, and each time they made a small change it broke how our system was designed to import data. We didn't know it was an issue until data stopped being populated. That meant checking the data manually every morning! We eventually worked around the issues, but not before some embarrassing conversations.
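In hindsight, an automated freshness check could have replaced that morning ritual. Here is a minimal Python sketch of one, assuming you can look up when each source last loaded data. The source names, timestamps, and the 24-hour limit are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_sources(last_loaded, now, max_age=timedelta(hours=24)):
    """Return the sources whose newest load is older than max_age, sorted by name."""
    return sorted(
        name for name, loaded_at in last_loaded.items() if now - loaded_at > max_age
    )


# Hypothetical load times: the CRM loaded overnight, the video API went quiet.
now = datetime(2018, 3, 2, 8, 0, tzinfo=timezone.utc)
last_loaded = {
    "crm": datetime(2018, 3, 2, 2, 0, tzinfo=timezone.utc),
    "video_api": datetime(2018, 2, 27, 2, 0, tzinfo=timezone.utc),
}
alerts = stale_sources(last_loaded, now)
```

Run on a schedule, a check like this turns "data quietly stopped arriving" into an alert on the first morning instead of an awkward conversation later.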
As organizations start to engage more deeply with data of all types, these and many other foibles will come up. Fortunately, all of these customer data management challenges can be overcome with proper data governance. Having a well-designed process and plan to deal with them is the best way to handle them as they arise. And, as with any other form of governance, there has to be support at the executive level.
If all this seems a little daunting, don’t despair. Start with your business strategy and identify the goals you have for your customer data. With requirements in hand you can reach out to a data professional and get help building an effective customer data management plan that will keep you foible-free. Have questions or comments? Please share them below!
- © 2018 Connective DX
- All Rights Reserved