
Data Integrity: Why It’s Crucial to Understanding User Behavior

Siddharth Jain, a Data Scientist at CleverTap, has over 4 years of experience in data engineering and Python backend development. In his free time, he enjoys birdwatching and trekking.

In today’s era of personalized marketing, a marketer is lost without rich customer data. But while pushing millions of data points to the cloud, many marketers forget about the quality and integrity of those data points.
High-quality data makes life better for all teams. The marketing team can build better campaigns, the analytics team can make better decisions without transformations and workarounds, and the product team can make informed product decisions.
Here’s a simple example: you push millions of purchase events a month, but one day realize that you have mapped the product name, rather than a unique identifier, as a property of the event. This may seem logical or trivial, but imagine that you have multiple products with the exact same name. Purchases of different products are then counted together under one name, inflating the numbers for that name and misleading you and your strategic marketing pipelines. This could have been avoided by mapping the product ID or SKU to the purchase event.
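To make the effect concrete, here is a minimal sketch using a few hypothetical purchase events (the property names `product_id` and `product_name` are illustrative, not a real platform schema). Grouping by name merges two different products, while grouping by SKU keeps them apart.

```python
from collections import Counter

# Hypothetical purchase events: two different products share the name "Classic Tee".
purchases = [
    {"product_id": "SKU-101", "product_name": "Classic Tee"},
    {"product_id": "SKU-202", "product_name": "Classic Tee"},
    {"product_id": "SKU-101", "product_name": "Classic Tee"},
    {"product_id": "SKU-303", "product_name": "Canvas Bag"},
]


def purchases_per_key(events, key):
    """Count purchase events grouped by the given event property."""
    return Counter(event[key] for event in events)


by_name = purchases_per_key(purchases, "product_name")
by_sku = purchases_per_key(purchases, "product_id")

print(by_name)  # "Classic Tee" gets 3 purchases, although it is really two products
print(by_sku)   # SKU-101 has 2 purchases, SKU-202 and SKU-303 have 1 each
```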
If you can’t trust your data, you will never feel comfortable using analytics to make business decisions.

Common Situations that Degrade Data Integrity

Data Duplication
Not only is duplication bad for data integrity, it can also directly increase your storage costs. Duplication can affect user profiles, or it can mean the same event being raised multiple times when it should have been raised once.
A common example of duplication: your integrated SDK tracks some “System Events” by default, such as App Launched or Notification Clicked, but your developers overlook this and raise the same events manually as well. In such a case, you’re storing each of these events twice, which drives up your platform costs because you exhaust your event tracking quota faster. At a scale of millions of users, these extra expenses will really hurt your marketing and analytics budgets.
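The real fix is to remove the manual call from the app, but an audit of exported events can reveal the problem. Below is a minimal sketch, assuming each exported event carries `user_id`, `name`, `timestamp`, and a hypothetical `source` field that says whether the SDK or your own code raised it; the set of system event names is also an assumption for illustration.

```python
# Hypothetical set of events the SDK already records automatically.
SDK_SYSTEM_EVENTS = {"App Launched", "Notification Clicked"}


def find_exact_duplicates(events):
    """Return events whose (user, name, timestamp) combination was already seen."""
    seen = set()
    duplicates = []
    for event in events:
        key = (event["user_id"], event["name"], event["timestamp"])
        if key in seen:
            duplicates.append(event)
        else:
            seen.add(key)
    return duplicates


def drop_manual_duplicates(events):
    """Drop manually raised copies of events that the SDK already tracks."""
    return [
        event for event in events
        if not (event["name"] in SDK_SYSTEM_EVENTS and event["source"] == "manual")
    ]


events = [
    {"user_id": "u1", "name": "App Launched", "timestamp": 1700000000, "source": "sdk"},
    {"user_id": "u1", "name": "App Launched", "timestamp": 1700000000, "source": "manual"},
    {"user_id": "u1", "name": "Product Viewed", "timestamp": 1700000050, "source": "manual"},
]

print(len(find_exact_duplicates(events)))   # one duplicated App Launched event
print(len(drop_manual_duplicates(events)))  # two events remain
```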
Time Zone Management 
A large chunk of marketing campaigns are time-based, and a wrongly specified user time zone can mean your campaign goes unnoticed because it’s delivered at 3 a.m. instead of 7 p.m. This is especially important if you run an international business.
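Here is a minimal sketch of why the stored time zone matters, using Python's standard `zoneinfo` module (Python 3.9+). The user, the campaign date, and the wrong "Asia/Kolkata" value are hypothetical: a 7 p.m. send computed from the wrong profile time zone reaches a Los Angeles user at 6:30 a.m.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs the system tz database (or the tzdata package)


def send_time_utc(send_date, local_time, user_tz):
    """Convert a campaign's intended local delivery time into a UTC instant."""
    local_dt = datetime.combine(send_date, local_time, tzinfo=ZoneInfo(user_tz))
    return local_dt.astimezone(timezone.utc)


campaign_day = date(2024, 3, 26)
seven_pm = time(19, 0)

# Correct profile time zone: 7 p.m. in Los Angeles (PDT, UTC-7) is 02:00 UTC the next day.
correct = send_time_utc(campaign_day, seven_pm, "America/Los_Angeles")

# Wrong profile time zone: the same user was stored as "Asia/Kolkata" (UTC+5:30).
wrong = send_time_utc(campaign_day, seven_pm, "Asia/Kolkata")

# What the Los Angeles user actually experiences with the wrong time zone.
print(wrong.astimezone(ZoneInfo("America/Los_Angeles")))  # 6:30 a.m. local time
```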
Location Data
There are two types of locations that you can get from your app:

1. Coarse location: 

This can be derived from the mobile network provider and Wi-Fi access points without GPS access, but it isn’t highly accurate.
This may be good enough for your use cases, for example if you just want to send campaigns to, or segment, users based on rough location, such as users from California.
Coarse location data can also be enough to derive valuable insights such as market penetration, customer adoption, and more.
For these insights, you probably do not need to collect the more expensive precise user location.

2. Precise location: 

This uses GPS access and is accurate to within a few meters.
High-accuracy location data matters most in industries like ecommerce and food and grocery delivery, where user location drives special deals, location-based personalization, recommendations, and so on.
As an example, many of CleverTap’s clients use geofencing to serve highly accurate location-based engagements to users, such as sending a coupon when they are near one of the client’s outlets. As you can imagine, this requires location data to be precise within a few meters.
To ensure high accuracy, GPS access from the end user is required. Keep in mind that if end users decline permission to access location data due to privacy or battery concerns, these engagements are bound to suffer.
Choose the accuracy of location data you need based on how important location is to your specific marketing use cases. In many cases, the costs of collecting precise location data may outweigh the benefits: you may lose data from users who opt out of sharing their location, and you might need a modified journey for those users.
But keep in mind that if you choose to stick to coarse location data, all stakeholders should be made aware that the location data may be fuzzy and should not be used where a precise location is required.
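The difference between coarse and precise location shows up directly in a geofence check. The sketch below is an illustration, not any platform's geofencing implementation: it computes the haversine distance between a user and a hypothetical outlet, then uses the reported location accuracy (in meters) to decide whether the user is certainly inside, certainly outside, or undecidable.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(user_lat, user_lon, fence_lat, fence_lon, radius_m, accuracy_m):
    """True if the user is certainly inside the fence, False if certainly outside,
    None if the reported accuracy is too coarse to decide."""
    distance = haversine_m(user_lat, user_lon, fence_lat, fence_lon)
    if distance + accuracy_m <= radius_m:
        return True
    if distance - accuracy_m > radius_m:
        return False
    return None


OUTLET = (37.7749, -122.4194)        # hypothetical outlet in San Francisco
NEARBY_USER = (37.77535, -122.4194)  # roughly 50 m north of the outlet

print(inside_geofence(*NEARBY_USER, *OUTLET, radius_m=100, accuracy_m=10))    # True: precise fix
print(inside_geofence(*NEARBY_USER, *OUTLET, radius_m=100, accuracy_m=1000))  # None: coarse fix
```

With a 100 m geofence, a GPS-grade fix settles the question, while a network-based fix with about 1 km of uncertainty cannot, which is why geofenced engagements depend on precise location.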
Poor Event Naming
The way you name application events can have a long-lasting impact on the clarity of your data. You may name an event where a product is viewed as:

  1. productView
  2. ProductViewed
  3. Product Viewed
  4. UserProductView
  5. Product_Viewed
  6. product-viewed
  7. ViewProduct

We recommend the “Noun + Verb” syntax. E.g., Product Viewed, Registration Completed.
Whatever you choose, make sure that your naming convention is maintained across all platforms and get your teams on the same page. We have observed cases where the same event is named or spelled differently on iOS and Android apps, and this leads to either data loss or misinformation, and can be messy to fix.
Be consistent and clear in naming to ensure a straightforward workflow. Make sure to consult relevant internal teams such as marketing and data science teams for smooth interoperability and low overhead. 
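One way to keep platforms aligned is to check event names automatically. This is a minimal sketch under two assumptions: the convention is Title Case words separated by single spaces (as in “Product Viewed”), and you can export the list of event names each platform sends. The normalization catches casing, underscores, hyphens, and camelCase variants, but not reordered words such as ViewProduct.

```python
import re
from collections import defaultdict

# Assumed convention: two or more Title Case words separated by single spaces.
EVENT_NAME_PATTERN = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)+")


def follows_convention(name):
    """Check a single event name against the assumed naming convention."""
    return EVENT_NAME_PATTERN.fullmatch(name) is not None


def normalize(name):
    """Collapse case, separators, and camelCase so variant spellings map to one key."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)  # "ProductViewed" -> "Product Viewed"
    return re.sub(r"[\s_\-]+", " ", spaced).strip().lower()


def cross_platform_conflicts(names_by_platform):
    """Return normalized keys that are spelled more than one way across platforms."""
    variants = defaultdict(set)
    for platform, names in names_by_platform.items():
        for name in names:
            variants[normalize(name)].add((platform, name))
    return {key: found for key, found in variants.items() if len({n for _, n in found}) > 1}


names = {
    "android": ["Product Viewed", "Registration Completed"],
    "ios": ["ProductViewed", "Registration Completed"],
}
print(cross_platform_conflicts(names))  # flags the two spellings of "product viewed"
```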
Check out these sample events by industry vertical for a good starting point.
Unique User Identification Issues
Identity management is a very important part of your marketing efforts. You need to uniquely identify users and avoid sending the same campaigns to the same user multiple times, and other such mistakes. 
Here are some important recommendations on setting user identities in your marketing platform:

  1. Do not set an Identity if there isn’t one. For example, setting the identity ‘None’ for multiple users will probably group all of them under that one ‘None’ Identity (i.e., every user with the ‘None’ identity is assumed to be the same user). Some platforms, such as CleverTap, assign an internal unique ID to users when the Identity passed is Null, and this ID can still be used to identify individual users.
  2. Do not assign an Identity that might change. For example, if users can change their email within your app, then it is not a good idea to use Email as an Identity. Most marketing platforms will treat the person as a new user if they change their email. The same applies to a mobile number, Facebook ID, and so on.
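A simple guard in your integration code can enforce the first rule. The sketch below is hypothetical: the payload shape and the list of placeholder values are assumptions for illustration, not a CleverTap SDK API. It omits the Identity entirely when the value is missing or a placeholder, and it assumes you pass a stable internal user ID rather than something that can change, like an email.

```python
from collections import Counter

# Values that mean "no identity" and must never be sent as one (assumed list; extend as needed).
PLACEHOLDER_IDENTITIES = {"", "none", "null", "undefined", "nan"}


def clean_identity(raw):
    """Return a usable identity string, or None if the value is missing or a placeholder."""
    if raw is None:
        return None
    value = str(raw).strip()
    return None if value.lower() in PLACEHOLDER_IDENTITIES else value


def build_profile_payload(user_id, name):
    """Hypothetical profile payload: include 'Identity' only when a real, stable ID exists."""
    payload = {"Name": name}
    identity = clean_identity(user_id)
    if identity is not None:
        payload["Identity"] = identity
    return payload


# Why placeholders are dangerous: grouping by the raw value merges unrelated users.
raw_ids = ["u-1001", "None", "None", "u-1002", "None"]
print(Counter(raw_ids)["None"])  # three different users would collapse into one profile

print(build_profile_payload("u-1001", "Asha"))  # includes Identity
print(build_profile_payload("None", "Ravi"))    # Identity omitted; the platform can assign its own ID
```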

You may have special use cases like multiple users using the app from one device. Sophisticated platforms like CleverTap can switch between different user profiles based on who is currently logged in.
Event Data
Ensure that your event data is planned and implemented carefully, both from a developer as well as a business perspective.
Think about your events and event properties methodically. You don’t want to raise a Purchase event without having the Product ID as an event property.
Identify trigger points for each event and make sure that the event is raised at the correct time. 
And don’t raise events at different points in the user journey in your Android app compared to your iOS app. For example, you may accidentally be raising a “Home Screen Viewed” event on Android when the home screen starts loading, but on iOS only after it has loaded. Because many users close the app while the home screen is still loading, iOS will show relatively lower counts. This discrepancy between Android and iOS numbers leads to incorrect insights, which in turn may cause you to make unnecessary changes.
Use a test account and a test user to verify that the actions performed in the app match the events recorded on the User Profile page.
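A quick way to spot this kind of mismatch is to compare a ratio that should be similar on both platforms. The sketch below uses made-up counts and an arbitrary tolerance of 0.1 (both assumptions): it divides Home Screen Viewed by App Launched per platform and flags a large gap for investigation.

```python
def views_per_launch(counts):
    """Ratio of 'Home Screen Viewed' to 'App Launched' events for one platform."""
    return counts["Home Screen Viewed"] / counts["App Launched"]


def flag_platform_gap(counts_by_platform, tolerance=0.1):
    """Return per-platform ratios and whether they differ by more than the tolerance."""
    ratios = {platform: views_per_launch(c) for platform, c in counts_by_platform.items()}
    gap = max(ratios.values()) - min(ratios.values())
    return ratios, gap > tolerance


# Hypothetical daily counts, for illustration only.
counts = {
    "android": {"App Launched": 10_000, "Home Screen Viewed": 9_800},
    "ios": {"App Launched": 8_000, "Home Screen Viewed": 6_400},
}

ratios, needs_review = flag_platform_gap(counts)
print(ratios, needs_review)  # android 0.98 vs ios 0.8, so the gap is flagged
```

A flagged gap does not prove an instrumentation bug, since user behavior can genuinely differ by platform, but it tells you where to compare trigger points first.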
Datatype of Event Data
Choosing the right data types for your event properties should be a well-thought-out decision. Some marketing platforms, such as CleverTap, have a built-in schema management tool that can help you define the data type of each event property.
We have observed cases where user birthdays are passed from the app as a string instead of a Date object. This renders the data mostly useless, because simple questions like “What % of my users belong to Gen Z?” cannot be answered quickly.
Recently, we had a client sending transaction amounts as a string instead of a number. This makes many simple mathematical operations, such as summing or averaging, impossible.
Simple oversights like these can waste significant resources: you either have to transform the data after the fact or discard it completely.
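Here is a minimal sketch of what typed values make possible, using hypothetical sample data. Once birthdays are real dates and amounts are numbers, the Gen Z question and a revenue total are one-liners; with strings, `sum()` raises a `TypeError`. The 1997 to 2012 birth-year range for Gen Z is a commonly used definition and is an assumption here.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical raw values as they might arrive when the app sends strings.
raw_birthdays = ["2001-04-17", "1988-11-02", "2010-07-30"]
raw_amounts = ["499.00", "1250.50", "89.99"]

# sum(raw_amounts) would raise TypeError: strings cannot be added to the integer start value.

# Parse once into proper types (ISO dates and exact decimal amounts).
birthdays = [datetime.strptime(s, "%Y-%m-%d").date() for s in raw_birthdays]
amounts = [Decimal(s) for s in raw_amounts]


def gen_z_share(dates, first_year=1997, last_year=2012):
    """Share of users born within the (assumed) Gen Z birth-year range."""
    gen_z = [d for d in dates if first_year <= d.year <= last_year]
    return len(gen_z) / len(dates)


print(gen_z_share(birthdays))  # 2 of the 3 users
print(sum(amounts))            # Decimal('1839.49')
```

`Decimal` is used for amounts so that currency totals do not pick up binary floating-point rounding errors.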

How Can I Maintain High Data Integrity?

Luckily for you, it’s not very difficult to maintain data integrity. But it does take some planning and forethought. 
Follow these best practices:

  1. Use consistent and clear event names across your documentation and your platforms.
  2. Spend extra time during the initial integration with your marketing platform to manually and thoroughly test the events raised on each platform. Use a test account and check the user profile’s activity stream before going live. This will save you a lot of time and pain in the future.
  3. Audit your events frequently, especially when event structures are modified or new events are added.
  4. Think carefully about which events you want to track: don’t go overboard, but don’t miss critical events either. Clearly define your KPIs and use cases.
  5. Consult relevant teams that may have use for the data, such as the marketing team, management, developers, and data scientists.
  6. Make sure that all platform teams (Android, iOS, SDK) are on the same page when deciding when to raise an event.
  7. Put time into maintaining an event schema. Visit our documentation to learn more about Schema and how it can help maintain data integrity.
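Alongside your platform’s schema tool, a small local check can catch problems during testing or audits. This is a hypothetical sketch, not CleverTap’s Schema feature: the event names, property names, and types in `EVENT_SCHEMA` are illustrative assumptions. It reports unknown event names, missing required properties, and wrong data types.

```python
from numbers import Real

# Hypothetical schema: event name -> {required property: expected type}.
EVENT_SCHEMA = {
    "Product Purchased": {"Product ID": str, "Amount": Real, "Quantity": int},
    "Product Viewed": {"Product ID": str},
}


def validate_event(name, properties, schema=EVENT_SCHEMA):
    """Return a list of problems with one event; an empty list means it passed."""
    if name not in schema:
        return [f"unknown event name: {name!r}"]
    problems = []
    for prop, expected in schema[name].items():
        if prop not in properties:
            problems.append(f"missing property: {prop!r}")
        elif not isinstance(properties[prop], expected):
            actual = type(properties[prop]).__name__
            problems.append(f"{prop!r} should be {expected.__name__}, got {actual}")
    return problems


print(validate_event("Product Purchased", {"Product ID": "SKU-101", "Amount": 1250.5, "Quantity": 1}))
print(validate_event("Product Purchased", {"Amount": "1250.50", "Quantity": 1}))
print(validate_event("productPurchased", {}))
```

The second call reports both a missing Product ID and an amount sent as a string, the two issues described earlier in this post.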

Achieving and Maintaining Data Integrity

It is very important to be able to trust your data if you plan to base your business decisions on it, and all it needs is a little bit of extra time and effort to develop this trust. Maintaining data integrity will save you and your company a lot of pain down the line.

Last updated on March 26, 2024