The Best Metric to Measure Accuracy of Classification Models

Unlike evaluating the accuracy of models that predict a continuous or discrete dependent variable, such as Linear Regression models, evaluating the accuracy of a classification model can be more complex and time-consuming. Before measuring the accuracy of a classification model, an analyst would first measure its robustness with the help of metrics such as AIC-BIC, AUC-ROC, AUC-PR, the Kolmogorov-Smirnov chart, etc. The

How we increased our Demo Signups by 40%

Being a startup makes us continually iterate on and improve the design and copy of our website. Here is how we improved, tested and optimized one of our most important pages – our online product demo signup page. Once someone fills out the demo request, we schedule an online, personalized walk-through of the CleverTap dashboard with them. Our existing product demo

Sleepless nights with MongoDB WiredTiger and our return to MMAPv1

We have been using MongoDB 2.6 with MMAPv1 as the storage engine for the past two years. It’s been a stable component in our system until we upgraded to 3.0 and promoted secondaries configured with WiredTiger as the storage engine to primary. To put things in context, we do approximately 18.07K operations/second on one primary

A Primer on Logistic Regression – Part I

In the real world, we often come across scenarios that require us to make decisions resulting in a finite set of outcomes, such as the examples below: Will it rain today? Will I reach the office on time today? Would a child graduate from his/her university? Does a sedentary lifestyle increase the chances of getting heart disease? Does smoking lead

A Neat Trick to Increase Robustness of Regression Models

The first predictive model that an analyst encounters is Linear Regression. A linear regression line has an equation of the form Y = a + bX, where X = explanatory variable, Y = dependent variable, a = intercept and b = coefficient. In order to find the intercept and coefficients of a linear regression line, the above equation is generally solved by

Tricks and Tips for Feature Engineering

A predictive model is a formula that transforms a list of input fields or variables into some output of interest. Feature engineering is simply the thoughtful creation of new input fields from existing input fields, either in an automated fashion or manually, with valuable inputs from domain expertise, logical reasoning, or intuition. The new input fields could

The Fallacy of Seeing Patterns

Human beings try to find patterns to explain the reason behind almost every phenomenon, but that doesn’t mean that there is a pattern to rely on. Superstitions are a classic example where spurious patterns were generalized to explain many a phenomenon. As analysts, we are on the lookout for patterns and quite often, either knowingly

I Wish I Had Autobots for Data Transformation

Being a sci-fi movie buff, I have always wondered if my variables could turn into Autobots, just like in the movie ‘Transformers’, and make my life building statistical models that much easier. Until that day, I will have to use the available tools to transform my variables. Data Analysis Before drawing valuable insights or building predictive

How to Compare Apples and Oranges? : Part III

In Part 1 and Part 2 of the series, we looked at ways to compare numerical variables and categorical variables. Let’s now look at techniques to compare mixed types of variables, i.e. numerical and categorical variables together. Please read this article to visually analyze the relationship between mixed types of variables. We will work with

How to Compare Apples and Oranges? : Part II

In the previous article, we looked at some of the ways to compare different numerical variables. In this article, we shall look at techniques to compare categorical variables with the help of an example. Assume you have been given a dataset totaling 10,000 rows containing user information on Operating System, Gender and whether the user

How to Compare Apples and Oranges? : Part I

How often have you come across the idiom “Comparing apples and oranges”? It is a great analogy to articulate that two things can’t be compared due to the fundamental difference between them. As an analyst, you deal with such differences and make sense of them on a daily basis. Let’s take an example and understand some ways to

Using Raspberry Pi to build a commercial grade wall information dashboard

We recently built wall-information display screens for our offices to show interesting metrics of our production environment along with some business numbers we care about. While looking up how people do this kind of thing with a Raspberry Pi (Rpi), I found that most folks use a display manager, like LightDM, and user auto log-in at boot to trigger

Deriving Better Insights from Time Series Data with Cycle Plots

Visualizing time series data for the analysis of numerical information like revenue, app launches, uninstalls, etc. can help analysts quickly reveal an underlying trend. The graph below displays the visualization of time series data: The above graph captures the essence of a slight uptrend over the course of 12 weeks but leaves out further details

Best Practices for Internationalization on the Web

In today’s fast-moving world, the drive to make everything accessible over the Internet has given rise to cut-throat competition. This often results in a website that lacks some useful and important aspects of Web standards. Here, I’m not talking about some optional fancy features or something that will add a ‘wow factor’ to

How to Set Up Multiple Broadcast Receivers for Handling Push Notifications on Android

Broadcast receivers are the handlers for receiving and processing push notifications sent to an Android device. Setting up a single broadcast receiver to handle push notifications to your app is a straightforward process. But what if you need to handle push notifications from disparate sources: from multiple third-party push notification providers and/or from your own servers,

Developer’s Guide: How to Migrate from Parse with Zero Downtime

Do you have the Parse migration blues? The recent announcement of Facebook’s closure of Parse has developers and marketers searching for a new mobile analytics and engagement platform. Not to worry, CleverTap has come up with an easy transition for you in our Parse Migration Guide. We’ve released a complete solution, including an updated mobile SDK, which

How to Remove Duplicates in Large Datasets

Dealing with large datasets is often daunting. With limited computing resources, particularly memory, it can be challenging to perform even basic tasks like counting distinct elements, checking membership, filtering duplicate elements, finding the minimum, maximum, or top-n elements, or performing set operations like union, intersection, similarity and so on. Probabilistic data structures to the rescue Probabilistic data structures

How to Treat Missing Values in Your Data : Part II

In the previous article, we discussed some techniques to deal with missing data. We will now look at an example where we shall test all the techniques discussed earlier to infer or deal with such missing observations. With the information on Visits, Transactions, Operating System, and Gender, we need to build a model to predict Revenue.

Fixing Notification Icon for Android Lollipop and Above

Have you noticed a notification bar with an icon that is just a solid white square? Or are you having trouble rendering your logo correctly on the Android notification bar after the Lollipop release? First, let’s look at what the Android documentation says – “Update or remove assets that involve color. The system ignores all