How to Represent Data with Intelligent Use of the Coordinate System

The most widely used coordinate system for representing data is the Cartesian system, followed by the polar system.

Coordinate systems (Source: Wikipedia)

Basically, the Cartesian coordinate system uses a grid of straight lines, while the polar coordinate system uses a grid of concentric circles to represent data.
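
To make the relationship between the two grids concrete, here is a minimal sketch of converting a point between polar and Cartesian form. The function names are made up for this illustration; only Python's standard `math` module is used.

```python
import math

def polar_to_cartesian(r, theta):
    """Convert a polar point (radius r, angle theta in radians) to (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def cartesian_to_polar(x, y):
    """Convert (x, y) to (r, theta), with theta in radians in [-pi, pi]."""
    return math.hypot(x, y), math.atan2(y, x)

# The same point expressed in both systems:
# radius 2, a quarter turn counter-clockwise from the positive x-axis.
x, y = polar_to_cartesian(2.0, math.pi / 2)
r, theta = cartesian_to_polar(x, y)
```

A chart on polar coordinates applies this same idea to a whole plot: one data dimension becomes the angle and the other becomes the distance from the center.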

Let’s now look at a few examples where, with appropriate use of the systems discussed above, we will be able to visualize different aspects of the data.

The standard graphic visualization for a categorical variable is the bar chart. Bar charts are useful when a variable has a small number of categories.
Bar Chart for App Launches by Operating System
The bar chart above shows App Launches by operating system.

Suppose we need to check App Launches by country. In this case, there may be many categories, as shown below:
Bar Chart for App Launches by Countries
The bar chart above makes inefficient use of space since there are many countries. In such a case, you could restrict the bars to the top 5 countries by App Launches, but you would sacrifice the information about the other countries, which could help you see the bigger picture.

What if we switch the coordinate system from Cartesian to polar?
Bar Chart for App Launches by Countries shown in Polar coordinate system
The plot above looks better than the bar chart on the Cartesian coordinate system. You can instantly identify the top countries by App Launches.
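
As a rough illustration of how a bar chart is laid out on polar coordinates, the sketch below gives each category an equal angular slice and uses the count as the bar's radius. The function names and the sample counts are hypothetical (they are not the article's dataset), and the plotting helper assumes matplotlib is installed.

```python
import math

def polar_bar_geometry(counts):
    """Return (labels, angles, heights, width) for a polar bar chart.

    counts: dict mapping category -> count. Bars are sorted from largest
    to smallest and spread evenly around the full circle.
    """
    if not counts:
        raise ValueError("counts must contain at least one category")
    items = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    width = 2 * math.pi / len(items)             # angular width of each bar
    labels = [name for name, _ in items]
    heights = [value for _, value in items]
    angles = [i * width for i in range(len(items))]  # start angle of each bar
    return labels, angles, heights, width

def plot_polar_bars(counts):
    """Draw the bars on polar axes (requires matplotlib)."""
    import matplotlib.pyplot as plt
    labels, angles, heights, width = polar_bar_geometry(counts)
    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    ax.bar(angles, heights, width=width, align="edge", edgecolor="white")
    ax.set_xticks([a + width / 2 for a in angles])   # label each bar's center
    ax.set_xticklabels(labels)
    return fig, ax

# Hypothetical counts, for illustration only.
launches = {"Country A": 500, "Country B": 300, "Country C": 120, "Country D": 80}
labels, angles, heights, width = polar_bar_geometry(launches)
```

Calling `plot_polar_bars(launches)` followed by `matplotlib.pyplot.show()` would render the chart. Sorting the bars by count is a design choice made here so that the largest categories sit next to each other.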

Things are getting better, but we still can’t make out the proportion accounted for by the top contributors unless we show the percentages along with the counts.
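
Showing percentages next to counts only needs each category's share of the total. A minimal sketch, with a hypothetical `shares` helper and made-up counts:

```python
def shares(counts):
    """Return {category: percentage of the total}, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * value / total, 1) for name, value in counts.items()}

# Hypothetical counts, for illustration only.
launches = {"Country A": 500, "Country B": 300, "Country C": 120, "Country D": 80}
pct = shares(launches)

# Label text that pairs each count with its share, e.g. for bar annotations.
labels = [f"{name}: {launches[name]} ({pct[name]}%)" for name in launches]
```

These label strings could then be attached to the bars of either the Cartesian or the polar chart.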

Now, let’s look at another class of plots known as tree maps, which are plotted on the Cartesian coordinate system. A tree map is drawn as a rectangle, with each country shown as a nested rectangle inside it. The area of each rectangle is proportional to that country’s share of the total App Launches, as shown below:
Tree Map for App Launches by Country
The United States has the highest share of App Launches, since its rectangle occupies the largest area. It can be inferred from the above example that the top 5 countries account for roughly 60% of App Launches, and correspondingly occupy roughly 60% of the tree map’s area. The rest of the countries account for the remaining 40% or so. You can further extend the tree map to include the percentage of App Launches, just as percentages could be added to the bar charts above. Tree maps are also useful for showing nested hierarchical information, such as OS version within OS, where you might otherwise use a stacked or side-by-side bar chart. Unlike bar charts, tree maps make efficient use of the space provided.
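
The "area proportional to share" rule can be sketched with a very simple layout: place categories from largest to smallest, each one taking a slice off the longer side of the remaining rectangle. This is a simplified strip layout written for this illustration, not the squarified algorithm that plotting libraries typically use, and it assumes all counts are positive. The sample counts are hypothetical.

```python
def treemap_layout(counts, x=0.0, y=0.0, w=1.0, h=1.0):
    """Return {name: (x, y, width, height)} with area proportional to count.

    Items are placed from largest to smallest; each one takes a slice off
    the longer side of whatever rectangle remains. Counts must be positive.
    """
    items = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    remaining = sum(value for _, value in items)
    rects = {}
    for name, value in items:
        frac = value / remaining          # share of the remaining rectangle
        if w >= h:                        # cut a vertical slice off the left
            rects[name] = (x, y, w * frac, h)
            x, w = x + w * frac, w * (1 - frac)
        else:                             # cut a horizontal slice off the bottom
            rects[name] = (x, y, w, h * frac)
            y, h = y + h * frac, h * (1 - frac)
        remaining -= value
    return rects

# Hypothetical counts, for illustration only.
launches = {"Country A": 500, "Country B": 300, "Country C": 120, "Country D": 80}
rects = treemap_layout(launches)

# In a unit square, each rectangle's area equals that category's share.
areas = {name: rw * rh for name, (rx, ry, rw, rh) in rects.items()}
```

Because the outer rectangle has area 1, summing the areas of the top categories gives their combined share directly, which is the kind of reading the tree map supports visually.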

Let’s now look at another example. An entertainment company would like to measure interaction on its app by counting the videos played by its users. It decides to check the counts of videos played for each day of the week and each hour of the day. The general practice is to look at a heat map, which is plotted on a Cartesian coordinate system as below:
Regular rectangular heat map for day of week and time of day
Hour of the Day is plotted on the x-axis and Day of the Week on the y-axis. Darker shades indicate higher counts. The heat map above shows that 8 pm to 11 pm on weekdays is the most active period for videos played. This could be the prime time for the app. On Sunday, the most active period is 7 pm to 8 pm, followed by 11 am to 7 pm, which is clearly different from the behavior on other days.
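
The grid behind such a heat map is just a 7 x 24 table of counts. The sketch below builds it from event timestamps, assuming each video play is available as a Python `datetime`. The function names and sample events are hypothetical, and the plotting helper assumes matplotlib is installed.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]

def day_hour_grid(timestamps):
    """Return a 7 x 24 grid of counts: grid[day][hour], with day 0 = Sunday."""
    # datetime.weekday() uses Monday=0 ... Sunday=6, so shift to Sunday=0.
    counts = Counter(((ts.weekday() + 1) % 7, ts.hour) for ts in timestamps)
    return [[counts[(d, h)] for h in range(24)] for d in range(7)]

def plot_heatmap(grid):
    """Draw the rectangular heat map (requires matplotlib)."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.imshow(grid, aspect="auto", cmap="Blues")   # darker = higher count
    ax.set_yticks(range(7))
    ax.set_yticklabels(DAYS)
    ax.set_xlabel("Hour of the Day")
    return fig, ax

# Hypothetical events, for illustration only.
events = [
    datetime(2024, 1, 7, 19, 30),   # a Sunday evening
    datetime(2024, 1, 8, 21, 5),    # a Monday night
    datetime(2024, 1, 8, 21, 40),   # same Monday, same hour
]
grid = day_hour_grid(events)
```

Each cell `grid[day][hour]` is the number of plays in that slot, which the color scale then maps to a shade.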

What if we show the same heat map on a polar coordinate system?
Circular heat map for day of the week and hour of the day for video viewed
Each ring between adjacent concentric circles represents a day of the week, starting with ‘Sunday’. Surprisingly, the heat map on the polar coordinate system is much easier on the eyes than the one on the Cartesian coordinate system.
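
Moving the heat map to polar coordinates amounts to mapping the hour to an angle and the day to a ring. The sketch below shows that mapping for a single cell and a possible way to draw the whole grid. It assumes Sunday is the innermost ring and midnight sits at the top with hours running clockwise; the article does not state either, so treat both as assumptions. The function names are hypothetical and the plotting helper assumes matplotlib is installed.

```python
import math

def polar_cell(day, hour, inner_radius=1.0, ring_width=1.0):
    """Return (theta_start, theta_end, r_inner, r_outer) for one cell.

    day: 0..6 picks the ring (0 = innermost, assumed to be Sunday);
    hour: 0..23 picks the angular wedge.
    """
    wedge = 2 * math.pi / 24
    theta_start = hour * wedge
    r_inner = inner_radius + day * ring_width
    return theta_start, theta_start + wedge, r_inner, r_inner + ring_width

def plot_circular_heatmap(grid):
    """Draw a 7 x 24 grid of counts as rings and wedges (requires matplotlib)."""
    import matplotlib.pyplot as plt
    theta_edges = [h * 2 * math.pi / 24 for h in range(25)]  # 24 wedges
    radius_edges = [1.0 + d for d in range(8)]                # 7 rings
    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    ax.set_theta_zero_location("N")   # assumption: hour 0 at the top
    ax.set_theta_direction(-1)        # assumption: hours run clockwise
    ax.pcolormesh(theta_edges, radius_edges, grid, cmap="Blues")
    return fig, ax

first_cell = polar_cell(0, 0)    # innermost ring, first hour
last_cell = polar_cell(6, 23)    # outermost ring, last hour
```

Starting the rings at a radius of 1 rather than 0 leaves a hole in the middle, so the innermost day's cells are not squeezed into thin slivers.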

It may thus be concluded that if one uses the coordinate system intelligently, the data could be much easier to read and the information embedded in it may be conveyed faster.

The source code and dataset to reproduce the above article are available here
