Tuesday, January 12, 2016

Why are some charts difficult to understand? Part II

A first answer, from Andrej Lapajne of Zebra BI (https://zebra.bi/ressources)

Charts should provide a clear and practically immediate insight into the underlying dataset, especially when we're talking about a single chart created from a tiny, almost trivial dataset. If we cannot achieve that, how will we tackle the complexity of our business and social environment in the age of big data?
Let's take the following example of media usage from the Comparable Metrics Report Q2 2015 by Nielsen:

Do you find this visualization a bit confusing? Of course, with some effort we can decode the meaning of these nicely colored rectangles and at least roughly make a few visual comparisons. But hey, why the stacked column chart? Do I really have to move my eyes back and forth between the chart and its legend just to understand which color represents which data category?
After all, there are only 12 data points altogether; isn't there a better way to visualize them?

Why is this chart confusing?

Let's start with the color legend at the bottom of the chart.
Color legends are common practice, largely because tools like MS Excel add them by default, but they are a bad practice nonetheless. Figuring out which data categories are visualized by matching colors against a separate color legend is a very "expensive" (time-consuming) operation.
Just think about it: you have to look at the chart, remember the first color, move your eyes down to the legend and scan it until you find the matching color. Then you read the adjacent label and remember it. Next, you move your eyes back to the chart and re-check the values with this label in mind. Then you move your eyes to the next category in the chart, remember its color, move your eyes down to the legend and scan it until you find the matching color... And so on for every data category, probably even several times.
And what happens if you print it and you've only got a black-and-white printer?
So what should you do?
Get rid of color legends altogether! Instead, place the series names in their most natural position on the chart. In stacked charts, you can simply move the legend to the right of the chart and align each label exactly with the center of its corresponding segment, like this:

Please excuse my Paint art here; this is just a quick sketch.
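Aligning a label with "the center of its segment" is simple arithmetic: each label sits at the segment's starting edge plus half the segment's height. The sketch below computes those positions for one stacked column. It is a minimal illustration, not how any particular tool works, and it assumes (as common charting tools do) that positive segments stack upward from zero and negative segments stack downward from zero, each in the order given. The function name and the segment values are hypothetical.

```python
def segment_label_centers(values):
    """Return the vertical center of each segment in one stacked column.

    Assumption: positive segments stack upward from zero and negative
    segments stack downward from zero, each in the order given.
    """
    top_of_positive = 0.0     # running top edge of the positive stack
    bottom_of_negative = 0.0  # running bottom edge of the negative stack
    centers = []
    for value in values:
        if value >= 0:
            # Center = where this segment starts + half its height.
            centers.append(top_of_positive + value / 2)
            top_of_positive += value
        else:
            # value is negative, so value / 2 moves the center downward.
            centers.append(bottom_of_negative + value / 2)
            bottom_of_negative += value
    return centers


# Hypothetical segment values for one column (not the Nielsen figures).
segments = {"Series A": 4, "Series B": -2, "Series C": 6, "Series D": -3}
for name, y in zip(segments, segment_label_centers(list(segments.values()))):
    print(f"place label {name!r} at y = {y}")
```

With the labels positioned this way, the legend box can be dropped entirely.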
OK, now we can move on to the real problem of this visualization: the chart type is simply not appropriate for the message the author tried to communicate.
Stacked charts are suitable for part-to-whole comparisons (comparing each data series to the total value), which is not the case here. This chart forces us to compare, for example, the share of TV-connected devices to the sum of all devices with positive growth, or the share of TV to the sum of all devices with negative growth. That doesn't make any sense, hence the confusion...
In general, try to avoid stacked charts altogether; in most cases you'll find a better replacement. Stacked charts only work well with a very limited number of series, e.g. when comparing 1, 2 or 3 data series (parts) to the total value (whole).
Without further ado, here's our redesign:

Instead of a stacked column chart, we opted for two bar charts. The color legend is gone. All labels are aligned with their data points and displayed horizontally for maximum legibility. Instead of assigning an arbitrary color to each data category, we use only two colors: green for positive growth and red for negative growth.
These charts are called variance charts.
Note that both charts are drawn to the same scale (rendered on exactly the same Y-axis scale). That's extremely important, because only charts drawn to the same scale allow valid visual comparisons between them. In the case above, we can thus observe the following facts:
  • Young people (ages 18-34) have moved to smartphones slightly more than people aged 35-49,
  • The older segment (ages 35-49) has moved to tablets twice as much as the younger people,
  • Younger people have moved significantly more to TV-connected devices such as DVD devices, game consoles and multimedia devices.
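The two ideas behind this redesign, one color per sign of the change and one scale shared by both panels, can be sketched in a few lines. To stay self-contained, the sketch below renders variance charts as plain text instead of calling a plotting library: '+' bars stand in for green (growth) and '-' bars for red (decline). The key step is `shared_scale`, which takes the largest absolute change across all panels so that equal changes get equal bar lengths in every panel. The function names and the numbers are hypothetical and are not the Nielsen figures. In a real plotting library the same effect comes from fixing identical value-axis limits on both panels, for example `plt.subplots(..., sharex=True)` with horizontal bars in matplotlib.

```python
def shared_scale(*panels):
    """Largest absolute change across all panels, so every panel uses one scale."""
    largest = max((abs(change) for panel in panels for _, change in panel), default=0)
    return largest or 1  # avoid dividing by zero if every change is 0


def variance_bars(changes, scale_max, width=20):
    """Render one variance chart as text, one line per category."""
    label_width = max((len(label) for label, _ in changes), default=0)
    lines = []
    for label, change in changes:
        length = round(abs(change) / scale_max * width)
        if change >= 0:
            # Growth ('+', green in the redesign) extends right of the axis.
            left, right = " " * width, ("+" * length).ljust(width)
        else:
            # Decline ('-', red in the redesign) extends left of the axis.
            left, right = ("-" * length).rjust(width), " " * width
        lines.append(f"{label:>{label_width}} {left}|{right} {change:+}")
    return lines


# Hypothetical changes for two audience segments (not the report's data).
panel_1 = [("Category A", 6), ("Category B", -3)]
panel_2 = [("Category A", 3), ("Category B", 3)]
scale = shared_scale(panel_1, panel_2)
for title, panel in (("Segment 1", panel_1), ("Segment 2", panel_2)):
    print(title)
    print("\n".join(variance_bars(panel, scale, width=10)))
```

Because both panels use the same `scale`, a change of 6 is drawn exactly twice as long as a change of 3, whichever panel it appears in. Scaling each panel to its own maximum would break that comparison.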
BTW, we have also corrected the sort order of the media devices.
In case you're wondering: we created this visualization in MS Excel in just a few clicks with the Zebra BI Add-in, which you can try for free. The solution follows the recommendations of the IBCS standard, although we've simplified it slightly to suit the intended target audience: the general public.

Can I always select the right chart?

The choice of chart depends on many factors, such as the target audience, your intended message and the underlying dataset. That's why this task can't be fully automated, but we've developed a 3-step process to make it easier for you. Let's illustrate it with the example above.
First, the data categories in the case above (smartphone, tablet, TV, radio...) are not time-series data but rather unordered categories (a structure). Whenever you have structures like this, e.g. products, markets, customers, business units, account managers or cost centers, use charts with a vertical X-axis. In other words: rotate the chart 90 degrees clockwise.
Second, what kind of analysis are you presenting? The case above is all about comparing differences. To visualize differences, deviations or variances, we use so-called "plus-minus" or "variance" charts. These are simply two-colored bar charts, for example red-green, or a colorblind-safe alternative such as black-orange, Tableau Software's favorite orange-blue, or the red-blue recommended by IBCS.
Then there's a third step: which chart shape should you use? We used bars, but in principle we could also have used thinner bars, dots, or dots with droplines (the "lollipop" chart). This step depends mainly on the type of KPI being presented, but also on other perceptual criteria, such as the density of data points.
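As a rough illustration, the three steps can be written down as a small decision function. This is a hypothetical sketch, not Zebra BI's method and not the rules from the whitepaper. Steps 1 and 2 follow the text above; for step 1 it assumes, as the contrast with structures implies, that time series keep a horizontal X-axis. Step 3 is a stand-in: the `dense_threshold` value and the "switch to dots with droplines when the chart gets dense" rule are assumptions, and the function ignores the KPI type, which the text names as the main factor.

```python
from dataclasses import dataclass


@dataclass
class ChartChoice:
    x_axis: str  # step 1: orientation of the category (X) axis
    colors: str  # step 2: color scheme driven by the analysis
    shape: str   # step 3: mark shape


def choose_chart(is_time_series, shows_variance, n_points, dense_threshold=15):
    """Hypothetical encoding of the 3-step chart selection process."""
    # Step 1: structures (products, markets, devices...) get a vertical
    # X-axis; time series are assumed to keep a horizontal one.
    x_axis = "horizontal" if is_time_series else "vertical"
    # Step 2: differences/variances get a two-color plus-minus scheme.
    colors = "two colors by sign" if shows_variance else "single color"
    # Step 3: ASSUMED density rule with a made-up threshold.
    shape = "bars" if n_points <= dense_threshold else "dots with droplines"
    return ChartChoice(x_axis, colors, shape)


# The media-usage example: a structure, a variance analysis, 12 data points.
print(choose_chart(is_time_series=False, shows_variance=True, n_points=12))
```

For the media-usage example this yields a vertical X-axis, two colors by sign and bars, which matches the redesign; any real rule for step 3 would need the criteria from the whitepaper.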
Learn all about selecting the right chart in our in-depth whitepaper "How to Choose the Right Chart - A 3-step Tutorial" (free PDF).