Using Color to Group and Label in Charts

Jorge wrote in a recent post about loss aversion, the fact that “people strongly prefer avoiding losses than acquiring gains”.

Loss aversion […]: Translated to chart-making, it means that there is a “tendency to avoid losing data at any cost”. The chart below shows you the Money Income Of Households as published.

image

Take the above chart, for instance: does it make any sense to add those nine series to a single chart?

Remove irrelevant data series and you risk a mutiny on the Bounty, even if relevant trends are easier to detect. It is absurd, but very human.

So, how can you give the users all the data they expect while keeping the chart clean and readable?

Clearly, putting 9 data series in a line chart where each line is labeled with a distinct color is problematic. You can hardly make any sense of this chart without going back and forth for hours between the lines and the legend. To fix this in Excel, you could create an interactive chart or a small multiples version of the chart.

However, I felt a bit challenged by Jorge’s post and thought: why not try fixing the chart by using color? Maureen Stone designed a wonderful set of colors for our upcoming product, Chart Tamer. Chart Tamer is an Excel add-in we developed with Stephen Few to help business users create better charts with Excel. One part of the Chart Tamer story is getting the colors in Excel charts right.

Here are some of the colors Maureen designed for Chart Tamer:

image

Maureen designed the colors so that they have maximal hue separation and so that the tints within each ramp are distinct enough to serve as series colors. That is quite challenging, as you will know if you have ever tried to build your own color ramps in Excel.

I assigned 3 colors of the Blue ramp to the high income group, 4 colors of the Gray ramp to the middle income group and 3 colors of the Orange ramp to the low income group to establish a grouping based on hue.
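The idea of grouping by hue can be sketched in a few lines of Python. This is only an illustration, not how Chart Tamer works: the base colors are placeholder RGB values (not Maureen's actual ramps), the series names are hypothetical, the even three-per-group split is an assumption, and the tints are produced by simply blending each base color toward white.

```python
def lerp(a, b, u):
    """Blend two RGB tuples: u=0 returns a, u=1 returns b."""
    return tuple(round(x + (y - x) * u) for x, y in zip(a, b))


def to_hex(rgb):
    """Format an (r, g, b) tuple as a hex color string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def tint_ramp(base, n, lightest=0.6):
    """Return n tints of one hue, from the base color toward white.

    `lightest` is how far the palest tint moves toward white (assumed value).
    """
    white = (255, 255, 255)
    if n == 1:
        return [to_hex(base)]
    return [to_hex(lerp(base, white, lightest * i / (n - 1))) for i in range(n)]


def group_colors(groups, base_colors):
    """Give every series a tint of the hue that belongs to its group."""
    colors = {}
    for group, series in groups.items():
        ramp = tint_ramp(base_colors[group], len(series))
        for name, color in zip(series, ramp):
            colors[name] = color
    return colors


# Placeholder hues, NOT the Chart Tamer colors.
base_colors = {
    "high": (31, 78, 121),   # blue
    "mid": (89, 89, 89),     # gray
    "low": (197, 90, 17),    # orange
}

# Hypothetical series names; the 3/3/3 split is only for illustration.
groups = {
    "high": ["high-1", "high-2", "high-3"],
    "mid": ["mid-1", "mid-2", "mid-3"],
    "low": ["low-1", "low-2", "low-3"],
}

series_colors = group_colors(groups, base_colors)
for name, color in series_colors.items():
    print(name, color)
```

The hue tells the reader which group a line belongs to, and the tint separates the lines within a group. Blending toward white in RGB is a crude way to build tints; a hand-tuned ramp will usually be easier to tell apart.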

Income of Households, Percent Distribution by Income Level in K$, 1967—2005

The colors now group the income levels nicely, make the chart easy to read and immediately tell you the story: you can see that the upper income levels (in Blue) have gained a lot of income in the last 40 years, whereas the lower income group (in Orange) and in particular the middle income levels (in Gray) lost ground.

Update: Jon pointed out the downside of grouping with 3 colors:

it is still not an easy thing to see how the upper series perform. At first I think the gray must belong to the opposite end from the orange, but then I notice the blue. So I rely on the legend, and see, three colors, dark then medium then light. Then I notice the orange is in the opposite order.

I think this isn’t easy to do with three colors. If you could move from dark blue through light blue, then to light orange through dark orange, it would be very effective. Three colors spoils this progression.

Let’s try Jon’s suggestion and encode the income levels with a diverging color palette.

We use the Chart Tamer colors above:

  • Starting with a dark Blue for the highest income level
  • Ending with a dark Orange for the lowest income level
  • Placing a light gray in the middle (needed because we have 9 income groups)
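As a rough sketch of the three steps above, a diverging palette can be built by interpolating from one end color to a neutral midpoint and on to the other end color. The RGB values below are placeholders rather than the Chart Tamer colors, and plain linear interpolation in RGB is an assumption made for brevity; it is not how Maureen's ramps were designed.

```python
def lerp(a, b, u):
    """Blend two RGB tuples: u=0 returns a, u=1 returns b."""
    return tuple(round(x + (y - x) * u) for x, y in zip(a, b))


def to_hex(rgb):
    """Format an (r, g, b) tuple as a hex color string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def diverging_palette(n, start, mid, end):
    """Return n hex colors running start -> mid -> end.

    n must be odd so that exactly one series gets the neutral midpoint.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd number of at least 3")
    half = (n - 1) // 2
    colors = []
    for i in range(n):
        if i <= half:
            # First half: from the start color toward the neutral midpoint.
            colors.append(lerp(start, mid, i / half))
        else:
            # Second half: from the neutral midpoint toward the end color.
            colors.append(lerp(mid, end, (i - half) / half))
    return [to_hex(c) for c in colors]


# Placeholder anchor colors, NOT the Chart Tamer values.
DARK_BLUE = (31, 78, 121)     # highest income level
LIGHT_GRAY = (217, 217, 217)  # middle income level
DARK_ORANGE = (197, 90, 17)   # lowest income level

palette = diverging_palette(9, DARK_BLUE, LIGHT_GRAY, DARK_ORANGE)
for rank, color in enumerate(palette, start=1):
    print(rank, color)
```

With 9 series, the fifth color lands exactly on the light gray, which is why an odd number of groups needs a neutral middle step. The colors next to the midpoint become pale and grayish with this simple blend, so in practice you would probably adjust them by hand.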

 

 

Income of Households, Percent Distribution by Income Level in K$, 1967—2005

You can still see clearly that the upper income levels (the Blues) have gained a lot of income in the last 40 years, whereas the lower income groups (in Orange) lost ground. The distinction between high, mid and low income levels, however, seems to come through more clearly with the three-color encoding.

22 thoughts on “Using Color to Group and Label in Charts”

  1. Nice try, but it is still not an easy thing to see how the upper series perform. At first I think the gray must belong to the opposite end from the orange, but then I notice the blue. So I rely on the legend, and see, three colors, dark then medium then light. Then I notice the orange is in the opposite order.

    I think this isn’t easy to do with three colors. If you could move from dark blue through light blue, then to light orange through dark orange, it would be very effective. Three colors spoils this progression.

  2. If the Y-axis is percentage of households, then it seems that the darkest blue does not indicate more wealth for the wealthy but rather more households having a 100k or higher income level. So if the lower income levels were trending upward such as the bottom line, then that would be a more concerning thing in general. Also, if we are to accurately understand the data we need to see to which group people moved over time. Did the 35 to 49.9 move up or did they move down, since the share of households in that bracket is declining over time?

    In any case, I am eagerly awaiting the Chart Tamer add-in.

  3. Hi, I found your blog on this new directory of WordPress Blogs at blackhatbootcamp.com/listofwordpressblogs. I dont know how your blog came up, must have been a typo, i duno. Anyways, I just clicked it and here I am. Your blog looks good. Have a nice day. James.

  4. Michael, you’re misrepresenting the graph. The series are percentages of households, and the Y axis is percentage of total income. The result is that the top 0.1% of households had 4% of the total income in 1967, and 18% of the total income in 2005.

    Meanwhile, the upper 25% of households also gained share somewhat, and the two income levels representing 35-75% (40% of the households in total) lost share of the total income. (although there appears to be a mistake in the graph, and they have the same shade, or indistinguishable shades, of blue)

    Admittedly, this is a problem of the graph design, as it is not clear that the legend does not describe income in dollars, and the Y axis does not describe percentage of households. Also, “100 over” makes no sense: that should read 99.9-100%.

    Everywhere in the graph, they’re using percentages while omitting the “percent” sign, which is never right to me. If they were using dollars it would be equally wrong to omit the “dollar” sign. It just causes confusion all over about what the numbers mean. There is a lot of bad design there that the xlcubed team haven’t fixed, while they were changing colors.

  5. For all the criticism about stacking, sometimes stacked is clearer than unstacked, not less clear. If this were a stacked area graph, there would be infinitely less confusion than this mishmash of colored lines.

  6. “whereas the lower income group (in Orange) and in particular the middle income levels (in Gray) lost ground.”

    I don’t agree that it is the middle in particular; I think that’s probably an optical illusion caused by the middle having so much more to lose than the lower. Nicholas Bissantz and Jon Peltier recently discussed this notorious effect of linear Y axis scales:

    http://blog.bissantz.com/shocking-time-series

    http://peltiertech.com/WordPress/2008/09/16/logarithmic-axis-scales/

  7. I stand corrected then, but plead that it really makes my point about how leaving off the dollar signs from dollar values and the percent signs off percent values risks graph viewers getting the opposite impression from a graph than the graph maker intended!

    I’ll have to take a look at the data when I get time. Did you copy the table by hand out of the US Census Bureau PDF cited in Jorge’s graph, or get it from some more convenient source?

  8. Okay, but Andreas, Jorge, and the original graph maker either were all very diligent, or one was diligent and shared with the others.

    I’d really like to find a cheap PDF table-to-delimited-text converter that works. I asked the makers of Foxit Reader (small fast free PDF reader) about it, and they said “we have no plans.”

  9. Thanks for that. I’d like to think I’d have found it myself eventually, but all I could see before leaving for work this morning was the report.

  10. Another approach could have been to use shades of color to indicate the growth as well as the levels… for example, use a very light shade to represent the low income and then a bolder shade to represent the higher income… you could also use gradients to represent growth.. just an idea.. nice blog.

