Olympics Treemap

The 2012 London Olympics have now finished, and as a UK company we were pleased to see the games were such a success, and of course that Team GB did so spectacularly well! We’re looking forward now to the Paralympics in a couple of weeks, and once the dust has settled there we’ll be shipping a new point release of XLCubed in September.

We’ll keep most of the changes under wraps for now, but one item we are introducing is treemaps. The Olympic medal table gives us a nice opportunity to explore the medal breakdown with the new chart type. In XLCubed, treemaps can be produced directly from a cube or, as is the case here, from a table held in Excel. The first example below shows the medals split by country and sport. The size of each rectangle depicts the total number of medals, and the colour shows the number of gold medals: the darker the colour, the more golds. The numeric values show the total number of medals, then the number of golds. We can see the USA at the top, and that over half of their medals came from swimming and athletics, with a higher proportion of golds in the pool.
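To make the size and colour encoding concrete, here is a minimal Python sketch of a two-level treemap layout (country, then sport). XLCubed’s own layout algorithm isn’t described in this post, so the simple slice-and-dice method used here is an assumption, and the counts in the example are hypothetical placeholders, not the real 2012 medal table.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    country: str
    sport: str
    x: float
    y: float
    w: float
    h: float
    shade: float  # 0.0 = lightest, 1.0 = darkest (most golds)


def slice_and_dice(data, width=100.0, height=100.0):
    """Two-level slice-and-dice treemap.

    data maps country -> list of (sport, total_medals, golds); every country
    is assumed to have at least one medal. Level 1 slices the width by each
    country's total, level 2 slices that column's height by each sport's
    total, so tile area is proportional to total medals. Shade is golds
    relative to the largest gold count in the data.
    """
    grand_total = sum(t for rows in data.values() for _, t, _ in rows)
    max_gold = max((g for rows in data.values() for _, _, g in rows), default=0) or 1
    tiles = []
    x = 0.0
    for country, rows in data.items():
        country_total = sum(t for _, t, _ in rows)
        col_w = width * country_total / grand_total
        y = 0.0
        for sport, total, golds in rows:
            tile_h = height * total / country_total
            tiles.append(Tile(country, sport, x, y, col_w, tile_h, golds / max_gold))
            y += tile_h
        x += col_w
    return tiles


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    sample = {"Country A": [("Swimming", 6, 3), ("Athletics", 4, 1)],
              "Country B": [("Cycling", 10, 2)]}
    for tile in slice_and_dice(sample):
        print(tile)
```

Slice-and-dice is the easiest layout to follow, but it produces long, thin tiles when one item dominates; squarified layouts are the usual alternative when tile aspect ratio matters.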

Any of the countries can be drilled into for a larger view of their medal breakdown. Not that we’re partisan, of course, but the view below is for Great Britain (GBR), where the particularly strong showing by the cycling team stands out.

Taking a look at the same data split first by sport and then country, it’s easy to see the countries dominating the medals in each sport, and to delve into more detail by sport where required.


Drilling into Athletics, we can see that the USA won the most medals, and also the most golds. Great Britain won just 6 medals, but 4 of them were gold, hence the darker colour of their tile.

We’ll be making an interactive version of this available over the next few days.


Drive Excel Chart Min/Max from Range

How do I drive the min and max values of an axis from an Excel range? This is one of the most commonly asked questions about Excel, and with each new release it amazes me that this feature still hasn’t been added to the base product.

It’s a very common scenario: you are building a line chart and it’s all looking OK until Excel suddenly decides to set the minimum value to 0. All of the detail is lost, and you have gone from a nice detailed set of lines to a mishmash of colours a few pixels high.

Excel uses some fairly sophisticated rules to work out which min and max to use, but sometimes we just want to set them to a particular value (normally anything other than 0!).

Here’s a pretty simple set of numbers and the resulting chart we get from Excel (just with all the defaults).

This all looks fine, but let’s change “C” Monday’s value to 86, and now look what happens:

Excel has applied its rules and decided that 0 is a good place to start the chart from, but in this case I lose a lot of the detail and end up with all the lines grouped together.

We could, of course, change the Axis min value to something a bit more sensible, so we’ll use the Format Axis option to set a minimum value of 84:

That looks better!

The base numbers had been entered manually, so typing a fixed value into the axis minimum is fine, but what if the numbers were coming from a cube or SQL database? Wouldn’t it be really helpful to be able to drive the minimum value from a range? I can change just about everything else about the chart, but after so many years and so many different versions I still can’t do this.

Luckily for me (and our customers!) we already have an Excel add-in, so we can simply add the functionality ourselves using one of the new formulae in 6.5:

XL3SetProperty( ObjectType, ObjectName, Property, Arg1, [Arg2],…, [Arg27] )

The formula to drive the chart axis from a range is simply:

=XL3SetProperty("Chart","Chart 1","YMin",$C$1)

Other options are:

Property | Description | Value
"YMin" or "YMax" | Sets the limits of the Y axis. | Numeric
"Y2Min" or "Y2Max" | Sets the limits of the Y2 axis. | Numeric
"XMin" or "XMax" | Sets the limits of the X axis. | Numeric
"X2Min" or "X2Max" | Sets the limits of the X2 axis. | Numeric

Now, finally, we can build reports (and publish them to the Web) confident that, regardless of the data or criteria selected, we aren’t going to end up with a line chart starting at 0 and bunching all the lines together.
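Driving YMin from $C$1 still leaves the question of what $C$1 should contain. One possible rule, sketched below in Python, pads the data range by a small fraction and rounds outwards to a "nice" step of 1, 2 or 5 times a power of ten. The padding fraction and tick target are illustrative assumptions, not Excel’s or XLCubed’s own scaling rules; in the workbook the same idea could be written with worksheet functions such as MIN, MAX and FLOOR feeding $C$1.

```python
import math


def nice_step(span, target_ticks=5):
    """Pick a step of 1, 2 or 5 x 10^k giving roughly target_ticks intervals."""
    raw = span / target_ticks
    magnitude = 10 ** math.floor(math.log10(raw))
    for m in (1, 2, 5, 10):
        if m * magnitude >= raw:
            return m * magnitude
    return 10 * magnitude  # safety net for floating-point edge cases


def axis_limits(values, pad_fraction=0.05, target_ticks=5):
    """Return (axis_min, axis_max) that hug the data instead of starting at 0."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or abs(hi) or 1.0  # avoid a zero span for flat data
    lo -= span * pad_fraction
    hi += span * pad_fraction
    step = nice_step(hi - lo, target_ticks)
    # Round outwards so every data point stays inside the axis
    return math.floor(lo / step) * step, math.ceil(hi / step) * step


if __name__ == "__main__":
    print(axis_limits([86, 90, 95, 100]))  # (85, 105)
```

Rounding to a nice step keeps the tick labels readable; a plain MIN minus a fixed margin also works if you don’t mind odd-looking axis values.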

This formula can also be used to modify various aspects of our own grids, slicers and small multiples based on the values of Excel cells. The kinds of things that we and our customers wanted to achieve included:

  • Move dimensions between axes
  • Change the member selection types
  • Modify various grid properties based on different formulae

Let’s look at how the formula works for some of these things:

=XL3SetProperty("Grid","My Grid","HierarchiesOnColumns","[Products]","[Regions]", $a$1)

This would move Products, Regions and whichever hierarchy is in $a$1 to the columns (I could use a slicer or drop-down to update $a$1 and let the user switch between various hierarchies).

=XL3SetProperty("Grid","My Grid","RemoveEmptyRows",$b$1)

This would toggle whether rows without data are displayed, based on the value of $b$1.

If there are any aspects of Excel that you think would be useful to drive from a range, please let us know!

Small Multiples on River Quality

The phrase ‘small multiple’ was popularised by Edward Tufte, and has become a generic term for a visual display using the same chart or graphic to display different slices of a data set. Their close positioning and shared scale make comparisons very easy, and shared trends or outliers can be quickly spotted. Various other terms are also used to describe this charting approach, or specific aspects of it, including Trellis Charts, Lattice Charts, Grid Charts and Panel Charts.

The most common use case for small multiples is a set of separate line charts comparing trends across a large number of elements. Placing them all within one chart would produce either a ‘spaghetti chart’ or lots of occlusion, as shown in the comparison below. Here we use a standard Excel line chart and an XLCubed small multiple to chart the same data. Separating the charts while keeping a consistent axis scale makes comparison much easier than in the single chart.

We took a slightly different approach when using small multiples to take a look at differences in river water quality across regions of the UK. Our source data was not absolute numeric values, but 14 years of results categorised into four bandings (bad, poor, fair and good). We wanted to provide a ‘one-pager’ which gave a feel for the trend within each region, but also access to the annual breakdown of the different water qualities.

In the end we settled on a Small Multiple display of 100% stacked columns as shown below.

A percentage base seemed a sensible way to approach the data, as different regions will have differing numbers of rivers and of samples taken. Using this approach we’re able to see a comparison of the relative water quality rather than dealing in absolutes.
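The percentage base is a per-column normalisation: each region-year column is scaled so that its four bands sum to 100%. A minimal Python sketch, assuming the source provides a count of samples per band (the input layout here is hypothetical):

```python
BANDS = ("bad", "poor", "fair", "good")


def to_percentages(counts):
    """Convert sample counts per band into one 100% stacked column.

    counts maps band -> number of samples for one region and one year.
    Returns band -> percentage in the fixed BANDS order, so every column
    in every small multiple stacks the same way.
    """
    total = sum(counts.get(band, 0) for band in BANDS)
    if total == 0:
        return {band: 0.0 for band in BANDS}
    return {band: 100.0 * counts.get(band, 0) / total for band in BANDS}


if __name__ == "__main__":
    print(to_percentages({"bad": 1, "poor": 1, "fair": 2, "good": 4}))
    # {'bad': 12.5, 'poor': 12.5, 'fair': 25.0, 'good': 50.0}
```

Keeping the band order fixed matters as much as the arithmetic: if the stacking order changed between panels, the colours could no longer be compared at a glance.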

The user selects a geographic area of the country to view the regional breakdown within it. To analyse the water quality for a particular year, locate the region and then the specific year to see the percentage breakdown across the four categories.

The colouring of the 4 categories was chosen to aid ‘at a glance’ recognition of the overall water quality by region, and also of the trend. Dark blue signifies bad quality water (opaque), and light blue signifies good quality (think ‘you can see right through it…’).

So to read the display overall, or for trend:
• Dark colour signifies water quality problems.
• Light colour signifies good quality water.
• Reading left to right, increasing colour saturation shows declining quality over time.
• Reading left to right, decreasing colour saturation shows improving quality over time.
• Any region can be zoomed in on to see a larger chart and understand the breakdown in more detail.

Fairly quickly, and from just this one display we can draw a number of conclusions as below:
• Across the region, as a broad brush summary, water quality has improved since 1992.
• Doncaster has shown strong and steady improvement.
• Kingston upon Hull has the worst quality overall in the region, and varies significantly year on year.
• If you’re off for a swim in a Yorkshire river, Richmondshire looks a good bet!

We’ve designed a pre-set view in this case to work for the data in question, but the small multiple concept is also very powerful when interactively exploring data. A picture can tell a thousand words, as they say, so take a look at our YouTube videos on small multiples: Video1 Video2

Something on the Horizon

We had an interesting scenario while helping a customer extend an existing Excel dashboard.

We had recently performed some work to solve some performance and design issues they had with their existing Analysis Services cubes. They now had more of their underlying data available and the ability to query longer periods without the performance hit (a year’s worth of data vs 28 days).

They wanted to make the most of this by charting changes in daily sales data over the previous 12 months, broken down by their four main business groups. Ideally the chart would become part of the existing Management Report; the difficulty was the lack of report real estate for the extra information. This is something we have all come across, and it is typically solved by using in-cell charts.

Plotting the data on an Excel chart in the space available would give us this:

Converting to Sparklines gave us a slightly better view, but given the number of data items being plotted still not ideal.

Luckily our customer had recently upgraded to V6.1 of XLCubed, so we were able to use one of our newest in-cell chart types: SparkHorizons. There is a good explanation of horizon charts in the research paper Sizing the Horizon: The Effects of Chart Size and Layering on the Graphical Perception of Time Series Visualizations, and Stephen Few has covered them previously.

Essentially, a line chart is split into coloured bands: shades of blue for positive numbers and shades of red for negative numbers. In XLCubed there are 3 bands of each colour. Because the vertical scale is split into bands, horizon charts can be a lot more effective than standard sparklines when the scale of the numbers varies significantly but you still want to retain a common scale.
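To show what the banding means in practice, here is a minimal Python sketch that splits each value into three positive or negative bands on a shared scale. It follows the general horizon-chart technique rather than XLCubed’s implementation; using the largest absolute value as the shared scale is an assumption, and a fixed scale would work just as well.

```python
def horizon_bands(values, bands=3):
    """Split each value into per-band fill levels for a horizon chart.

    The shared scale max(|v|) is cut into `bands` equal slices. For each
    value we return (sign, fills): sign is +1 or -1 (blue or red), and
    fills[i] in [0, 1] says how much of slice i the value covers. The
    slices are drawn on top of each other in one short row, darker for
    higher slices, so the chart needs only 1/bands of the height of the
    equivalent line chart. Whether negative slices are drawn downwards or
    flipped upwards is left to the renderer.
    """
    scale = max((abs(v) for v in values), default=0) or 1.0
    band_size = scale / bands
    result = []
    for v in values:
        magnitude = abs(v)
        fills = [
            min(max(magnitude - i * band_size, 0.0), band_size) / band_size
            for i in range(bands)
        ]
        result.append((1 if v >= 0 else -1, fills))
    return result


if __name__ == "__main__":
    for sign, fills in horizon_bands([3, -1.5, 0]):
        print(sign, fills)
```

For a value of -1.5 on a scale of 3, the first red band is full and the second is half full, which is exactly the information the darker overlaid shades carry.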

In this case plotting the same data as horizon charts makes things a lot clearer:

It now becomes quite clear when sales are trending up vs down. It’s also possible to flip the negative values so they point in the same direction as the positive values:

We are always looking for ways to develop and extend XLCubed. SparkHorizons were added because they looked like they had the potential to be useful where the data suited them, so it was pleasing to be able to use them in a real-world situation.

It’s also worth mentioning that although in this case the data came from Analysis Services cubes, SparkHorizons are available as an Excel formula, so they can be used to plot any Excel data. Here’s an example of the formula:

=XL3SparkHorizon(Sheet1!$V$2:$V$262,Sheet1!E10)

This will plot the data from Sheet1!$V$2:$V$262 as a SparkHorizon graph in Sheet1!E10.

Heatmap Tables with Excel – Revisited

We’ve revisited one of our more popular guides Heatmap Tables with Excel as they can be a very effective way of presenting data on a dashboard, and have now updated it for Excel 2010…

This Heatmap Table is designed to show you the revenues and the discounts of a company over the course of one year per product group. The size of a bubble shows the revenue made in a particular month and the bubble color shows the discount rate given. The discount rate has been encoded as a range of green colors, ranging from a light green, for low discounts to a dark green for high discounts. The years and product totals are shown at the right and bottom as an integrated part of the table.

Tufte often talks about the integration of numbers, images and words; I think he’s quite right. A way to achieve this in Excel is to integrate charts into tables, so called graphical tables, a very effective means to show “More Information Per Pixel“.

The heatmap table is based on a regular Excel bubble chart. To integrate a bubble chart into a table the bubbles are positioned in a matrix that has the same row and column layout as our table.

In our case we generate a data series table with one column for the X-Series going from 1-12 for January – December and one column for our Y-Series going from 1-8 for our 8 product groups and one column for revenue.
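The unpivot from the classic grid to the (X, Y, revenue) series table can be sketched in a few lines of Python. This is a minimal illustration, not the formulae in the sample workbook; the top_row_first option is an assumption about how you want the rows numbered, since Excel’s value axis runs upwards while the table reads downwards.

```python
def grid_to_series(grid, top_row_first=True):
    """Unpivot a product x month grid into (x, y, revenue) bubble rows.

    grid holds one list of monthly revenues per product group; x runs
    1..12 for January..December. With top_row_first=True the first
    product gets the highest y, so it lands in the top row of the table;
    set it to False to number the rows 1..n from the bottom instead.
    """
    n = len(grid)
    rows = []
    for i, monthly in enumerate(grid):
        y = n - i if top_row_first else i + 1
        for x, revenue in enumerate(monthly, start=1):
            rows.append((x, y, revenue))
    return rows


def axis_bounds(n):
    """Axis min/max that centre n categories on integer positions, e.g. 0.5/12.5."""
    return 0.5, n + 0.5
```

Setting the Y axis to "Values in reverse order" in Format Axis would be an alternative to flipping the y numbers yourself.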

In the sample spreadsheet we’ve set up some simple Excel formulae to translate data from the classic grid layout:

to the required format:

Now we can insert the bubble chart:

To ensure that the chart fits exactly into the table grid we set Min/Max for the X axis to 0.5/12.5 and for the Y axis to 0.5/8.5; otherwise Excel would calculate much larger automatic scales. Also set the major units to 1 so we can use them later for grid lines.

Now we remove the legend, the X and Y axis, maximize the plot area and align the chart with the Excel table. As the bubbles are initially too large we have to make them smaller. To control the bubble size go to Data Series Options and scale the bubble size to 50%:

This already makes a nice bubble table you could use to reproduce the Twitter Charts.

For the grid lines, format your table headers and cells with light gray borders. Then resize the plot area, remove its border and reposition the chart so that the chart and the table grid lines align.

To create the heatmap with different coloured bubbles we use the fact that, by default, Excel does not plot data points for #N/A values. For the heatmap we overlay 8 bubble series, one series per green shade, and show a revenue bubble only if its discount value falls into the range that corresponds to that green shade of our colour ramp; otherwise we show #N/A.

We divide the range MAX(Discount)..0 into 8 groups to define the colours.

The data series columns use the following formula to test if a discount value corresponds with an interval / colour shade:

=IF(AND($E7>I$6-Step,$E7<=I$6),$D7,NA())

The formula returns the revenue if the discount value is in the interval defined by the column header I$6.
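The same bucketing logic can be sketched in Python to show how one revenue column becomes eight overlaid series. This is a minimal illustration of the IF/NA() approach above, with None standing in for #N/A; the clamping of a zero discount into the first series is a design choice of this sketch, not something the workbook formula necessarily does.

```python
import math


def bucket_series(revenues, discounts, n_series=8):
    """Split one revenue column into n_series overlaid bubble series.

    Mirrors =IF(AND(disc > upper - Step, disc <= upper), revenue, NA()):
    the range 0..max(discount) is cut into n_series equal intervals of
    width step, and series k keeps a point's revenue only when its
    discount falls in interval k. Every other cell is None (#N/A in
    Excel), so each bubble is drawn by exactly one coloured series.
    """
    step = max(discounts) / n_series
    series = [[None] * len(revenues) for _ in range(n_series)]
    for i, (revenue, discount) in enumerate(zip(revenues, discounts)):
        # Interval index for (k*step - step, k*step]. Clamping puts a zero
        # discount into the first series (a strict > test may drop it,
        # depending on the header values) and guards against rounding at
        # the top end.
        k = math.ceil(discount / step) if step > 0 else 1
        k = min(max(k, 1), n_series)
        series[k - 1][i] = revenue
    return series
```

Each inner list corresponds to one of the eight data-series columns, so it maps directly onto the bubble-size ranges used when creating the series.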

Now create the eight data series so that the bubble size refers to the eight columns in the data table:

And use the Excel chart styles to pick a colour range, making sure you remove the border from the chart area.

You can also use the chart styles to quickly switch between different colours, or customise each series to refine the colours.

You can download a starting point for these files here: HeatmapSample.xlsx. Most of the formulae should adapt to data values that you can feed into the data sheets, including data straight from Analysis Services if using XLCubed grids or formulae.

You can see an interactive version of the Heatmap here – we added a link to some cube data, some Slicers for driving the parameters and then published to XLCubedWeb.

Flexible time-series graphing from a slicer

We are often asked how to drive a chart from a slicer in XLCubed, and how to plot days or months for a selected month or year. In the base case this is fairly straightforward: you can set up a grid based on, for example, the previous ‘x’ months of a slicer selection. The difficulty comes when you want to vary the behaviour depending on which level of the hierarchy the user chooses. This is particularly true where the hierarchy contains semesters or quarters.

The example below shows a technique to handle this complexity and display the chart in a way meaningful to the user in each case. The report is based on a slicer that allows the user to switch between showing the graph data based on quarters, months or days.

You can download the Excel spreadsheet used in this example here: TimeSeriesGraphFromSlicer

It connects to the AdventureWorks demo database, which ships with Analysis Services.

The diagram below shows the flow of data from each worksheet showing the final result in the sheet Chart.

Workbook Sheet – Chart

This sheet shows the graph based on the data chosen in the slicer above it. The graph switches between quarters, months and days depending on the slicer selection.

Workbook Sheet – GridForChart

This shows the data that will be graphed, depending on the choice made by the slicer selection. In this example it is months July 2001 – June 2002. FY2002 has been selected by the user (in this example Financial Year 2002 runs from July 2001 – June 2002).

Note that cells A10 – A21 contain the value ‘TRUE’ – these cells contain an XL3RowVisible statement as follows:

=XL3RowVisible(B10<>"")

This statement hides rows with no data so that they are not plotted on the graph.

Workbook Sheet – SlicerToMonthDay

This sheet contains the data that is returned by the choice of the slicer in workbook sheet Chart.

User selects a month

The data will be graphed as days. For example, if the user selects July 2002 then the graph will be displayed with each day in July along the x-axis. These are defined in XLCubed as ‘Children of’ the slicer.

User selects a quarter year

The data will be graphed as months in a three month period. For example, the user selects Q1 FY 2003 and the data displayed is for three months from July 2002 – September 2002 as below. These are defined in XLCubed as ‘Descendants of’ the slicer at month. This will be the same when the user picks year, semester or quarter.

User selects a half-year

The data will be graphed as months in a six-month period. For example, the user selects H1 FY 2003. The screenshot below shows the data that will be graphed.

However, it can be seen that the values Q1 FY 2003 and Q2 FY 2003 should not appear on the graph.

Using the Edit Member functionality it is possible to remove these so that they do not appear as points on the graph.

To do this, edit the Date.Fiscal member and click on the Advanced tab.

Click on the drop-down next to the first member. That member set is the result when the user selects H1 FY 2003, and it corresponds to the data in cells B10 – B43 of sheet SlicerToMonthDay.

The screenshot below shows the data that will be subtracted: in effect, the actual value selected by the user via the slicer, alongside the two Fiscal Quarter values Q1 FY 2003 and Q2 FY 2003.
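In set terms, this Edit Member step is a difference: the Descendants set returned for H1 FY 2003, minus the selected member and its two quarters. A minimal Python sketch of that logic follows; the member captions and their order are illustrative assumptions, and in XLCubed the subtraction is configured in the Edit Member dialog rather than written as code.

```python
def members_to_plot(descendants, subtract):
    """Return the Descendants set minus the members that should not be
    plotted, keeping the original (time) order of what remains."""
    excluded = set(subtract)
    return [member for member in descendants if member not in excluded]


if __name__ == "__main__":
    # Hypothetical captions and ordering for the H1 FY 2003 selection
    descendants = ["H1 FY 2003", "Q1 FY 2003", "July 2002", "August 2002",
                   "September 2002", "Q2 FY 2003", "October 2002",
                   "November 2002", "December 2002"]
    subtract = ["H1 FY 2003", "Q1 FY 2003", "Q2 FY 2003"]
    print(members_to_plot(descendants, subtract))
```

Preserving order matters here because the remaining members become the x-axis points of the chart.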

The GridForChart sheet now shows just the six months that should be graphed. As explained earlier, further manipulation using the XL3RowVisible functionality removes blank rows.

The screenshot above shows the graph with six months of data for H1 FY 2003 for months July 2002 – December 2002, and the quarters have been dynamically excluded.

The end result is a flexible time selector where the user can choose dates at different levels in the hierarchy, and will always get a meaningful and in-context time series chart.

Data Visualization – a real world example

In the following post we work through a real-world example of a data visualization. We’ve chosen an example involving Operations data, which is fairly non-domain-specific, so hopefully it can demonstrate some important points. The first, and most important, point is that you have to define your audience.

We receive many questions about “what is the best chart for this situation” or “what colour should I use for emphasis”. These questions usually attack the problem from the wrong angle. The one question you need to ask before anything else is “who is this visualization going to be seen by, and how?” Is it in a boardroom on a printed sheet, or across a trading floor on a plasma screen? Are the consumers domain experts?

This example features data about an investment bank’s operations processing, the audience being the clients of the Operations department.

Starting Point

Initially the project started out as simply trying to record what operational problems were encountered on a daily basis across different product lines. A reporting system was built and various generic reports produced:

DVBlog1

Unfortunately the reports either didn’t contain data at a granular enough level or it was difficult for the product managers to see where the issues were occurring and what the trends were. In reality the report showed what the major problems had been – unfortunately this was already known, as when something major goes wrong you remember getting shouted at!

What was requested

The client wanted a report that showed where the problems were occurring across business lines (rather than operational units), and how they were doing historically, on a single page that could be included in a weekly MIS pack (they currently had four pages per product line, and with 8 product lines that was a total of 32 pages). As a first pass they simply wanted an Excel worksheet they could update manually:

DVBlog2

We felt this solution lacked clarity and it was very difficult to spot trends across products.

What we proposed

We designed a solution using MicroCharts to allow small multiples of charts to show a variety of views:

DVBlog3

This solution allowed the user to view the data simply as a cumulative set by Product (top line) or by Root Cause (vertically), and then look deeper into historical trends in the centre of the chart. For example, it’s fairly easy to see historical spikes in the Root Cause data and see that the overall trend has improved over time. Ranking the Products and Root Causes immediately gives some sense of scale to the data. For example, you can see that there are many more Application failures than any other type of problem, but that the remaining root causes are fairly evenly distributed.
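Ranking the Products and Root Causes is just a count followed by a sort. A minimal Python sketch, assuming each incident is a record with hypothetical "product" and "root_cause" fields (the real data layout isn’t shown in this post):

```python
from collections import Counter


def rank_totals(incidents, key):
    """Count incidents per category (e.g. 'product' or 'root_cause') and
    return (category, count) pairs, largest first; ties keep the order in
    which the categories were first seen."""
    counts = Counter(incident[key] for incident in incidents)
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical incident records, for illustration only
    incidents = [
        {"product": "Product A", "root_cause": "Application"},
        {"product": "Product B", "root_cause": "Application"},
        {"product": "Product A", "root_cause": "Manual"},
    ]
    print(rank_totals(incidents, "root_cause"))
    print(rank_totals(incidents, "product"))
```

The ranked order then drives the position of each row and column in the small-multiple layout, so the biggest problem areas appear first.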

One other point worth noting was that the original colour scheme was much more muted, but the client got very upset that it looked like a competitor’s corporate colour and wanted it to be “louder”.

What was the user reaction…

Ecstatic: one page replaced 34, and they could see at a glance how the entire (large) organisation was working, while also quickly finding detail for a particular area and identifying trends.

(de)Faults in Excel Charting

I recently spoke at SQLBitsIII, and one part that went down well was a simple overview of how to make the most important aspect of a graph, namely the underlying data, the prime focus, and clear and easy to read. I also had the opportunity to attend Stephen Few’s Information Visualisation Workshops in London, which I’d thoroughly recommend. As part of a much more detailed overall agenda, Stephen also spent some time on how a typical default chart can be morphed into an effective display.

So it’s back to basics this week, and how to improve the standard, out-of-the-box Excel chart. Unfortunately, despite its pervasiveness, the default chart settings, which many users will never stray from, are not ideal in Office 2007, and in earlier versions are pretty awful. In this piece I’ll outline a few simple steps which can turn the default visual delights of the Excel graph into something you need not be embarrassed to put on the projector.

I’m using ‘classic’ Excel as my start point, because it’s still the incumbent in most organisations (and also because it’s worse). The example is for column charts, but the majority of the tips are valid for any chart type. As our start point we have the unit sales data for 3 products across 6 countries, as a default Excel Column chart, below.

BadChart

Nice. It’s wrong in a lot of ways, but how many hundreds of times have you seen this, or a version of it? It’s well-trodden ground if you have read Tufte, Few et al., but the key recommendations to improve things are surprisingly simple and quick to implement.

1) Remove the Clutter and noise

The purpose of the chart is to display the data of interest clearly and concisely. It’s not to distract the user with pretty shading or 3D effects etc. Although the default chart is no-frills, there are a number of items which are adding nothing, or have undue prominence, and in doing so detract from the overall goal.

  • The Plot Area
    • The grey background to the plot area adds nothing, so we remove it
    • The border on the plot area – remove it also (numerous studies have shown we only need two axes to effectively group and visualise data)
  • Gridlines
    • The default gridlines are black, too visually intense. They are there for reference when required, not the prime focus, so are best muted – set them to a light grey.

2) Axes and Legend

The axes frame the chart, and are a key point of reference; however they should not draw the focus from the chart itself. As with the gridlines, they should be toned down.

  • Change the default black font colour to charcoal / dark grey
  • Change the default axis colour from black to charcoal / dark grey
  • Typically reduce the font size to 8

Rules for the legend are similar to those for the axes:

  • Change font from black to grey
  • Remove the border or change its colour to very light grey
  • Typically reduce font size to 8
  • For a clustered column, my preference is for the Legend positioned at the bottom, and reading across in the display order of the columns.

3) Columns and colour

The black column borders add nothing and should be removed; they are another form of Tufte’s ‘non-data ink’.

On to colour: unfortunately Excel’s default chart fills are heavily saturated plum and wine with a light cream, so I’d strongly suggest changing the chart colour palette. For column charts there is typically a reasonable block of colour for each series, so the colour scheme shouldn’t be too bold, or it becomes an eyesore. Aim for mid-intensity colours of similar saturation (unless one is intended to stand out); pastels tend to work well.
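One quick way to get mid-intensity colours of similar saturation is to fix lightness and saturation and vary only the hue. A minimal Python sketch using the standard colorsys module; the lightness and saturation values are assumptions to tune, and this is not the Chart Tamer palette mentioned below.

```python
import colorsys


def even_palette(n, lightness=0.65, saturation=0.45):
    """Return n hex colours spread around the hue wheel at one shared
    lightness and saturation, so no series shouts louder than another."""
    colours = []
    for i in range(n):
        r, g, b = colorsys.hls_to_rgb(i / n, lightness, saturation)
        colours.append("#{:02X}{:02X}{:02X}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colours


if __name__ == "__main__":
    print(even_palette(3))  # one colour per series in a 3-product chart
```

HLS lightness is not perceptually uniform (a yellow looks lighter than a blue at the same value), so a perceptual space such as CIELCh gives more even results if you need them.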

All the steps above are simple and fairly fast to action, with one exception, the colour scheme. Unless you already have pre-prepared palettes it’s possible to spend an age trying to get the ideal combination – remember the 80/20 rule!

GoodChart2

In my example above, which hopefully you’ll agree is an improvement, I’ve used the colour palette from our upcoming ‘Chart Tamer’ product. Chart Tamer is a lot more than just a colour palette, but that aspect has benefited from minds with much more expertise in colour than mine, and I’ll go with their choice over mine every time!

Household Income Distribution 1967 – 2005 As Small Multiples Chart

In my last post I tried to fix an overloaded line chart Jorge presented in a recent post about loss aversion:

Jorge asked, "does it make any sense to add those nine series to a single chart?"
My attempt to fix the chart by using some color coding, has its shortcomings that caused quite some discussion.

So again, how can you give the users all the data they expect while keeping the chart clean and readable?

D Kelly O’Day pointed out "More data or better colors won’t help a poor chart type selection" and presented a dot plot

Let’s try to select the right chart type. In “Chart Rules, As Simple as Possible, But Not Any Simpler!” I presented an easy-to-learn set of rules to determine the best chart type.


1. Determine the relationship you want to display

In our case we have a Distribution relationship: we want to show the distribution of the income levels.


2. Determine whether you want to emphasize individual values or the overall pattern, and determine the chart type

As we want to emphasize individual values a column chart works best.

This chart already gives us a good feel for the income distribution in 1967: it looks like an almost perfect bell-shaped distribution, with a belly at the mid income levels. But how did things change from 1967 to 2005? Let’s create a set of small multiples to show the situation in 1967, in 2005, and the increase from 1967 to 2005.

Using Color to Group and Label in Charts

Jorge wrote in a recent post about loss aversion, the fact that “people strongly prefer avoiding losses than acquiring gains”.

Loss aversion […]: Translated to chart-making, it means that there is a “tendency to avoid losing data at any cost”. The chart below shows you the Money Income Of Households as published.

Take the above chart, for instance: does it make any sense to add those nine series to a single chart?

Remove irrelevant data series and you risk a mutiny on the Bounty, even if relevant trends are easier to detect. It is absurd, but very human.

So, how can you give the users all the data they expect while keeping the chart clean and readable?
