Warning: Excel can get Volatile

Excel is a great tool for dashboard/report delivery and design (it’s why we created our addin in the first place), but there is a hidden performance trap:

Offset, Now, Today, Cell, Indirect, Info and Rand

If you’ve ever used any of these functions, you may have noticed that whenever you change a cell, or collapse/expand a data grouping, Excel recalculates. That is because these are VOLATILE functions: as soon as you use one, Excel recalculates it, and every formula that depends on it, each time anything in the workbook is calculated, and for good reason.

Offset & Now are the formulae we see used most often. Let’s look at each of these in turn and talk about some alternate approaches to avoid this issue.

Offset

This is by far the most common of these danger formulae that we see in use. Here’s the formula definition:

=Offset(reference,rows,cols,height,width)
Returns a reference to a range that is a given number of rows and columns 
from a given reference.

We typically see these as part of a named range definition for driving chart source data. It allows the number of rows/columns driving the chart data to change automatically, a not unusual requirement when building reports (especially when a report contains user-defined filters or slicers). Here’s an example:

A very simple spreadsheet: we can type the number of months to display in the chart. In reality the number of months to display will probably be driven by the data available for the criteria selected. The screenshot already shows the issue we have: the chart is set up to display a maximum of 12 months, but we only have 3 months of data available.

The most obvious approach is to use the Offset function to pick the chart area automatically. We could create a named range such as:

Now we just change the chart data source to be the named range:

The chart is now plotting 3 months, but will automatically update to show the required number of months:

BUT we have now used a volatile function. Although this is a simple workbook, Excel is now going to have to recalculate that formula, and everything that depends on it, all the time. It’s probably a good time to look at why Excel does that. Let’s have a look at a very simple formula to understand how Excel recalculates things.

Consider the formula:

C1    =A1 + B1

We can see that C1 is dependent upon A1 & B1, so whenever a value in either of these cells changes, C1 will need to be recalculated to show the correct answer. Excel knows about this dependency because it maintains a dependency tree; it knows which cells need to be recalculated whenever any other cell changes. This is a very efficient way of working: if a workbook has thousands of formulae, but only one value changes, and only 10 of those formulae depend on it, then only 10 will be recalculated.
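To make the idea concrete, here is a minimal Python sketch of dependency tracking. It is only an illustration of the principle, not how Excel is implemented internally; the cells D1 and E1 are hypothetical additions to the example above.

```python
from collections import defaultdict, deque

# Precedents of each formula cell: the cells it reads.
precedents = {
    "C1": ["A1", "B1"],  # C1 = A1 + B1 (from the text)
    "D1": ["C1"],        # hypothetical: D1 = C1 * 2
    "E1": ["A5"],        # hypothetical: an unrelated formula
}

# Invert the mapping so we can ask "who depends on this cell?"
dependents = defaultdict(set)
for cell, reads in precedents.items():
    for precedent in reads:
        dependents[precedent].add(cell)


def cells_to_recalculate(changed):
    """Return every formula cell that depends, directly or indirectly, on `changed`."""
    dirty = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in dirty:
                dirty.add(dependent)
                queue.append(dependent)
    return dirty


print(sorted(cells_to_recalculate("A1")))  # only C1 and D1 need recalculating
```

Changing A1 marks C1 and D1 for recalculation and leaves E1 alone, which is the saving the dependency tree buys us.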

If C1 contained:

C1    =Sum(A1:A20)

We know that C1 depends upon any of the cells A1:A20, and so does Excel. But what if C1 was:

C1    =Sum(Offset(A1,0,0,B1,1))

Which cells is C1 dependent upon? At a glance you could say A1 & B1.

But B1 contains the number 20, so actually C1 is dependent upon A1:A20 and B1 (I’ve highlighted the additional cells that are dependent):

Just as we can’t see at a glance which cells C1 needs, Excel can’t easily decide that either. Offset is therefore volatile: if it weren’t, there is a danger that Excel would take so long to work out whether it needs to be calculated that it might as well always calculate it.

There is an easy solution to this: INDEX. Here’s the function definition (be careful, there are 2 ways to use Index; we want the REFERENCE one):

=Index(reference,row_num,column_num,area_num)
Returns a value or reference of the cell at the intersection of a 
particular row and column, in a given range

The big difference, compared to Offset, is that Index is going to return a single cell reference, so you need to use it as part of a range selection A1:Index(…). Here’s the same “Offset” Sum redefined as an “Index”:

C1    =SUM(A1:INDEX(A1:A20,B1,0))

The formula is simply saying the range we want starts at A1 and goes down the number of rows set in B1. The crucial difference is that the Index function knows that A1:A20 is the maximum range we are likely to look at, and therefore the dependencies are known just by looking at the formula itself.
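As a sanity check that the two forms select the same cells, here is a small Python model of both formulas. It is a sketch under simplifying assumptions: column A is a plain list, and the function names are ours rather than anything Excel exposes.

```python
# Column A modelled as a list: values[0] is A1, values[19] is A20.
values = list(range(1, 21))


def sum_offset(column, height):
    """Model of =SUM(OFFSET(A1,0,0,height,1)): start at A1 and take `height` rows.
    Nothing in the formula text says how far down the column this may reach."""
    return sum(column[0:height])


def sum_index(column, max_rows, row_num):
    """Model of =SUM(A1:INDEX(A1:A20,row_num,0)): INDEX picks the end cell from
    a fixed block, so the cells that can ever be read are known up front."""
    block = column[:max_rows]  # A1:A20, the only cells this formula can touch
    if not 1 <= row_num <= len(block):
        # This simplified model only covers row numbers 1..max_rows.
        raise ValueError("row_num is outside the fixed range")
    return sum(block[:row_num])  # A1 down to the cell INDEX returned


# Both forms agree for every row count the chart could ask for.
for n in range(1, 21):
    assert sum_offset(values, n) == sum_index(values, 20, n)
```

The results are identical; the difference is only that the Index version names its maximum range in the formula text.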

We can now update the Named Range to use the Index function instead:

=Sheet1!$C$6:INDEX(Sheet1!$C$6:$C$17,Sheet1!$D$2,0)

Now/Today

The Now and Today functions return the current date (and, for Now, the time) to a cell. This is generally used so that when a report is loaded it will always show the data based on “Today”. Whilst this is not an unreasonable thing to want to do, in reality what most people want is for the report to run for the most recent data, which could actually mean a number of things:

  • Yesterday (if the data is built in a nightly process)
  • The last working day (if the source transactional system is only used during office hours)
  • Current month etc.

The easiest solution is to let the data determine the date to use. If we use an XLCubed Grid or Query Table to retrieve the data, we can simply set up a grid to retrieve the days/months where there is data:

And use the Sort option “Reverse” to display the most recent data first:

With the grid set to “Refresh on Open” we know that A6 will always have the most recent date available in the cube, and we can base the rest of the report on that cell.
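The underlying idea, letting the data pick the report date instead of the clock, can be sketched in a few lines of Python. The rows below are made-up sample data, and the function name is hypothetical.

```python
from datetime import date


def latest_date_with_data(rows):
    """Return the most recent date that actually has a value, or None if there is none."""
    dated = [day for day, value in rows if value is not None]
    return max(dated) if dated else None


# Hypothetical sample: the nightly load has not yet reached 3 March.
rows = [
    (date(2011, 3, 1), 120.0),
    (date(2011, 3, 2), 95.5),
    (date(2011, 3, 3), None),
]

report_date = latest_date_with_data(rows)
print(report_date)  # the last day with data, whatever today's date is
```

Because the result depends only on the data, nothing here needs to recalculate until the data itself is refreshed.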

Incidentally, V6.2 of XLCubed introduces a new option to Slicers to automatically select the most recent date member when the report is loaded:

Ranking, Sorting and Filtering

Once we have returned cube members into a grid report, we often need to exclude members from the result set, or change its order, to provide more meaningful information. The MDX (Multidimensional Expressions) language includes some very useful functions to provide filtering (FILTER), sorting (ORDER) and ranking (TOPCOUNT/BOTTOMCOUNT) of dimension members. These can be quite overwhelming even for power users of XLCubed. So, in V6, we have introduced a new feature, “Advanced Member Selections”, to provide easy access to this powerful part of Microsoft Analysis Services.

Using this new functionality we can nest and combine these operations to answer complex business questions (for simpler operations you can right-click on a member in the grid and use the “Apply” menu to perform simple ranking, filters and sorting).

Filtering

So let’s go through a simple filtering example. Say, for example, that we want to find the products at Product Key level that sold more than 25 units in 2003, Quarter 1, and show the sales figures for those products during 2003 and its quarters.

  1. Start by clicking the Grid ribbon item (or the XLCubed > Design Grid menu item in Excel 2003 and below), and selecting the Internet Sales cube file.
  2. Drag Calendar Period to Columns and Product to Rows. You can also drag any other hierarchies to Headers. In the example image below, Measures and Customer have been added there.
  3. Click on the Product hierarchy so that its details appear in the bottom-right panel.
  4. Drag the Product key level over to the right of the dialog. You can switch between the members view and levels view by clicking on the Show Levels icon.
  5. Click the Advanced tab to show the advanced selection pane.
  6. Click the Members drop down and choose Filter result.
  7. Click the Calendar Period edit control in the grid to change its selection to the desired member (2003, Quarter 1).
  8. Select the This measure radio button, and select Order Quantity as the desired measure.
  9. Change the Operation to >, and type 25 in the edit field on the right.
  10. Click OK. The new filter is displayed in the advanced selections tab.
  11. Click OK again to run the Report. The Grid shows the members that fit our criteria.

So we can see the results, filtered by 2003 Q1 but displaying the values for All Time (or any other period we wish to use). We could also have used the Range selector to drive the period selection from an Excel range, and our grid would automatically refresh whenever the driving value changes.
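For readers who like to see the logic spelled out, here is a rough Python analogue of what the filter does: test each member against one period, then report a different one. The product keys and quantities are invented for illustration, and this is not the MDX that XLCubed generates.

```python
# Hypothetical order quantities: product key -> {period: units sold}
order_quantity = {
    "P-101": {"2003 Q1": 40, "All Time": 310},
    "P-102": {"2003 Q1": 12, "All Time": 450},
    "P-103": {"2003 Q1": 26, "All Time": 90},
}


def filter_members(data, period, threshold):
    """Keep the members whose measure in `period` is greater than `threshold`."""
    return [
        member
        for member, by_period in data.items()
        if by_period.get(period, 0) > threshold
    ]


# Filter on 2003 Q1, but display the All Time figures for the survivors.
selected = filter_members(order_quantity, "2003 Q1", 25)
report = {member: order_quantity[member]["All Time"] for member in selected}
print(report)
```

Note that P-102 has the largest All Time figure but is still excluded, because the filter condition looks only at Q1.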

Ranking

Now let’s add a ranking to find the bottom 8 selling products at the Product Key level that have sold more than 25 units in Q1:

  1. Display the Product Hierarchy Editor dialog
  2. Click the Rank result icon on the advanced selections tab to display the Edit Ranking dialog
  3. Select the Bottom radio button, and type 8 into the edit field
  4. Select 2003, Quarter 1 for the Calendar Period hierarchy in the grid below

We now have the filter, followed by the ranking.

Run the Grid: only the lowest 8 members are returned.

Sorting

Now let’s sort the report on a different dimension – for example, descending order of the Q1 sales.

  1. Display the Hierarchy Editor for the Product hierarchy by double-clicking on the Product label in the Grid
  2. If it’s not already visible, select the Advanced tab
  3. Click the Sort result toolbar button
  4. Change the Calendar Period selection to 2003, Quarter 1
  5. Click the Sort Descending (9-1) radio button
  6. Click OK. The new sort is displayed in the advanced selections tab
  7. Click OK again to run the Report
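The three operations are applied one after another, each working on the output of the previous one. A minimal Python sketch of that nesting follows; the data is hypothetical, and we take the bottom 3 rather than the bottom 8 to keep the sample small.

```python
# Hypothetical Q1 order quantities by product key.
q1_units = {"A": 30, "B": 10, "C": 55, "D": 26, "E": 41, "F": 70}


def filter_gt(members, measure, threshold):
    """Rough analogue of FILTER: keep members whose measure exceeds the threshold."""
    return [m for m in members if measure[m] > threshold]


def bottom_count(members, measure, n):
    """Rough analogue of BOTTOMCOUNT: the n members with the lowest measure."""
    return sorted(members, key=measure.get)[:n]


def order_desc(members, measure):
    """Rough analogue of ORDER ... DESC: sort members from highest to lowest."""
    return sorted(members, key=measure.get, reverse=True)


filtered = filter_gt(q1_units, q1_units, 25)  # B drops out (10 units)
lowest = bottom_count(filtered, q1_units, 3)  # D, A, E are the weakest survivors
result = order_desc(lowest, q1_units)         # shown highest first
print(result)
```

The order matters: ranking before filtering would pick B as one of the bottom members and then discard it, leaving fewer rows than we asked for.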

 

Joining Results

It’s also possible to join different results together: combining both sets (UNION), excluding members (EXCEPT) and returning common members (INTERSECT).
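In plain terms these are the familiar set operations, applied to ordered lists of members. Here is a small Python sketch of the three, using hypothetical member names; it mirrors the default MDX behaviour of dropping duplicates, but it is an illustration rather than the query XLCubed sends.

```python
def union(first, second):
    """Members of both lists, keeping first-seen order and dropping duplicates."""
    seen, combined = set(), []
    for member in list(first) + list(second):
        if member not in seen:
            seen.add(member)
            combined.append(member)
    return combined


def except_(first, second):
    """Members of `first` that do not appear in `second`."""
    excluded = set(second)
    return [m for m in first if m not in excluded]


def intersect(first, second):
    """Members of `first` that also appear in `second`."""
    common = set(second)
    return [m for m in first if m in common]


# Hypothetical member lists
top = ["Bikes", "Helmets", "Gloves"]
bottom = ["Gloves", "Socks"]

print(union(top, bottom))      # every member, once
print(except_(top, bottom))    # top members that are not also in bottom
print(intersect(top, bottom))  # members in both lists
```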

So we could also add the top 10 products alongside the bottom 8 products to the grid. Begin by adding another member selection using the “Add Member List” toolbar button:

As before, we select the list of members to rank (in this case the Product Key level) and then select the operation we want to perform, a Top 10:

There are various options to decide how to combine the lists; we’ll stick with Add:

And we get both results combined:


So the “Advanced Member Selections” feature provides lots of the power of Analysis Services in a simplified way. To try this feature for yourself you can begin by downloading XLCubed.

Small Multiples on River Quality

The phrase small multiple was popularised by Edward Tufte, and has become a generic term for a visual display using the same chart or graphic to display different slices of a data set. Their close positioning and shared scale make comparisons very easy and shared trends or outliers can be quickly spotted. Various other terms are also used to describe this charting approach, or specific aspects of it, including Trellis Charts, Lattice Charts, Grid Charts and Panel Charts.

The most common use case for small multiples is separate line charts to compare trends across a large number of varying elements. Placing them all within one chart would cause either a ‘spaghetti chart’, or lots of occlusion, as shown in the comparison below. Here we use a standard Excel line chart, and an XLCubed small multiple to chart the same data. Separating the charts while keeping a consistent axis scale makes for a much easier comparison than in the single chart.

We took a slightly different approach when using small multiples to take a look at differences in river water quality across regions of the UK. Our source data was not absolute numeric values, but 14 years of results categorised into four bandings (bad, poor, fair and good). We wanted to provide a ‘one-pager’ which gave a feel for the trend within each region, but also access to the annual breakdown of the different water qualities.

In the end we settled on a Small Multiple display of 100% stacked columns as shown below.

A percentage base seemed a sensible way to approach the data, as different regions will have differing numbers of rivers and of samples taken. Using this approach we’re able to see a comparison of the relative water quality rather than dealing in absolutes.
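The arithmetic behind a 100% stacked column is simply each category’s share of that year’s total. A short Python sketch, using invented sample counts rather than the real river data:

```python
def to_percentages(counts):
    """Convert raw counts per category into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        # No samples taken: show an empty column rather than dividing by zero.
        return {category: 0.0 for category in counts}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical regions: one with 20 samples, one with 100.
small_region = {"bad": 2, "poor": 3, "fair": 10, "good": 5}
large_region = {"bad": 10, "poor": 15, "fair": 50, "good": 25}

print(to_percentages(small_region))
print(to_percentages(large_region))
```

Both regions produce the same percentage breakdown despite very different sample counts, which is why the columns can be compared side by side.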

The user selects a geographic area of the country to view the regional breakdown within the selected area. The water quality for a particular year can be analysed by locating the region, and the specific year to see the percentage breakdown for each of the four categories.

The colouring of the 4 categories was chosen to aid ‘at a glance’ recognition of the overall water quality by region, and also of the trend. Dark blue signifies bad quality water (opaque), and light blue signifies good quality (think ‘you can see right through it….’).

So to read the display overall, or for trend:
• Dark colour signifies water quality problems.
• Light colour signifies good quality water.
• Reading left to right, increasing colour saturation shows declining quality over time.
• Reading left to right, decreasing colour saturation shows improving quality over time.
• Any region can be zoomed in on to see a larger chart and understand the breakdown in more detail.

Fairly quickly, and from just this one display we can draw a number of conclusions as below:
• Across the region, as a broad brush summary, water quality has improved since 1992.
• Doncaster has shown strong and steady improvement.
• Kingston upon Hull has the worst quality overall in the region, and varies significantly year on year.
• If you’re off for a swim in a Yorkshire river, Richmondshire looks a good bet!

We’ve designed a pre-set view in this case to work for the data in question, but the small multiple concept is also very powerful when interactively exploring data. A picture can tell a thousand words, as they say, so take a look at our YouTube videos on small multiples: Video1 Video2

Number Formats

One of the main reasons we use Excel is to analyse and display our data, for either our own consumption, or to show to others. In both cases, we want our data to be easily readable, and any important patterns to be immediately obvious.

We use colours, borders and other formatting to highlight important characteristics of our data, and to de-emphasize those features that should stay in the background (see The Dashboard Squint Test for more). In just the same way, we can use number formats to highlight numbers that are unusual in some way, decrease the focus on uninteresting numbers, or remove excess detail. Here we recap the essentials of numeric formatting in Excel.

The basics

To apply or change a number format, select the cell or range that needs to be altered, then either:

  • make basic changes (add or remove decimal places, use percentages and so on) using the Number button group on the Home tab (in Excel 2007 and newer),
  • make more advanced changes by right-clicking on the range and selecting the Format Cells option,
  • if you prefer keyboard shortcuts, you can show the format cells dialog by using the Ctrl+1 keyboard shortcut.

From the dialog, you can select some common and very useful formats, including:

  • Number: this allows you to customise the number of decimal places and whether to show thousand separators
  • Date and Time: for formatting dates and times, allowing a variety of shorter and longer options
  • Percentage: format the numbers as percentages, with the desired number of decimal places
  • Custom: allows you to specify your own custom formats (see below)

Simple custom formats

We’ll first go through some simple examples, including some of the standard formats mentioned above, so that in the next section we can build up more complex ones. To enter these formats, follow the steps listed above, then select Custom from the list on the left of the dialog.

  • To show a custom number of decimal places using a number format, write the number you want to show using zeros, for example 0.00 to show your number using 2 decimal places
  • To include digits only if they exist, use a # sign instead of the zero. For example, to only include the part of the number before the decimal point if the number is 1 or more (or -1 or less), use #.00
  • To include a thousands separator in the number, use #,###. We use the # symbol to avoid forcing Excel to display unnecessary zeros
  • To format numbers as percentages, just place a % after the format. For example, using the 0.0% format will cause 0.2534 to display as 25.3%
  • To give the numbers a colour, put the name of the colour in square brackets before the format, like this: [Red]0.0

More advanced formats

The formats that we have used so far only use one format for all numbers. In fact, Excel lets us specify four formats in one cell: one for positive numbers, one for negative numbers, one for zeros and one for text. To do this, we use the semicolon to separate the different formats. For example, to format only negative numbers as blue, we can use 0.0;[Blue]-0.0;; In this example, because we have left the last two sections blank, zeros and text will appear as empty cells on the worksheet.

If we combine the pieces of information from the last paragraph, we can find another useful format. Any cells with the ;;; format applied will hide any data in the cell. This can be useful if you want a formula in a particular cell, but don’t want to hide an entire row or column for it.

Another common case is where you have large numbers, but don’t need to see all the digits. In this case, it can be useful to just emphasize the important part of the number, by using this format: 0, (a zero followed by a comma). This format displays the number in thousands, rounded to the nearest thousand, so the excess digits are removed. You can extend this to millions by using another comma, and it’s even possible to include an indicator that the number is shortened, like this: 0,,"M"
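To see what the trailing commas do to the displayed value, here is a small Python emulation of just that one feature. It is a sketch, not a full number format engine: the function name and parameters are ours, and it only handles whole-number display with an optional suffix.

```python
from decimal import Decimal, ROUND_HALF_UP


def scaled_display(value, commas, suffix=""):
    """Emulate a format such as 0, or 0,,"M": divide by 1000 once per trailing
    comma, round to a whole number, then append the literal suffix."""
    scaled = Decimal(str(value)) / (Decimal(1000) ** commas)
    # Round halves away from zero rather than using banker's rounding.
    rounded = scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return f"{rounded}{suffix}"


print(scaled_display(15984.125, 1))           # one comma: shown in thousands
print(scaled_display(654915165.515, 2, "M"))  # two commas plus an "M" marker
```

As with any number format, only the display changes; the cell still holds the full value for use in calculations.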

There are many more special formats available, including changing the boundaries between the semicolons and date and time formats. Have a look at the further reading, below, for more information.

Number formats in XLCubed

XLCubed, being tightly integrated with Excel, allows you to specify number formats in two ways.

  1. For XLCubed value formulas, for example XL3Lookup and XL3ValueRankLookup, the number format of the containing cell can be modified in the way described above. In the same way as any normal Excel-based formula, the format is preserved when the value changes due to changes in your data.
  2. For XLCubed Grids, apply the format as above, then use the right-click menu, and choose Apply Format to Data. This asks XLCubed to maintain the format on the entire slice of your data. More detailed instructions can be found here.

Table of reference

Example          Format              Result
0.2356           0.00                0.24
0.2356           #.00                .24
1235679          0                   1235679
1235679          #,###               1,235,679
0.2534           0.0%                25.3%
435              [Red]0.0            435.0
50               0.0;[Blue]-0.0;;    50.0
-50              0.0;[Blue]-0.0;;    -50.0
6541.21          ;;;                 (blank)
15984.125        0,                  16
654915165.515    0,,"M"              655M

Further reading

Something on the Horizon

We had an interesting scenario while helping a customer extend an existing Excel dashboard.

We had recently performed some work to solve some performance and design issues they had with their existing Analysis Services cubes. They now had more of their underlying data available and the ability to query longer periods without the performance hit (a year’s worth of data vs 28 days).

They wanted to make the most of this by charting changes in daily sales data over the previous 12 months, broken down by their four main business groups. Ideally the chart would become part of the existing Management Report; the difficulty was the lack of report real estate to add the extra information. This is something we have all come across before, and it is typically solved by using in-cell charts.

Plotting the data on an Excel chart in the space available would give us this:

Converting to Sparklines gave us a slightly better view but, given the number of data items being plotted, it was still not ideal.

Luckily our customer had recently upgraded to V6.1 of XLCubed, so we were able to use one of our newest in-cell chart types: SparkHorizons. There is a good explanation of Horizon charts in the research paper Sizing the Horizon: The Effects of Chart Size and Layering on the Graphical Perception of Time Series Visualizations, and Stephen Few has covered them previously.

Essentially a line chart is split into coloured bands: degrees of blue for positive numbers and degrees of red for negative numbers. In XLCubed this is 3 bands of each colour. The separation of the vertical scale means that horizon charts can be a lot more effective than standard sparklines where the scale of the numbers varies significantly, but you still want to retain a common scale view.
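The banding itself is straightforward to express in code. The Python sketch below shows the general horizon chart technique of slicing a value into equal layers; it is not XLCubed’s implementation, and the function name and the choice of scale maximum are assumptions for illustration.

```python
def horizon_bands(value, max_abs, n_bands=3):
    """Split one data point into `n_bands` stacked layers.

    Returns the sign (which picks blue or red) and how much of each band the
    value fills, lowest band first. Each band spans max_abs / n_bands.
    """
    band_size = max_abs / n_bands
    magnitude = min(abs(value), max_abs)  # clip anything beyond the scale
    fills = [
        min(max(magnitude - i * band_size, 0.0), band_size)
        for i in range(n_bands)
    ]
    sign = "positive" if value >= 0 else "negative"
    return sign, fills


# With a scale maximum of 3.0, each band is 1.0 high.
print(horizon_bands(2.5, 3.0))   # fills the first two bands and half of the third
print(horizon_bands(-0.5, 3.0))  # half of the first negative band only
```

Drawing the layers on top of one another, each in a darker shade, is what lets a tall series fit in the height of a single cell while keeping a common scale.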

In this case plotting the same data as horizon charts makes things a lot clearer:

It now becomes quite clear when sales are trending up vs down. It’s also possible to flip the negative values so they appear in the same direction as the positive values:

We are always looking at ways of developing and extending XLCubed. SparkHorizons were added because they looked like they had the potential to be useful where the data suited them, so it was pleasing to be able to use them in a real-world situation.

It’s also worth mentioning that although in this case the data came from Analysis Services cubes, SparkHorizons are available as an Excel formula, so they can be used to plot any Excel data. Here’s an example of the formula:

=XL3SparkHorizon(Sheet1!$V$2:$V$262,Sheet1!E10)

This will plot the data from Sheet1!$V$2:$V$262 as a SparkHorizon graph in Sheet1!E10.