Data sources for Excel dashboards: avoid spreadsheet hell

This is the first of two twin posts where we’ll discuss the alpha and omega of Excel dashboards: data access and dashboard publication. These are two weak areas in Excel, and they should be approached carefully when planning for a new dashboard. Let’s start by reviewing the available data access options.

Copy / Pasting data

Are you or someone in your organization populating the spreadsheet manually? Or are you copy/pasting the data into the spreadsheet? This is the simplest method of getting data into Excel, but it can be dangerous. It should be avoided when better options are available.

When you are dealing with some kind of structured data management (as you are when you create a dashboard), you have to plan ahead and make sure that when the data changes it doesn’t break your well-crafted dashboard. Each function and each chart must know where the data is and adjust for these changes when needed.

When you are pasting data there is a high risk of breaking something. The number of rows or columns in the new dataset may change, a time series chart may not recognize the new time periods, and you will probably have to update references manually. Again, plan carefully or you will end up in maintenance hell.

External table

You can create a link to an external table in Access, Oracle or another database tool via a standard ODBC connection. This will ensure that the data is correctly funneled into the spreadsheet, but with real-world data it is very easy to have more records than the Excel 2003 limit of 65,536 rows. You’ll be better off if you link not to the raw data itself but to a query/view that aggregates the data (one of the basic rules for dashboard design in Excel is to avoid calculations and derivative data; the data should come from the source already prepared to be displayed).
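
To make the idea of linking to an aggregating view concrete, here is a minimal sketch in Python. It uses the standard sqlite3 module as a stand-in for Access or Oracle, and the ODBC link itself is not shown. The table name sales, the view name sales_by_region_month and the generated rows are all hypothetical; the only point is that the view returns a few prepared rows where the raw table would overflow an Excel 2003 sheet.

```python
import sqlite3

EXCEL_2003_MAX_ROWS = 65_536  # row limit quoted in the text

# In-memory database standing in for the real source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount REAL)")

# Hypothetical raw data: 100,000 transactions, more than the sheet can hold.
regions = ["North", "South", "East", "West"]
months = [f"2008-{m:02d}" for m in range(1, 13)]
rows = [(regions[(i // 12) % 4], months[i % 12], 10.0) for i in range(100_000)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# The view does the aggregation at the source, so the spreadsheet
# receives data that is already prepared for display.
conn.execute("""
    CREATE VIEW sales_by_region_month AS
    SELECT region, month, SUM(amount) AS total, COUNT(*) AS n
    FROM sales
    GROUP BY region, month
""")

raw_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
summary = conn.execute(
    "SELECT region, month, total, n FROM sales_by_region_month "
    "ORDER BY region, month"
).fetchall()

print(f"raw rows: {raw_count}, rows in the view: {len(summary)}")
```

In a real project the equivalent of sales_by_region_month would be defined in the database, and Excel would link to it instead of the raw table.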

Once the data is in Excel, there is not much difference between this and the previous option. You still need to use lookup functions to retrieve the data and use it in report tables and charts, and data integrity is a stressful thing that you must ensure all the time. When possible, use database functions like DSUM instead of lookup functions (there will be a post discussing this).
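
For readers who have not used database functions, the following Python sketch illustrates the general idea behind DSUM: instead of looking up one cell at a known position, you sum a field over every record that matches a set of criteria, so added or reordered rows do not break anything. The dsum helper and the sample records are hypothetical, and the sketch only handles equality criteria, which is a small subset of what Excel's criteria ranges accept.

```python
def dsum(records, field, criteria):
    """Sum `field` over the records whose columns equal every value in `criteria`.

    A simplified, equality-only imitation of the idea behind Excel's DSUM.
    """
    return sum(
        record[field]
        for record in records
        if all(record.get(column) == value for column, value in criteria.items())
    )


# Hypothetical data table; the position of a row does not matter.
records = [
    {"region": "North", "product": "A", "units": 10},
    {"region": "South", "product": "A", "units": 7},
    {"region": "North", "product": "B", "units": 5},
    {"region": "North", "product": "A", "units": 3},
]

north_total = dsum(records, "units", {"region": "North"})
north_a_total = dsum(records, "units", {"region": "North", "product": "A"})
print(north_total, north_a_total)
```

Because the result depends on the criteria and not on cell positions, appending new records changes the totals without requiring any reference to be updated.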

Pivot tables

For an out-of-the-box Excel installation you may want to consider pivot tables. They are an interesting option for smaller datasets and they have a nicely flat initial learning curve. Please note that pivot tables will make your file size much larger because they store all the data in the spreadsheet, so scalability can become a major issue. Also, they work best with a strict hierarchical data structure. If your data doesn’t fit exactly in this concept this may be a problem. If you have a larger dataset you should consider an OLAP cube instead.

OLAP Cubes

The concept of an OLAP cube can be something scary for the average Excel user, but once you start using them you’ll never turn back, especially if you are using what Charley Kyd calls an “Excel-friendly OLAP cube”.

Unlike the other methods, an Excel-friendly OLAP cube (like XLCubed) will not store the data in the spreadsheet, thus eliminating the need for the usual data refreshing methods (open the dashboard, refresh, save and close). The cube is automatically updated and you can query it using formulas similar to GETPIVOTDATA. This has a huge impact on the way you work. You get all the benefits of a regular pivot table plus several life-saving extras. The dashboard will be simpler, cleaner and easier to maintain.

Final Thoughts

You have several methods for data management in Excel, and you must decide what is the best method for each specific dashboard. Scalability is always an issue, so be sure your data don’t outgrow your chosen method. An Excel-friendly OLAP cube may require some immediate investment but will save you a lot of hassle in the long run.

Data management in Excel is a critical factor, and it will be discussed in detail in future posts.

The next post discusses the other end of a dashboard project: how to make the dashboard available to the users.

Extended Deadline of Excel Dashboard Competition

We will be extending the deadline of our Excel Dashboard Competition by two weeks to 15 June. So you still have a chance of winning the iPhone or the InfoVis workshop.

FlowingData came up with the nice idea of building a visual display of the hundreds of data sets available in the U.S. Census 2008 Statistical Abstract.

I loaded one of the sheets (Patents and Trademarks.XLS) into Excel, enriched it with MicroCharts, added a detail chart and published it to the Web.

Applied Gestalt Laws: Table Alignment

Most reports are based on combinations of tabular layouts, so to continue my series about visual design (see my previous post) I will focus on the most common problem and the simplest to fix: the fundamentals of how to align numbers and text in tables and how to treat their headings.

Here are the rules:

  • Right-align a block or column of whole numbers or of whole numbers and text.
  • Left-align a block or column of whole text.
  • Align numbers at the decimal point (or imaginary decimal point).
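
As a plain-text illustration of the three rules, here is a small Python sketch. The format_table helper and the sample rows are hypothetical and not part of the original article. Labels are left-aligned, numbers are right-aligned, and a fixed number of decimals keeps every decimal point in the same character column.

```python
def format_table(rows, decimals=2):
    """Format (label, number) pairs as aligned text lines.

    Labels are left-aligned, numbers right-aligned. Using the same
    number of decimals everywhere lines up the decimal points.
    """
    label_width = max(len(label) for label, _ in rows)
    numbers = [f"{value:,.{decimals}f}" for _, value in rows]
    number_width = max(len(number) for number in numbers)
    return [
        f"{label:<{label_width}}  {number:>{number_width}}"
        for (label, _), number in zip(rows, numbers)
    ]


# Hypothetical sample data.
rows = [("North", 1234.5), ("South-East", 98.76), ("West", 7.0)]
lines = format_table(rows)
for line in lines:
    print(line)
```

Printed in a monospaced font, the units, tens and hundreds digits of every number end up on the same vertical line, which is what makes the column easy to scan.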

[Figure]

Seems obvious really, but these rules are so rarely applied. A Google image search on “excel table” reveals what most Excel users do…

…they simply use what Excel defaults to:

[Figure: table using the Excel default alignment]

…or, if they are more adventurous, they often feel that centered columns somehow look better:

[Figure: table with centered columns]

…or, even worse, they apply the Excel table styles:

[Figure: table formatted with an Excel table style]

All those habits make the table more difficult to read. To understand why this is the case let’s use the Gestalt Law of Proximity.

[Figure: a grid of dots, 6 columns of 9]

In the picture above my brain tells me that there are 6 columns of 9 dots in one group. Simply moving the dots of the first row to the left breaks this grouping and separates the dots into 2 groups:

[Figure: the same dots with the first row moved to the left]

This is exactly what happens if you left-align column headers on numerical columns: the brain no longer associates the headers with the numbers below them, and that association is what I want in most cases.

So the Excel defaults are not right, as shown below.

[Figure: table with the Excel default header alignment]

Right-aligning the headers brings them together.

[Figure: table with right-aligned headers]

The grouping still works even if the shapes have different widths, as long as they remain either right- or left-aligned:

[Figure: shapes of different widths aligned to one side]

The reason for this is explained by the next Gestalt law, the Law of Continuity: the right-aligned figures and the left-aligned text are perceived as columns.

The table below shows this effect, with the arrows showing the continuation of the series. The same works with columns of left- or right-aligned figures or text: we perpetuate the series and perceive the column as one object. Even inserting a row to visually separate the figures and the column headers does not break the grouping…

[Figure: table with arrows showing the continuation of each column]

…which can be explained by the Gestalt Law of Closure.

Hence we still perceive the columns of numbers and headers as a unit, even though the headers are placed somewhat apart from the figures.

[Figure]

If we now disable the Excel grid lines, we end up with a table that relies only on white space and the Gestalt laws for its formatting and still provides clear associations: a first-class table.

[Figure: the table with grid lines disabled]

In western cultures we read text from left to right, so it makes a lot of sense to left-align text columns, but not numeric columns. With left-aligned numbers the eye has to search for the decimal point to get to the tens, hundreds or thousands digit, which makes comparing numbers quite difficult, if not impossible, when many numbers are involved.

[Figure]

Here the Gestalt Law of Continuity can help: simply right-aligning the numbers brings all the tens, hundreds and thousands digits onto the same virtual line and makes comparison straightforward.

[Figure]

Unsurprisingly, centering numbers in a column causes exactly the same problem, as shown below. Another visualization “No No”.

[Figure: numbers centered in a column]

Interestingly, the same rules apply when we move beyond simple text and numbers to MicroCharts such as sparklines, column charts and bullet graphs, especially when the sparklines contain missing values.

[Figure]

Right or center alignment makes it very difficult to compare values of the same period in different rows of the table.

[Figure]

If the sparklines have the same number of data points this is not an issue, but in dynamic reports this may not always be the case, so it’s better to be safe than sorry.

[Figure]

When using visual tables, another nice trick is to introduce an axis to a column chart to aid visual alignment and to group periods into blocks through alternate shading. The visual table above uses a column chart to visualize units sold and an area chart for the other measures. The different shading groups the periods into 6-month units and the column bars aid the visual alignment.

So to recap, make it easy for people to read your tables by following how your brain inherently processes information, as explained through the “Gestalt” laws. Here are the rules again:

  • Right-align a block or column of whole numbers or of whole numbers and text.
  • Left-align a block or column of whole text.
  • Align numbers at the decimal point (or imaginary decimal point).

I hope that this article has been useful and I look forward to dealing with other visualization techniques in later posts.

Gestalt Laws, Charts and Tables: The way your brain wants them to be

I get asked by a lot of people how I seem to be able to format my charts and tables so that they look good and still convey the information in the most effective manner. I thought I would share my experience through my blog posts.

The first thing to know is that I like to use a set of visual design rules when building charts and tables and I like to understand why the rules make sense. In this series of articles I am going to attempt to explain the rules and the reasons behind them. Most of these rules are simple and are based on a solid academic foundation.

In this blog post I would like to introduce the Gestalt Laws, a set of design rules based on research into perception psychology. In the 1930s the German Gestalt school of psychology investigated how the brain groups and organizes visual shapes. Following this research the so-called “Gestalt” laws were established. These laws form much of the foundation of the techniques I use in table design and I intend to refer to many of them in this series of articles.

Gestalt Law of Proximity: The brain tends to group items that are close together in space, i.e. in proximity.

[Figure: Gestalt Law of Proximity]

In the picture above my brain tells me that there are 6 columns of 9 dots in one group.

Gestalt Law of Similarity: We tend to group objects with similar properties (color, shape, texture).

[Figure: Gestalt Law of Similarity]

In the picture above my brain groups the black and gray dots.

Gestalt Law of Continuity: When something is introduced as a series the brain tends to perpetuate the series.

[Figure: Gestalt Law of Continuity]

Gestalt Law of Closure: We tend to complete incomplete objects.

[Figure: Gestalt Law of Closure]

The table below applies all of the Gestalt laws above:

[Figure: Table applying Gestalt laws]

  • Gestalt Law of Continuity: The right-aligned figures and the left-aligned text are perceived as columns.
  • Gestalt Law of Proximity: The region labels and figures for Scenario W6000 and Scenario W7000 are grouped by having some extra space between the columns.
  • Gestalt Law of Closure: Although we have some space between the quarterly column headers and the figures, we perceive them as one unit.
  • Gestalt Law of Similarity: Formatting the negative numbers red makes them clearly stand out from positive numbers.

Small charts are beautiful

Well, since you are reading this, I’ll assume that you took the red pill, so let’s keep moving and find out how deep the rabbit hole goes.

We saw that people usually design charts larger than they need to be. Why? Is it because we can’t fit the data into a smaller space? No, it isn’t. It is because in smaller charts there is no room for non-data elements, like title, legend, grid lines. In the dominant “Excel chart defaults” school of thinking data is not a priority.

Here is a simple exercise that you can safely try at home and that demonstrates this clearly. Start by creating a line chart in Excel, like this one:

[Figure: Excel line chart]

You can see the data, right? Now make the chart smaller:

[Figure: Excel line chart]

Here is a fierce territorial competition, and guess who’s winning? Make the chart a little smaller:

[Figure: Excel line chart]

The title and the legend win, as usual. The data must be here somewhere, but who cares?

This chart size is not large enough. Or so it seems. But what happens when we remove some non-data elements? Since we don’t need the legend, and we can put the title somewhere else, we can remove both:

[Figure: Excel line chart]

We are getting our data back! Let’s just leave the data and a simple grid line:

[Figure: Excel line chart]

I used MicroCharts to display the same data using both a line and a column chart:

[Figure: Sparklines]

With MicroCharts, you can add a “normal band” or a reference line that helps you to understand how the data departs from the expected values.

The above charts show percentage change on previous period GDP at market prices in the US (1980-2009). Here is the same data for some selected countries in the EU:

[Figure: Sparklines]

Michelangelo said: “I saw the angel in the marble and carved until I set him free”. Like him, keep carving your chart until you set your data free. The essence of a chart is the patterns you discover, buried under all the junk. By making your charts smaller you are forced to remove that junk.

Finding this “essence” is what sparklines are all about.

Information visualization: take the red pill

“To clarify, add detail”, says Edward Tufte.

A richer, more detailed picture is a solid foundation for your decision-making processes. But to add detail you need a higher resolution display device (be that a computer screen or a sheet of paper), as we saw in the previous post.

Now, regarding the use of resolution, you have a “take the red/blue pill” kind of choice:

  • You take the red pill, accept Tufte’s advice and get more insights from your data;
  • You take the blue pill, buy the stuff most vendors want you to buy and stay under the illusion of the “professional looking chart”.

Let me detail the blue pill option. According to a large majority of vendors, we should get higher and higher resolutions, yes, but only to admire how eye-catching their products are, how well rendered, even if they display fewer and fewer actual data points. To the untrained eye, they may look like a Ferrari, but there’s a Tata underneath.

In reality, vendors and knowledgeable users have different agendas. Users want higher screen resolution to accommodate more data, while vendors want it because it makes their products look… “cool”? Apparently, in the mass market, form and function are strangers to each other.

Let me illustrate the problem with a typical pie chart. I already gave my two cents on the never-ending discussion around the sins and virtues of pie charts, so I will not do it again soon.

What I want to emphasize is that you can’t have more than five or six data points in a pie chart, but if you add texture to make it glow you will need to remove some data points and enlarge the chart. You need more space (= larger charts) in order for texture to be noticed, and there goes better (that is, more efficient) information visualization.

Unlike scientific visualization (which usually creates digital models of objects), information visualization focuses on abstract concepts, like “inflation rate” or “market share”. You can’t add texture to market share. A chart is a “metaphoric space” where some objects (points, lines, rectangles) stand for an abstract concept, and we infer something from their relative positions in space.

So, you have a large, high-resolution computer monitor and also a high-end color printer. You have a choice between texture and detail. You can’t have both. Choosing detail, you are focusing on the data and how to squeeze the juice out of it. Choosing texture, you are adopting a marketing posture whereby you are not selling insights, you are selling yourself (it is an option, and sometimes you’ll need it). Or worse, in your naivety, you believe that information visualization is just a glowing 3D pie chart. Believe me, it is not.

So, what color is your pill?

More Information per Pixel

This blog is about the most widely used BI tool in the world, Microsoft Excel!

Our mission is to connect business users to data so that an average Excel user can build their own budgeting application, enterprise dashboard or data warehouse reporting with Excel. Such a solution can range from a simple Excel report that pulls data from a sheet database using lookup functions to a full-fledged enterprise dashboard that sources data from an OLAP cube and can be browsed interactively on the web.

When it comes to dashboard visualization, we are not of the whistles, bells and gauges school. We advocate best practice in data visualization and effective dashboard charts like sparklines and bullet graphs.

If somebody tells you that you have to leave Excel, buy an expensive dashboard product, invest a lot in IT tools, IT administration, learn SQL and MDX to build an effective dashboard, don’t believe them.

Subscribe to the blog (email or RSS) and we’ll show you how to make Excel fly.