Small Multiples on River Quality

The phrase ‘small multiple’ was popularised by Edward Tufte, and has become a generic term for a visual display that uses the same chart or graphic to show different slices of a data set. The charts’ close positioning and shared scale make comparisons easy, and shared trends or outliers can be spotted quickly. Various other terms are also used to describe this charting approach, or specific aspects of it, including Trellis Charts, Lattice Charts, Grid Charts and Panel Charts.

The most common use case for small multiples is separate line charts comparing the trend across a large number of elements. Placing them all within one chart would produce either a ‘spaghetti chart’ or heavy occlusion, as shown in the comparison below. Here we use a standard Excel line chart and an XLCubed small multiple to chart the same data. Separating the charts while keeping a consistent axis scale makes for a much easier comparison than the single chart.

We took a slightly different approach when using small multiples to take a look at differences in river water quality across regions of the UK. Our source data was not absolute numeric values, but 14 years of results categorised into four bandings (bad, poor, fair and good). We wanted to provide a ‘one-pager’ which gave a feel for the trend within each region, but also access to the annual breakdown of the different water qualities.

In the end we settled on a Small Multiple display of 100% stacked columns as shown below.

A percentage base seemed a sensible way to approach the data, as different regions will have differing numbers of rivers and samples taken. Using this approach we’re able to compare relative water quality rather than dealing in absolutes.
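As a minimal sketch of that calculation in T-SQL: assuming a hypothetical dbo.RiverSamples table with one row per classified sample (Region, SampleYear, Banding), the percentage base can be derived with a windowed total. The table and column names here are illustrative, not our actual source.

    -- Hypothetical source: one row per classified sample.
    -- Convert the count in each quality banding into a percentage of the
    -- region's total samples for that year.
    SELECT
        Region,
        SampleYear,
        Banding,
        100.0 * COUNT(*)
            / SUM(COUNT(*)) OVER (PARTITION BY Region, SampleYear) AS PercentOfSamples
    FROM dbo.RiverSamples
    GROUP BY Region, SampleYear, Banding;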

The user selects a geographic area of the country to view the regional breakdown within it. The water quality for a particular year can be analysed by locating the region and the specific year, giving the percentage breakdown across the four categories.

The colouring of the four categories was chosen to aid ‘at a glance’ recognition of the overall water quality by region, and also of the trend. Dark blue signifies bad quality water (opaque), and light blue signifies good quality (think ‘you can see right through it…’).

So to read the display overall, or for trend:
• Dark colour signifies water quality problems.
• Light colour signifies good quality water.
• Reading left to right, increasing colour saturation shows declining quality over time.
• Reading left to right, decreasing colour saturation shows improving quality over time.
• Any region can be zoomed in on to see a larger chart and understand the breakdown in more detail.

Fairly quickly, and from just this one display, we can draw a number of conclusions:
• Across the region, as a broad-brush summary, water quality has improved since 1992.
• Doncaster has shown strong and steady improvement.
• Kingston upon Hull has the worst quality overall in the region, and varies significantly year on year.
• If you’re off for a swim in a Yorkshire river, Richmondshire looks a good bet!

We’ve designed a pre-set view in this case to work for the data in question, but the small multiple concept is also very powerful when interactively exploring data. A picture can tell a thousand words, as they say – take a look at our YouTube videos on small multiples: Video1 Video2


SQL Server “Denali” CTP3 – first impressions…

Microsoft recently released the third CTP of Denali, the upcoming SQL Server release (SQL Server 2011), so here are some initial thoughts now that it’s more widely available:

The first thing to look at is the new Tabular mode for Analysis Services (as opposed to the traditional multi-dimensional mode, which is still available). This is the server version of the VertiPaq engine first seen in the PowerPivot add-in, and moves the engine from being a personal/team tool to an organisation/enterprise level affair.

This means IT are going to get involved (and people can disagree about how they feel about that!), but that report sharing should be easier as data is held centrally. In the past the report contained all the data, which could make for very large workbooks, or you published to SharePoint, which not everyone was set up to do.

Cubes can be queried using MDX, which is great for a front-end vendor like us, and XLCubed works out of the box against the CTP. Existing functionality works smoothly, and as Microsoft Gold Partners we’re working closely with the releases so that we can support the full functionality at RTM.

We have ported a few existing cubes to the new architecture, and one first impression is that removing columns or using perspectives is going to be needed to keep things sensible for end users: you can quickly end up with hundreds of attributes.

The ability to create hierarchies was something that was often asked for in PowerPivot, and thankfully that’s there now. This should simplify many cubes.

Attribute-tastic


The intricacies of MDX put most business users off trying to use it directly, whereas DAX’s similarity to Excel functions means there is more scope for users to create formulae on the fly. Examining how best to expose that to users is something we’ll be spending some time on in the coming months.

Easier distinct counts and the built-in date calculations are the obvious candidates, but there are a number of others which we feel we can make more accessible for the majority of users.
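As a rough illustration of the kind of thing we mean, here are two DAX measures of the sort that are awkward in MDX but short in DAX; the Sales and Dates table and column names are invented for the example:

    -- Hypothetical model: a Sales table with a CustomerKey column,
    -- plus a related Dates table marked as the date table.
    Distinct Customers := DISTINCTCOUNT ( Sales[CustomerKey] )

    -- Built-in time intelligence: a year-to-date total over the date column.
    Sales YTD := TOTALYTD ( SUM ( Sales[SalesAmount] ), Dates[Date] )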

It’s certainly an interesting move, and thinking in tables and columns instead of the multidimensional model takes some getting used to; conversely, for some people it’s more natural.

It’ll also be interesting to see how MDX and DAX are integrated. The Tabular server supports both languages for queries. Currently, using MDX, you can use the “With Member” syntax to create members sent to the Tabular server – could you declare a DAX calculation in a similar manner?
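For reference, the “With Member” pattern in question looks like this; the database, measure and hierarchy names below are invented for illustration:

    -- Illustrative MDX only: [Sales Model] and the measure/hierarchy names
    -- are made up, not a real CTP3 database.
    WITH MEMBER Measures.[Margin %] AS
        Measures.[Gross Profit] / Measures.[Sales Amount],
        FORMAT_STRING = 'Percent'
    SELECT { Measures.[Margin %] } ON COLUMNS,
           [Product].[Category].MEMBERS ON ROWS
    FROM [Sales Model]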

Parent-Child Dimensions in Analysis Services – Performance Walkthrough

Parent-child hierarchies are a good fit for many data structures such as accounts or employees, and while they can speed development in some cases, they can also cause performance problems in large cubes.

We often see customers with this type of performance issue, and thought it worth sharing a simple technique for altering the dimension structure to improve query speed.

The problem

Often parent-child hierarchies are created because this is the structure used in the relational source, so they seem a good fit to model the members. In many cases, though, data is only at the leaf level of the hierarchy, meaning parent-child isn’t really needed.

Performance problems occur because no aggregates are created for parent-child dimensions, as detailed in the Analysis Services performance guide:

Parent-child hierarchies

Parent-child hierarchies are hierarchies with a variable number of levels, as determined by a recursive relationship between a child attribute and a parent attribute. Parent-child hierarchies are typically used to represent a financial chart of accounts or an organizational chart. In parent-child hierarchies, aggregations are created only for the key attribute and the top attribute, i.e., the All attribute unless it is disabled. As such, refrain from using parent-child hierarchies that contain large numbers of members at intermediate levels of the hierarchy. Additionally, you should limit the number of parent-child hierarchies in your cube.

If you are in a design scenario with a large parent-child hierarchy (greater than 250,000 members), you may want to consider altering the source schema to re-organize part or all of the hierarchy into a user hierarchy with a fixed number of levels. Once the data has been reorganized into the user hierarchy, you can use the Hide Member If property of each level to hide the redundant or missing members.


The performance guide hints at re-organizing the hierarchy to improve performance, but doesn’t say how.

The solution

This article will walk through the steps needed to change your parent-child hierarchy structure to have real levels, so that aggregations work and your performance is as good as you would expect with normal hierarchies.

This process is known as flattening or normalizing the parent-child hierarchy.

Firstly, let’s look at the data in our relational source.

Code: SQL create script
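The original script is a straightforward self-referencing table; as a minimal sketch (the table and column names here are illustrative assumptions, and the sample rows are invented):

    -- A self-referencing (parent-child) product table: ParentID points at
    -- another row in the same table, and is NULL for the root member.
    CREATE TABLE dbo.Product (
        ProductID   int           NOT NULL PRIMARY KEY,
        ParentID    int           NULL,
        ProductName nvarchar(50)  NOT NULL,
        CONSTRAINT FK_Product_Parent
            FOREIGN KEY (ParentID) REFERENCES dbo.Product (ProductID)
    );

    INSERT INTO dbo.Product (ProductID, ParentID, ProductName)
    VALUES (1, NULL, N'All Products'),   -- root
           (2, 1,    N'Bikes'),          -- intermediate level
           (3, 1,    N'Accessories'),
           (4, 2,    N'Road Bike'),      -- real products: all at leaf level
           (5, 2,    N'Mountain Bike'),
           (6, 3,    N'Helmet');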

Not a large dimension, but enough to demonstrate the technique. As you can see, my real products are all at the leaf level.

The strategy is quite simple:

  • Create a view to separate the members into different levels.
  • Create a new dimension using these real levels.
  • Configure the dimension to appear like the original parent-child dimension, but with the performance of a normal dimension.

Create the view

We want to create a denormalised view of the data. To do this we join the Product table to itself once for each level. This does mean we need to know the maximum depth of the hierarchy, but often this is fixed, and we’ll build in some extra levels for safety.

The tricks here are:

  • Use coalesce() so that we always get the lowest level ID below the leaves, never a NULL. This allows us to join to the fact table at the bottom level of our hierarchy.
  • Leave the Name columns NULL below the leaves; this will allow us to stop the hierarchy at the correct leaf level in each part of the hierarchy.

Code: SQL view script
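A sketch of such a view over the illustrative table above, flattening the hierarchy into five fixed levels (again, the object names are assumptions rather than the original script):

    -- Flatten the parent-child table into five fixed levels.
    -- COALESCE pulls the leaf's ID down to Level5ID so the fact table can
    -- always join at the bottom level; the Name columns are left NULL below
    -- the leaves so HideMemberIf can stop each branch at its real depth.
    CREATE VIEW dbo.ProductFlattened
    AS
    SELECT
        p1.ProductID AS Level1ID,
        p1.ProductName AS Level1Name,
        COALESCE(p2.ProductID, p1.ProductID) AS Level2ID,
        p2.ProductName AS Level2Name,
        COALESCE(p3.ProductID, p2.ProductID, p1.ProductID) AS Level3ID,
        p3.ProductName AS Level3Name,
        COALESCE(p4.ProductID, p3.ProductID, p2.ProductID, p1.ProductID) AS Level4ID,
        p4.ProductName AS Level4Name,
        COALESCE(p5.ProductID, p4.ProductID, p3.ProductID, p2.ProductID, p1.ProductID) AS Level5ID,
        p5.ProductName AS Level5Name
    FROM dbo.Product p1
    LEFT JOIN dbo.Product p2 ON p2.ParentID = p1.ProductID
    LEFT JOIN dbo.Product p3 ON p3.ParentID = p2.ProductID
    LEFT JOIN dbo.Product p4 ON p4.ParentID = p3.ProductID
    LEFT JOIN dbo.Product p5 ON p5.ParentID = p4.ProductID
    WHERE p1.ParentID IS NULL;  -- anchor each row at the root of the hierarchy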

Running this we get one row per leaf product, with the full path from the top of the hierarchy spread across the level columns.

Obviously we can update this view to create more levels as required, but five are enough for now.

The Dimension

Next we go to BIDS, and add the view to our Data Source View, and then add a new Dimension based on the view.

The key steps to creating the dimension correctly are:

  • Set the key attribute to Level5ID, and the name to Level5Name.
  • Create an attribute for each Level ID, and on each set the Name Column appropriately.
  • Create a hierarchy using these attributes in order.
  • On each attribute set AttributeHierarchyVisible to False.
  • On each level of the hierarchy set HideMemberIf to NoName.
  • Set up the Attribute Relationships between the levels.

You should end up with the following:

Dimension Structure


Attribute Relationships


If you browse the dimension you’ll see that it never goes as far as level 5, even though the level exists. This is because we set up the member-hiding option and returned NULLs in our view.

Conclusion

And that’s it. You can now join your fact tables to the lowest level, build your cube as normal, and get the performance benefits of aggregation!
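For completeness, a sketch of the fact join against the view above (dbo.FactSales and SalesAmount are hypothetical names): because the IDs are COALESCEd downwards, every leaf’s ID surfaces at Level5ID whatever its real depth, so the fact table always joins at that one column.

    -- Hypothetical fact table joined at the bottom level of the flattened view.
    SELECT d.Level1Name, d.Level2Name, SUM(f.SalesAmount) AS SalesAmount
    FROM dbo.FactSales AS f
    JOIN dbo.ProductFlattened AS d ON d.Level5ID = f.ProductID
    GROUP BY d.Level1Name, d.Level2Name;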

See also

A tool to achieve the same result is available from CodePlex. We’ve not personally tried it, but it may well be a timesaver. It works in a similar way to the example above, but it’s often useful to understand how something works, even if you choose to automate it.

PowerPivot, SQL Server R2, SharePoint 2010, Office 2010

So we’ve been using PowerPivot for a while now, and Office 2010 has been part of our lives for some time. I’ll use this blog to answer some of the questions that keep cropping up in conversations with our customers:

1. Does XLCubed work with Excel 2010 (Office 14)?

a. Yes, we’ve been using it since the first CTP release, and with each release since.

2. Can I use XLCubed Web with SharePoint 2010?

a. Yes, publishing to the web and embedding the reports within your SharePoint site works in exactly the same way as with previous versions.

3. Does XLCubed connect to PowerPivot?

a. Yes, XLCubed connects to the PowerPivot published cubes, and our client tools can be used to build reports and dashboards from them.

4. Can I build reports from SQL Server R2 using XLCubed?

a. Yes, this will work just fine, just as you can build reports from previous versions of SQL Server or other relational sources. (here is an example)

PowerPivot in the real world

The services team have been working on migrating some of our internal models and sample databases across to a PowerPivot environment – looking at the pros and cons, and using DAX rather than MDX to perform some calculations. Results have been varied, and it’s been interesting to see some features that we’ve had for a while (like cube formulas, slicers and web parameters) appear in a similar way in PowerPivot.

Quite clearly PowerPivot isn’t the be-all and end-all, or anything like a replacement for Analysis Services, but it certainly has a role for tactical solutions and some power-user analysis, and we think likely also for RAD prototypes of larger-scale AS implementations. It doesn’t venture into the gap left by PerformancePoint Planning (as many thought it would in early 2009) – we’ve moved to address this area with the XLCubed PM suite, which uses in-memory OLAP cubes and/or Analysis Services.

Trying out some of the tools

Here are a few download sets for you to try; take careful note of the hardware spec and requirements for the MS ones though:

The 2010 Information Worker Virtual machine

Register and Download Office 2010

PowerPivot 32Bit, 64Bit

XLCubed Evaluation

If you would like to evaluate against your own data, contact the XLCubed Product team for evaluation editions; or if you want to try a no-risk proof of concept or prototype, contact the XLCubed consulting team.