Ranking, Sorting and Filtering

Once we have returned cube members into a grid report, we often need to exclude members or change the order of the result set to provide more meaningful information. The MDX (Multidimensional Expressions) language includes some very useful functions for filtering (FILTER), sorting (ORDER) and ranking (TOPCOUNT/BOTTOMCOUNT) dimension members. These can be quite overwhelming even for power users of XLCubed. So, in V6, we have introduced a new feature, “Advanced Member Selections”, to provide easy access to this powerful part of Microsoft Analysis Services.

Using this new functionality we can nest and combine these operations to answer complex business questions (for simpler operations you can right-click on a member in the grid and use the “Apply” menu to perform simple ranking, filters and sorting).

Filtering

So let’s go through a simple filtering example. Say that we want to find the products at Product Key level that sold more than 25 units in 2003, Quarter 1, and show the sales figures for those products during 2003 and its quarters.

  1. Start by clicking the Grid ribbon item (or the XLCubed > Design Grid menu item in Excel 2003 and below), and selecting the Internet Sales cube file.
  2. Drag Calendar Period to Columns and Product to Rows. You can also drag any other hierarchies to Headers. In the example image below, Measures and Customer have been added there.
  3. Click on the Product hierarchy so that its details appear in the bottom-right panel.
  4. Drag the Product Key level over to the right of the dialog. You can switch between the members view and levels view by clicking on the Show Levels icon.
  5. Click the Advanced tab to show the advanced selection pane.
  6. Click the Members drop down and choose Filter result.
  7. Click the Calendar Period edit control in the grid to change its selection to the desired member (2003, Quarter 1).
  8. Select the This measure radio button, and select Order Quantity as the desired measure.
  9. Change the Operation to >, and type 25 in the edit field on the right.
  10. Click OK. The new filter is displayed in the advanced selections tab.
  11. Click OK again to run the Report. The Grid shows the members that fit our criteria.

 

So we can see the results, filtered by 2003 Q1 but displaying the values for All Time (or any other period we wish to use). We could also have used the Range selector to drive the period selection from an Excel range, and our grid would then refresh automatically whenever the driving value changes.

Ranking

Now let’s add a ranking to find the bottom 8 selling products at the Product Key level that have sold more than 25 units in Q1:

  1. Display the Product Hierarchy Editor dialog
  2. Click the Rank result icon on the advanced selections tab to display the Edit Ranking dialog
  3. Select the Bottom radio button, and type 8 into the edit field
  4. Select 2003, Quarter 1 for the Calendar Period hierarchy in the grid below:

We now have the filter, followed by the ranking:

 

Run the Grid: only the lowest 8 members are returned

 

Sorting

Now let’s sort the report on a different dimension – for example, descending order of the Q1 sales.

  1. Display the Hierarchy Editor for the Product hierarchy by double-clicking on the Product label in the Grid
  2. If it’s not already visible, select the Advanced tab
  3. Click the Sort result toolbar button
  4. Change the Calendar Period selection to 2003, Quarter 1
  5. Click the Sort Descending (9-1) radio button
  6. Click OK. The new sort is displayed in the advanced selections tab
  7. Click OK again to run the Report
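
As a rough illustration of what the three steps do to the member set, here is a minimal Python sketch. The product keys and unit figures are invented, and the function names are hypothetical; in the cube the equivalent work is done by the MDX functions mentioned at the start of this post (FILTER, BOTTOMCOUNT, ORDER), not by anything like this code.

```python
# Hypothetical Q1 2003 order quantities per product key (made-up numbers).
q1_units = {"P-310": 12, "P-311": 48, "P-312": 27,
            "P-313": 95, "P-314": 31, "P-315": 26}


def filter_members(values, threshold):
    """Keep members whose value is strictly greater than the threshold."""
    return [member for member in values if values[member] > threshold]


def bottom_count(members, values, n):
    """Return the n members with the lowest values, lowest first."""
    return sorted(members, key=lambda member: values[member])[:n]


def sort_descending(members, values):
    """Order members from the highest value to the lowest."""
    return sorted(members, key=lambda member: values[member], reverse=True)


filtered = filter_members(q1_units, 25)        # sold more than 25 units
ranked = bottom_count(filtered, q1_units, 3)   # bottom 3 to keep the example small
result = sort_descending(ranked, q1_units)     # highest Q1 sales first
print(result)
```

The order of the operations matters: ranking before filtering could return members that the filter would then discard, leaving fewer rows than requested.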

 

Joining Results

It’s also possible to join different results together: combining both sets (UNION), excluding members (EXCEPT) and returning common members (INTERSECT).
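
For readers less familiar with set operations, the three ways of joining results can be sketched in Python as follows. The member names are invented and the helper functions are hypothetical; the sketch assumes duplicates are dropped when two lists are combined.

```python
def union(first, second):
    """All members of first, then any members of second not already present."""
    return first + [member for member in second if member not in first]


def except_members(first, second):
    """Members of first that do not appear in second."""
    return [member for member in first if member not in second]


def intersect(first, second):
    """Members that appear in both lists, in the order of first."""
    return [member for member in first if member in second]


# Hypothetical member lists standing in for a top-N and a bottom-N result.
top = ["Road-150", "Mountain-200", "Touring-1000"]
bottom = ["Touring-1000", "Patch Kit", "Water Bottle"]

print(union(top, bottom))
print(except_members(top, bottom))
print(intersect(top, bottom))
```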

So we could also add the top 10 products alongside the bottom 8 products in the grid. Begin by adding another member selection using the “Add Member List” toolbar button:

As before, we select the list of members to rank (in this case the Product Key level) and then select the operation we want to perform, a Top 10:

There are various options for how to combine the lists; we’ll stick with Add:

 

 

And we get both results combined:


So the “Advanced Member Selections” feature provides lots of the power of Analysis Services in a simplified way  – to try this feature for yourself you can begin by downloading XLCubed.

Small Multiples on River Quality

The phrase small multiple was popularised by Edward Tufte, and has become a generic term for a visual display using the same chart or graphic to display different slices of a data set. Their close positioning and shared scale make comparisons very easy and shared trends or outliers can be quickly spotted. Various other terms are also used to describe this charting approach, or specific aspects of it, including Trellis Charts, Lattice Charts, Grid Charts and Panel Charts.

The most common use case for small multiples is separate line charts to compare trends across a large number of varying elements. Placing them all within one chart would cause either a ‘spaghetti chart’ or lots of occlusion, as shown in the comparison below. Here we use a standard Excel line chart, and an XLCubed small multiple to chart the same data. Separating the charts while keeping a consistent axis scale makes for a much easier comparison than in the single chart.

We took a slightly different approach when using small multiples to take a look at differences in river water quality across regions of the UK. Our source data was not absolute numeric values, but 14 years of results categorised into four bandings (bad, poor, fair and good). We wanted to provide a ‘one-pager’ which gave a feel for the trend within each region, but also access to the annual breakdown of the different water qualities.

In the end we settled on a Small Multiple display of 100% stacked columns as shown below.

A percentage base seemed a sensible way to approach the data, as different regions will have differing numbers of rivers and of samples taken. Using this approach we’re able to see a comparison of the relative water quality rather than dealing in absolutes.

The user selects a geographic area of the country to view the regional breakdown within the selected area. The water quality for a particular year can be analysed by locating the region, and the specific year to see the percentage breakdown for each of the four categories.

The colouring of the 4 categories was chosen to aid ‘at a glance’ recognition of the overall water quality by region, and also of the trend. Dark blue signifies bad quality water (opaque), and light blue signifies good quality (think ‘you can see right through it….’).

So to read the display overall, or for trend:
• Dark colour signifies water quality problems.
• Light colour signifies good quality water.
• Reading left to right, increasing colour saturation shows declining quality over time.
• Reading left to right, decreasing colour saturation shows improving quality over time.
• Any region can be zoomed in on to see a larger chart and understand the breakdown in more detail.

Fairly quickly, and from just this one display we can draw a number of conclusions as below:
• Across the region, as a broad brush summary, water quality has improved since 1992.
• Doncaster has shown strong and steady improvement.
• Kingston upon Hull has the worst quality overall in the region, and varies significantly year on year.
• If you’re off for a swim in a Yorkshire river, Richmondshire looks a good bet!

We’ve designed a pre-set view in this case to work for the data in question, but the small multiple concept is also very powerful when interactively exploring data. A picture can tell a thousand words as they say – take a look at our YouTube videos on small multiples: Video1 Video2

 

Number Formats

One of the main reasons we use Excel is to analyse and display our data, for either our own consumption, or to show to others. In both cases, we want our data to be easily readable, and any important patterns to be immediately obvious.

We use colours, borders and other formatting to highlight important characteristics of our data, and to de-emphasize those features that should stay in the background (see  The Dashboard Squint Test for more). In just the same way, we can use number formats to highlight numbers that are unusual in some way, decrease the focus on uninteresting numbers, or to remove excess detail. Here we recap the essentials of numeric formatting in Excel.

The basics

To apply or change a number format, select the cell or range that needs to be altered, then either:

  • make basic changes (add or remove decimal places, use percentages and so on) using the Number button group on the Home tab (in Excel 2007 and newer),
  • make more advanced changes by right-clicking on the range and selecting the Format Cells option,
  • if you prefer keyboard shortcuts, you can show the format cells dialog by using the Ctrl+1 keyboard shortcut.

From the dialog, you can select some common and very useful formats, including:

  • Number: this allows you to customise the number of decimal places and whether to show thousand separators
  • Date and Time: for formatting dates and times, allowing a variety of shorter and longer options
  • Percentage: format the numbers as percentages, with the desired number of decimal places
  • Custom: allows you to specify your own custom formats (see below)

Simple custom formats

We’ll first go through some simple examples, including some of the standard formats mentioned above, so that in the next section we can build up more complex ones. To enter these formats, follow the steps listed above, then select Custom from the list on the left of the dialog.

  • To show a custom number of decimal places using a number format, write the number you want to show using zeros, for example 0.00 to show your number using 2 decimal places
  • To include digits only if they exist, use a # sign instead of the zero. For example, to only include the part of the number before the decimal if greater than 1 (or less than -1), use #.00
  • To include a thousands separator in the number, use #,###. We use the # symbol to avoid forcing Excel to display unnecessary zeros
  • To format numbers as percentages, just place a % after the format. For example, using the 0.0% format will cause 0.2534 to display as 25.3%
  • To give the numbers a colour, put the name of the colour in square brackets before the format, like this: [Red]0.0

More advanced formats

The formats that we have used so far apply one format to all numbers. In fact, Excel lets us specify four formats in a single cell: one for positive numbers, one for negative numbers, one for zeros and one for text. To do this, we use semicolons to separate the different formats. For example, to format only negative numbers as blue, we can use 0.0;[Blue]-0.0;; as the format. In this example, because we have left the last two sections blank, zeros and text will appear as empty cells on the worksheet.

If we combine the pieces of information from the last paragraph, we can find another useful format. Any cells with the ;;; format applied will hide any data in the cell. This can be useful if you want a formula in a particular cell, but don’t want to hide an entire row or column for it.

Another common case is where you have large numbers, but don’t need to see all the digits. In this case, it can be useful to emphasize just the important part of the number by using this format: 0, (a zero followed by a comma). This format rounds to the nearest thousand and removes the excess zeros. You can extend this to millions by using another comma, and it’s even possible to include an indicator that the number is shortened, like this: 0,,"M"
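
To make the arithmetic behind the trailing commas concrete, here is a small Python sketch. The function name is hypothetical and this is only an approximation of Excel’s behaviour: each comma divides the value by 1,000 before it is rounded to a whole number, and Python’s rounding of exact halves can differ from Excel’s.

```python
def format_scaled(value, commas, suffix=""):
    """Approximate a '0,' style format: one division by 1,000 per trailing comma."""
    scaled = value / (1000 ** commas)
    return f"{scaled:.0f}{suffix}"


print(format_scaled(15984.125, 1))           # like the format 0,
print(format_scaled(654915165.515, 2, "M"))  # like the format 0,,"M"
```

The underlying cell value is unchanged by the format, so formulas that reference the cell still see the full number.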

There are many more special formats available, including changing the boundaries between the semicolons and date and time formats. Have a look at the further reading, below, for more information.

Number formats in XLCubed

XLCubed, being tightly integrated with Excel, allows you to specify number formats in two ways.

  1. For XLCubed value formulas, for example XL3Lookup and XL3ValueRankLookup, the number format of the containing cell can be modified in the way described above. In the same way as any normal Excel-based formula, the format is preserved when the value changes due to changes in your data.
  2. For XLCubed Grids, apply the format as above, then use the right-click menu, and choose Apply Format to Data. This asks XLCubed to maintain the format on the entire slice of your data. More detailed instructions can be found here.

Table of reference

Example | Format | Result
0.2356 | 0.00 | 0.24
0.2356 | #.00 | .24
1235679 | 0 | 1235679
1235679 | #,### | 1,235,679
0.2534 | 0.0% | 25.3%
435 | [Red]0.0 | 435.0 (in red)
50 | 0.0;[Blue]-0.0;; | 50.0
-50 | 0.0;[Blue]-0.0;; | -50.0 (in blue)
6541.21 | ;;; | (blank)
15984.125 | 0, | 16
654915165.515 | 0,,"M" | 655M

Further reading

Something on the Horizon

We had an interesting scenario while helping a customer extend an existing Excel dashboard.

We had recently performed some work to solve some performance and design issues they had with their existing Analysis Services cubes. They now had more of their underlying data available and the ability to query longer periods without the performance hit (a year’s worth of data vs 28-days).

They wanted to make the most of this by charting changes in daily sales data over the previous 12 months, broken down by their four main business groups. Ideally the chart would become part of the existing Management Report; the difficulty was the lack of report real estate for the extra information. This is something we have all come across before, and it is typically solved by using in-cell charts.

Plotting the data on an Excel chart in the space available would give us this:

 

 

Converting to Sparklines gave us a slightly better view but, given the number of data items being plotted, still not an ideal one.

 

 

Luckily our customer had recently upgraded to V6.1 of XLCubed so we were able to use one of our newest incell chart types: SparkHorizons. There is a good explanation of Horizon charts as part of the research paper: Sizing the Horizon: The Effects of Chart Size and Layering on the Graphical Perception of Time Series Visualizations and Stephen Few has covered them previously.

Essentially a line chart is split into coloured bands: degrees of blue for positive numbers and degrees of red for negative numbers. In XLCubed there are 3 bands of each colour. The separation of the vertical scale means that horizon charts can be a lot more effective than standard sparklines where the scale of the numbers varies significantly, but you still want to retain a common scale view.
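
The banding idea can be sketched in a few lines of Python. This is an illustration of the general horizon chart technique, not XLCubed’s implementation: the function name is hypothetical, and it assumes equal-height bands scaled to the largest absolute value in the series.

```python
def horizon_bands(values, bands=3):
    """Split each value into layers of equal height.

    Returns one (sign, layers) pair per value. layers[i] is the fraction of
    band i that is filled (0.0 to 1.0); sign selects the colour family
    (1 for the positive colour, -1 for the negative colour).
    """
    peak = max(abs(v) for v in values) or 1.0   # avoid dividing by zero
    band_height = peak / bands
    result = []
    for v in values:
        sign = 1 if v >= 0 else -1
        magnitude = abs(v)
        layers = [
            min(max(magnitude - i * band_height, 0.0), band_height) / band_height
            for i in range(bands)
        ]
        result.append((sign, layers))
    return result


for sign, layers in horizon_bands([30, -15, 5]):
    print(sign, layers)
```

Because the layers are drawn on top of each other in progressively darker shades, a value three times as large needs no more vertical space, only a darker colour.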

In this case plotting the same data as horizon charts makes things a lot clearer:

It now becomes quite clear when sales are trending up vs down. It’s also possible to flip the negative values so they appear in the same direction as the positive values:

 

We are always looking at ways of developing and extending XLCubed. SparkHorizons were added because they looked like they had the potential to be useful where the data suited them, so it was pleasing to be able to use them in a real-world situation.

It’s also worth mentioning that although in this case the data came from Analysis Services cubes, SparkHorizons are available as an Excel formula, so they can be used to plot any Excel data. Here’s an example of the formula:

=XL3SparkHorizon(Sheet1!$V$2:$V$262,Sheet1!E10)

This will plot the data from Sheet1!$V$2:$V$262 as a SparkHorizon graph in Sheet1!E10.

 


“Prev” and “Next” in XLCubed Slicers

We’ve been asked a few times in the last couple of months if we can build a ‘Previous / Next’ selector for date hierarchies, which allows the user to quickly navigate sequentially through months or days. The answer is of course ‘yes’, otherwise it would be a very short blog.

One of the key strengths of XLCubed is its tight integration with Excel, which means that with some creative thinking the answer is very rarely ‘no you can’t’. Here we use a combination of our slicers, the XL3MemberNavigate function, and standard Excel formulae to produce a very effective selector for just this scenario.

A working example of this which connects to the sample bicycle sales local cube which we  ship with the product is available here or you can view the online demo here.

There are a couple of key things to note with this approach:

1) Slicers are typically populated direct from the cube, which makes them very flexible and dynamic. However, a less well-known aspect is that slicers can be driven from an Excel range, and in this case that’s what we’ll be doing.

2) XL3MemberNavigate(). A fairly new formula which allows you to traverse a hierarchy dynamically in a multitude of different ways. Here we just scratch the surface.

To begin with we need to prepare a range of cells in Excel to base the slicer on, in this case the months, and we also need to ensure it’s dynamic and can change with the underlying data structure.  We need to prepare a table of similar structure to the below.

Cell B2 is the selection made by the user in the slicer, which we’ll come back to. The other columns in the table show:

  • Description: a logical description of what the row is.
  • Month: the month available for selection, determined by whatever the user chooses in the slicer and by the XL3MemberNavigate formula (Insert Formula – Member Navigate).
  • Checked Month: validation checks on the month, to cater for when the first and last available months are selected.
  • Slicer Display: what will be displayed in the slicer dialog for user selection.

The first month uses MemberNavigate to get the first available month. This is very straightforward in the MemberNavigate dialog, and will insert a formula in this syntax: XL3MemberNavigate(1,"[Time]","[Time].[Month]","FirstMember"). Last month is achieved the same way, but using ‘lastmember’.

Previous and Next are again achieved using MemberNavigate, this time the syntax will be: XL3MemberNavigate(1,"[Time]",SlicerData!$B$7,"Previous").

Displayed month is simply what the user has chosen in the slicer.
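
The logic of the helper table can be sketched in Python as below. The month names are invented, the row order is assumed, and the function is hypothetical. The post does not spell out what the Checked Month validation does, so the sketch assumes that Previous stays on the first month and Next stays on the last month when there is nowhere further to go.

```python
# Hypothetical list of available months, in hierarchy order.
MONTHS = ["Jan 2010", "Feb 2010", "Mar 2010", "Apr 2010"]


def slicer_rows(selected, months=MONTHS):
    """Build the five rows the slicer is based on for the current selection."""
    position = months.index(selected)
    last_position = len(months) - 1
    return {
        "First": months[0],
        "Previous": months[max(position - 1, 0)],            # assumed: clamp at the start
        "Displayed": selected,
        "Next": months[min(position + 1, last_position)],    # assumed: clamp at the end
        "Last": months[-1],
    }


print(slicer_rows("Feb 2010"))
```

In the workbook the same lookups are done by XL3MemberNavigate against the cube, which is what keeps the table in step with the underlying data.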

 Adding the slicer:

Add a slicer from the XLCubed ribbon (or insert slicer menu in 2003). On the selection tab, choose ‘slicer range’ and select C5:D9 on the table shown above. Then set the slicer Type to be buttons. Lastly, on the settings tab, set the slicer to update cell B2 on the SlicerData sheet.

Optionally, you can also name the slicer and choose to show a title bar, as we have in this example.

On inserting the slicer, you’ll need to resize the control itself, and possibly also the size of the buttons if the data member names are long.

You should now have a slicer which enables Prev/Next selections, along with first and last.

Using the slicer in a report

The slicer isn’t currently connected to anything, or changing filters within a report. To do that, as it’s not directly connected to a hierarchy in the same way as a standard slicer, we need to go via the Excel cell which it updates. So any XLCubed grids or formulae need to reference the cell which the slicer outputs its selection to, in this case SlicerData!$B$2.

In our example we’ve just connected one grid, but there can be as many as required. Our example also gives some sales and costing detail for the main product categories. We also use in-grid sparklines to give a feel for the trend, and these can be drilled or sliced and diced in the same way as a standard grid.

The working example can be downloaded here, or a similar version published to XLCubedWeb used online here.

 

 

SQL Server “Denali” CTP3 – first impressions…

Microsoft recently released their third CTP of Denali, the upcoming SQL Server release (SQL Server 2011), so here are some initial thoughts now it’s more widely available:

The first thing to look at is the new Tabular mode for Analysis Services (as opposed to the traditional multi-dimensional mode, which is still available). This is the server version of the VertiPaq engine first seen in the PowerPivot add-in, and moves the engine from being a personal/team tool to an organisation/enterprise level affair.

This means IT are going to get involved (and people can disagree about how they feel about that!), but that report sharing should be easier as data is held centrally. In the past the report contained all the data, which could make for very large workbooks, or you published to SharePoint, which not everyone was set up to do.

Cubes can be queried using MDX, which is great for a front-end vendor like us, and XLCubed works out of the box against the CTP. Existing functionality is working smoothly, and as Microsoft Gold Partners we’re working closely with the releases to utilise all the functionality for the RTM.

We have ported a few existing cubes to the new architecture, and one first impression is that removing columns or using perspectives is going to be needed to keep things sensible for end-users, as you can quickly end up with hundreds of attributes.

The ability to create hierarchies was something that was often asked for in PowerPivot, and thankfully that’s there now. This should simplify many cubes.

Attribute-tastic

 

The intricacies of MDX put most business users off trying to use it directly, whereas DAX’s similarities with Excel functions mean there is more scope to have users create formulae on the fly. Examining how best to expose that to users is something we’ll be spending some time on in the coming months.

Easier distinct counts and the built in date calculations are the obvious candidates, but there are a number of others which we feel we can make more accessible for the majority of users.

It’s certainly an interesting move, and thinking in Tables and Columns instead of the Multidimensional model takes some getting used to, although for some people it’s more natural.

It’ll also be interesting to see how MDX and DAX are integrated. The Tabular server supports both languages for query. Currently, using MDX, you can use the “With Member” syntax to create members sent to the Tabular server. Could you declare a DAX calculation in a similar manner?

Heatmap Tables with Excel – Revisited

We’ve revisited one of our more popular guides Heatmap Tables with Excel as they can be a very effective way of presenting data on a dashboard, and have now updated it for Excel 2010…

This Heatmap Table is designed to show you the revenues and the discounts of a company over the course of one year per product group. The size of a bubble shows the revenue made in a particular month and the bubble color shows the discount rate given. The discount rate has been encoded as a range of green colors, ranging from a light green, for low discounts to a dark green for high discounts. The years and product totals are shown at the right and bottom as an integrated part of the table.

Tufte often talks about the integration of numbers, images and words; I think he’s quite right. A way to achieve this in Excel is to integrate charts into tables, so called graphical tables, a very effective means to show “More Information Per Pixel“.

The heatmap table is based on a regular Excel bubble chart. To integrate a bubble chart into a table the bubbles are positioned in a matrix that has the same row and column layout as our table.

In our case we generate a data series table with one column for the X-Series going from 1-12 for January – December and one column for our Y-Series going from 1-8 for our 8 product groups and one column for revenue.
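
The reshaping from a grid to a series table can be sketched in Python as below. The function name and the sample figures are hypothetical; the sample spreadsheet does this with Excel formulae, and the sketch only shows the shape of the result: one row per cell, holding the column position, the row position and the revenue.

```python
def grid_to_series(grid):
    """Turn a product-by-month grid of revenues into (x, y, revenue) rows.

    x is the month position (1 = first column) and y is the product group
    position (1 = first row), matching the two axes of the bubble chart.
    """
    rows = []
    for y, product_row in enumerate(grid, start=1):
        for x, revenue in enumerate(product_row, start=1):
            rows.append((x, y, revenue))
    return rows


# Made-up revenues: 2 product groups by 3 months.
sample = [[120, 80, 95],
          [40, 60, 55]]
for row in grid_to_series(sample):
    print(row)
```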

In the sample spreadsheet we’ve set up some simple Excel formulae to translate data from the classic grid layout:

to the required format:

Now we can insert the bubble chart:

 

To ensure that the charts fit exactly into the table grid we set Min/Max for the X axis to 0.5/12.5 and for the Y axis to 0.5/8.5. Excel would calculate much larger auto scales otherwise. Also set the Major units to 1 so we can use that later to set some grid lines.

 

Now we remove the legend, the X and Y axis, maximize the plot area and align the chart with the Excel table. As the bubbles are initially too large we have to make them smaller. To control the bubble size go to Data Series Options and scale the bubble size to 50%:

 

This already makes a nice bubble table you could use to reproduce the Twitter Charts.

For the grid lines, format your table headers and cells with light gray borders. Resize the plot area, remove the border and re-position the chart so that the chart and the table grid lines align.

To create the heatmap with different colored bubbles we use the fact that by default Excel does not plot data points for #N/A values. For the heatmap we overlay 8 bubble series, one series per green shade, and show a revenue bubble only if the discount falls into the value range that corresponds with a green shade of our color ramp; otherwise we show #N/A.

We divide the range MAX(Discount)..0 into 8 groups to define the colours.

The data series columns use the following formula to test if a discount value corresponds with an interval / colour shade:

=IF(AND($E7>I$6-Step,$E7<=I$6),$D7,NA())

The formula returns the revenue if the discount value lies in that column’s interval (above I$6-Step and up to I$6); otherwise it returns #N/A.
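
The same test can be written out in Python to show how one data row spreads across the eight colour columns. The function name is hypothetical, None stands in for #N/A, and the discounts are expressed as whole percentages with an assumed maximum of 40 so that the interval boundaries are exact.

```python
def bucket_row(revenue, discount, upper_bounds, step):
    """Return one value per colour column: the revenue in the column whose
    interval contains the discount, and None (standing in for #N/A) elsewhere."""
    return [
        revenue if (upper - step) < discount <= upper else None
        for upper in upper_bounds
    ]


max_discount = 40                     # assumed maximum discount, in percent
step = max_discount // 8              # width of each of the 8 intervals
upper_bounds = [step * (i + 1) for i in range(8)]   # 5, 10, ..., 40

print(bucket_row(1000, 12, upper_bounds, step))
```

Under these assumed boundaries a discount of exactly 0 matches no column, because the lower edge of each interval is excluded, so such a point would not be plotted at all.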

 

 

Now create the eight data series so that the bubble size refers to the eight columns in the data table:

 

And use the Excel chart styles to pick a colour range – make sure you  remove the border from the chart area.

 

 

And you could use the chart styles to quickly switch between different colours – or customise each series to refine the colors.

You can download a starting point for these files here: HeatmapSample.xlsx. Most of the formulae should adapt to data values that you can feed into the data sheets, including data straight from Analysis Services if using XLCubed grids or formulae.

You can see an interactive version of the Heatmap here – we added a link to some cube data, some Slicers for driving the parameters and then published to XLCubedWeb.

 

 

Flexible time-series graphing from a slicer

We are often asked how to drive a chart from a slicer in XLCubed and how to plot days/months for a month or year. In the basic case this is fairly straightforward: you can, for example, set up a grid which is based on the previous ‘x’ months of a slicer selection. The difficulty comes where you want to vary the behaviour depending on which level of the hierarchy the user chooses. This is particularly true where the hierarchy contains semesters or quarters.

The example below shows a technique to handle this complexity and display the chart in a way meaningful to the user in each case. The report is based on a slicer that allows the user to switch between showing the graph data based on quarters, months or days.

You can download the Excel spreadsheet that is used in the example here TimeSeriesGraphFromSlicer

This connects to the AdventureWorks demo database which ships with Analysis Services.

The diagram below shows the flow of data from each worksheet showing the final result in the sheet Chart.

Workbook Sheet – Chart

This sheet shows the graph based on the data chosen in the slicer above it. This switches the graph data between quarters, months and days depending on the slicer selection.

 

Workbook Sheet – GridForChart

This shows the data that will be graphed, depending on the choice made by the slicer selection. In this example it is months July 2001 – June 2002. FY2002 has been selected by the user (in this example Financial Year 2002 runs from July 2001 – June 2002).

Note that cells A10 – A21 contain the value ‘TRUE’ – these cells contain an XL3RowVisible statement as follows:

=XL3RowVisible(B10<>"")

This statement hides rows with no data so that they are not plotted on the graph.

Workbook Sheet – SlicerToMonthDay

This sheet contains the data that is returned by the choice of the slicer in workbook sheet Chart.

User selects a month

The data will be graphed as days. For example, if the user selects July 2002 then the graph will be displayed with each day in July along the x-axis. These are defined in XLCubed as ‘Children of’ the slicer.

User selects a quarter year

The data will be graphed as months in a three month period. For example, the user selects Q1 FY 2003 and the data displayed is for three months from July 2002 – September 2002 as below. These are defined in XLCubed as ‘Descendants of’ the slicer at month. This will be the same when the user picks year, semester or quarter.

User selects a half-year

The data will be graphed as months in a six-month period. For example, the user selects H1 FY 2003. The screenshot below shows the data that will be graphed.

However, it can be seen that the values Q1 FY 2003 and Q2 FY 2003 should not appear on the graph.

Using the Edit Member functionality it is possible to remove these so that they do not appear as points on the graph.

To do this, edit the Date.Fiscal member and click on Advanced tab.

Click on the drop down next to first member – that member set is the resulting data when the user selects H1 FY 2003 and shows the data that is in cells B10 – B43 in sheet SlicerToMonthDay.

 

The screenshot below shows the data that will be subtracted – it is in effect the actual value selected by the user via the slicer alongside the two Fiscal Quarter values Q1 FY 2003 and Q2 FY 2003.

 

The GridForChart sheet now shows just the six months that should be graphed. As explained earlier, further manipulation using the XL3RowVisible functionality removes blank rows.

 

The screenshot above shows the graph with six months of data for H1 FY 2003 for months July 2002 – December 2002, and the quarters have been dynamically excluded.

The end result is a flexible time selector where the user can choose dates at different levels in the hierarchy, and will always get a meaningful and in-context time series chart.
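
The level-dependent behaviour described in this post can be summarised with a small Python sketch. The hierarchy fragment, the level names and the function names are all hypothetical, and the days are truncated for brevity. The workbook achieves the same outcome with member sets (descendants of the selection, minus the members that should not be plotted); the sketch only mirrors the end result.

```python
# Hypothetical fragment of a fiscal calendar: member -> children.
CHILDREN = {
    "H1 FY 2003": ["Q1 FY 2003", "Q2 FY 2003"],
    "Q1 FY 2003": ["July 2002", "August 2002", "September 2002"],
    "Q2 FY 2003": ["October 2002", "November 2002", "December 2002"],
    "July 2002": ["July 1, 2002", "July 2, 2002", "July 3, 2002"],
}

LEVEL = {"H1 FY 2003": "semester", "Q1 FY 2003": "quarter", "Q2 FY 2003": "quarter"}
for quarter in ("Q1 FY 2003", "Q2 FY 2003"):
    for month in CHILDREN[quarter]:
        LEVEL[month] = "month"


def months_under(member):
    """Collect the month-level members below a member, skipping the levels between."""
    if LEVEL[member] == "month":
        return [member]
    found = []
    for child in CHILDREN[member]:
        found.extend(months_under(child))
    return found


def members_to_plot(selected):
    """Days for a month selection, months for any selection above month level."""
    if LEVEL[selected] == "month":
        return CHILDREN.get(selected, [])
    return months_under(selected)


print(members_to_plot("H1 FY 2003"))
```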

 

 

Parent-Child Dimensions in Analysis Services – Performance Walkthrough

Parent-child hierarchies are a good fit for many data structures such as accounts or employees, and while they can speed development in some cases, they can also cause performance problems in large cubes.

We often see customers with this type of performance issue, and thought it worth sharing a simple technique for altering the dimension structure to improve query speed.

The problem

Often parent-child hierarchies are created as this is the structure used in the relational source, so they seem a good fit to model the members. In many cases, though, data is only at the leaf level of the hierarchy, meaning parent-child isn’t really needed.

Performance problems occur because no aggregates are created for parent-child dimensions, as detailed in the Analysis Services performance guide:

Parent-child hierarchies

Parent-child hierarchies are hierarchies with a variable number of levels, as determined by a recursive relationship between a child attribute and a parent attribute. Parent-child hierarchies are typically used to represent a financial chart of accounts or an organizational chart. In parent-child hierarchies, aggregations are created only for the key attribute and the top attribute, i.e., the All attribute unless it is disabled. As such, refrain from using parent-child hierarchies that contain large numbers of members at intermediate levels of the hierarchy. Additionally, you should limit the number of parent-child hierarchies in your cube.

If you are in a design scenario with a large parent-child hierarchy (greater than 250,000 members), you may want to consider altering the source schema to re-organize part or all of the hierarchy into a user hierarchy with a fixed number of levels. Once the data has been reorganized into the user hierarchy, you can use the Hide Member If property of each level to hide the redundant or missing members.

 

The performance guide hints at re-organizing the hierarchy to improve performance, but doesn’t say how.

The solution

This article will walk through the steps needed to change your parent-child hierarchy structure to have real levels, so that aggregations work and your performance is as good as you would expect from a normal hierarchy.

This process is known as flattening or naturalizing the parent-child hierarchy.

Firstly, let’s look at the data in our relational source.

[spoiler intro=”Code” title=”Sql Create Script”]

CREATE TABLE [dbo].[Products](
 [ProductID] [int] NOT NULL,
 [ParentID] [int] NULL,
 [Name] [varchar](50) NOT NULL,
 CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED ([ProductID] ASC)
)

GO

insert into Products(ProductID, ParentID, Name) values(1, NULL, 'All')
insert into Products(ProductID, ParentID, Name) values(2, 1, 'Fruit')
insert into Products(ProductID, ParentID, Name) values(3, 2, 'Red')
insert into Products(ProductID, ParentID, Name) values(4, 3, 'Cherry')
insert into Products(ProductID, ParentID, Name) values(5, 3, 'Strawberry')
insert into Products(ProductID, ParentID, Name) values(6, 2, 'Yellow')
insert into Products(ProductID, ParentID, Name) values(7, 6, 'Banana')
insert into Products(ProductID, ParentID, Name) values(8, 6, 'Lemon')
insert into Products(ProductID, ParentID, Name) values(9, 1, 'Meat')
insert into Products(ProductID, ParentID, Name) values(10, 9, 'Beef')
insert into Products(ProductID, ParentID, Name) values(11, 9, 'Pork')

[/spoiler]

This is not a large dimension, but it is enough to demonstrate the technique. As you can see, the real products are all at the leaf level.

The strategy is quite simple:

  • Create a view to separate the members into different levels.
  • Create a new dimension using these real levels.
  • Configure the dimension to appear like the original parent-child dimension, but with the performance of a normal dimension.

Create the view

We want to create a denormalised view of the data. To do this we join the Products table to itself once for each level. This does mean we need to know the maximum depth of the hierarchy, but often this is fixed, and we’ll build in some extra levels for safety.
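
If you are not sure how deep your hierarchy goes, a quick check outside the database can help. The following is only a sketch: it assumes the ProductID and ParentID pairs have been copied into a Python dictionary (the `parents` variable is our own illustration, not part of the cube), and it counts the levels below the root.

```python
# Hypothetical in-memory copy of the Products table: ProductID -> ParentID.
parents = {1: None, 2: 1, 3: 2, 4: 3, 5: 3, 6: 2, 7: 6, 8: 6, 9: 1, 10: 9, 11: 9}


def depth(product_id, parents):
    """Count the members on the path from product_id up to the root, inclusive."""
    steps = 0
    seen = set()
    while product_id is not None:
        if product_id in seen:
            # A cycle would make the walk (and the flattened view) meaningless.
            raise ValueError(f"cycle detected at ProductID {product_id}")
        seen.add(product_id)
        steps += 1
        product_id = parents[product_id]
    return steps


def max_depth(parents):
    """Deepest path in the whole hierarchy, counting the root as one level."""
    return max(depth(pid, parents) for pid in parents)


# The root ('All') is not a level in the flattened view, so subtract one.
levels_below_root = max_depth(parents) - 1
print(levels_below_root)  # 3 for the sample data
```

For the sample data this gives three levels below the root, so a view with five levels leaves two spare.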

The tricks here are:

  • Use coalesce() so that the ID columns below the leaves repeat the leaf’s ID and are never NULL. This allows us to join to the fact table at the bottom level of our hierarchy.
  • Leave the Name columns NULL below the leaves; this allows us to stop the hierarchy at the correct leaf level in each part of the hierarchy.
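
To make the two tricks concrete, here is a minimal sketch in Python of what they do to a single branch. The `flatten_path` function is hypothetical and exists only for illustration; the real work is done by the SQL view.

```python
def flatten_path(path, levels=5):
    """Pad one branch of the hierarchy out to a fixed number of levels.

    path is a list of (ProductID, Name) pairs from the first level below
    the root down to the leaf.
    """
    if not 1 <= len(path) <= levels:
        raise ValueError("path must contain between 1 and `levels` members")
    leaf_id = path[-1][0]
    row = []
    for i in range(levels):
        if i < len(path):
            row.append(path[i])
        else:
            # Below the leaf: repeat the leaf ID (the coalesce trick)
            # and leave the name empty (the NULL name trick).
            row.append((leaf_id, None))
    return row


beef = flatten_path([(9, "Meat"), (10, "Beef")])
print(beef)
# [(9, 'Meat'), (10, 'Beef'), (10, None), (10, None), (10, None)]
```

The last ID in the row is always the leaf’s own ID, which is what lets the fact table join at the lowest level.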

[spoiler intro=”Code” title=”Sql View Script”]

create view dbo.ProductsFlattened

as

select    P1.ProductID as Level1ID,
 P1.Name as Level1Name,
 coalesce(P2.ProductID, P1.ProductID) as Level2ID,
 P2.Name as Level2Name,
 coalesce(P3.ProductID, P2.ProductID, P1.ProductID) as Level3ID,
 P3.Name as Level3Name,
 coalesce(P4.ProductID, P3.ProductID, P2.ProductID, P1.ProductID) as Level4ID,
 P4.Name as Level4Name,
 coalesce(P5.ProductID, P4.ProductID, P3.ProductID, P2.ProductID, P1.ProductID) as Level5ID,
 P5.Name as Level5Name

from    dbo.Products P0
left join    dbo.Products P1
 on        P0.ProductID = P1.ParentID
left join    dbo.Products P2
 on        P1.ProductID = P2.ParentID
left join    dbo.Products P3
 on        P2.ProductID = P3.ParentID
left join    dbo.Products P4
 on        P3.ProductID = P4.ParentID
left join    dbo.Products P5
 on        P4.ProductID = P5.ParentID

where P0.ParentID is null

[/spoiler]

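Running this returns one row per leaf product: the products under Fruit have names down to level 3, Beef and Pork stop at level 2, and the remaining Name columns are NULL.

Obviously we can update this view to create more levels as required, but five are enough for now.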
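
If you need to change the number of levels often, the view text can be generated rather than typed. The sketch below is an assumption-laden illustration, not part of the original solution: it builds the same style of view for any number of levels and tries it out against the sample data using SQLite (through Python’s standard sqlite3 module) instead of SQL Server, so the dbo schema prefix is dropped. The `flattened_view_sql` helper is our own name.

```python
import sqlite3


def flattened_view_sql(levels):
    """Return CREATE VIEW text that flattens Products into `levels` levels."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    columns, joins = [], []
    for n in range(1, levels + 1):
        # ID columns fall back to the nearest ancestor so they are never NULL.
        ids = [f"P{i}.ProductID" for i in range(n, 0, -1)]
        id_expr = ids[0] if n == 1 else f"coalesce({', '.join(ids)})"
        columns.append(f"{id_expr} as Level{n}ID")
        # Name columns are left NULL below the leaves.
        columns.append(f"P{n}.Name as Level{n}Name")
        joins.append(
            f"left join Products P{n} on P{n - 1}.ProductID = P{n}.ParentID"
        )
    return (
        "create view ProductsFlattened as\n"
        "select " + ",\n       ".join(columns) + "\n"
        "from Products P0\n"
        + "\n".join(joins) + "\n"
        "where P0.ParentID is null"
    )


# The sample rows from the create script above.
products = [
    (1, None, "All"), (2, 1, "Fruit"), (3, 2, "Red"), (4, 3, "Cherry"),
    (5, 3, "Strawberry"), (6, 2, "Yellow"), (7, 6, "Banana"), (8, 6, "Lemon"),
    (9, 1, "Meat"), (10, 9, "Beef"), (11, 9, "Pork"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Products ("
    "ProductID integer primary key, ParentID integer, Name text not null)"
)
conn.executemany("insert into Products values (?, ?, ?)", products)
conn.execute(flattened_view_sql(5))

rows = conn.execute(
    "select Level1Name, Level2Name, Level3Name, Level4Name, Level5ID "
    "from ProductsFlattened order by Level5ID"
).fetchall()
for row in rows:
    print(row)
```

With the sample data this prints six rows, one per leaf, for example ('Fruit', 'Red', 'Cherry', None, 4) and ('Meat', 'Beef', None, None, 10).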

The Dimension

Next we go to BIDS (Business Intelligence Development Studio), add the view to our Data Source View, and then add a new Dimension based on the view.

The key steps to creating the dimension correctly are:

  • Set the key attribute to Level5ID, and the name to Level5Name.
  • Create an attribute for each Level ID, and on each set the Name Column appropriately.
  • Create a hierarchy using these attributes in order.
  • On each attribute set AttributeHierarchyVisible to False.
  • On each level of the hierarchy set HideMemberIf to NoName.
  • Set up the Attribute Relationships between the levels.

You should end up with the following:

Dimension Structure

 

Attribute Relationships

 

If you browse the dimension you’ll see that it never goes as far as level 5, even though it exists. This is because we set up the member hiding option, and returned NULLs in our view.
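
The effect of the hiding option can be pictured with a small simulation. This is a deliberate simplification written in Python, not how Analysis Services is implemented: it assumes that a member with no name is simply not shown, and that nothing below it is shown either. The rows are the Name columns that the view returns for the sample data.

```python
# Level1Name..Level5Name for each leaf product in the sample data.
rows = [
    ("Fruit", "Red", "Cherry", None, None),
    ("Fruit", "Red", "Strawberry", None, None),
    ("Fruit", "Yellow", "Banana", None, None),
    ("Fruit", "Yellow", "Lemon", None, None),
    ("Meat", "Beef", None, None, None),
    ("Meat", "Pork", None, None, None),
]


def browse_tree(rows):
    """Build the tree a user would see, stopping at the first missing name."""
    tree = {}
    for names in rows:
        node = tree
        for name in names:
            if name is None:
                # Hidden member: nothing deeper is shown on this branch.
                break
            node = node.setdefault(name, {})
    return tree


def tree_depth(tree):
    """Number of visible levels in the deepest branch."""
    if not tree:
        return 0
    return 1 + max(tree_depth(child) for child in tree.values())


tree = browse_tree(rows)
print(tree["Meat"])      # {'Beef': {}, 'Pork': {}}
print(tree_depth(tree))  # 3
```

The Meat branch stops at level 2 and the Fruit branch at level 3, so levels 4 and 5 never appear.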

Conclusion

And that’s it. You can now join to your fact tables at the lowest level, build your cube as normal and get the performance benefits of aggregation!
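
To see why joining at the lowest level is enough, here is a rough sketch in Python of the roll-up that the real levels make possible. The sales figures are invented purely for illustration, and the `rollup` function is our own name; in a real cube Analysis Services does this work for you.

```python
from collections import defaultdict

# (Level1Name, Level2Name, Level3Name, Level5ID) for each leaf product.
flattened = [
    ("Fruit", "Red", "Cherry", 4),
    ("Fruit", "Red", "Strawberry", 5),
    ("Fruit", "Yellow", "Banana", 7),
    ("Fruit", "Yellow", "Lemon", 8),
    ("Meat", "Beef", None, 10),
    ("Meat", "Pork", None, 11),
]

# Hypothetical fact rows: (ProductID, Quantity). The values are made up.
sales = [(4, 10), (5, 5), (7, 3), (8, 2), (10, 7), (11, 1), (4, 2)]


def rollup(flattened, sales, level):
    """Total the quantities by the member name at the given level (1 to 3)."""
    # The fact table joins on the lowest-level ID, which is never NULL.
    name_by_leaf = {row[-1]: row[level - 1] for row in flattened}
    totals = defaultdict(int)
    for product_id, quantity in sales:
        name = name_by_leaf[product_id]
        if name is not None:  # branches that stop above this level are skipped
            totals[name] += quantity
    return dict(totals)


print(rollup(flattened, sales, 1))  # {'Fruit': 22, 'Meat': 8}
print(rollup(flattened, sales, 2))  # {'Red': 17, 'Yellow': 5, 'Beef': 7, 'Pork': 1}
```

Each fact row carries only the leaf ID, yet totals are available at every level because the flattened row records all of the leaf’s ancestors.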

See also

A tool to achieve the same result is available from Codeplex. We’ve not personally tried it, but it may well be a timesaver. It works in a similar way to the example above, but it’s often useful to understand how something works, even if you choose to automate it.

Microsoft Gold Partner Renewal, & resulting questions answered

We’ve just completed the process of renewing our Gold partner status with Microsoft. Among other things, this ensures we have early access to upcoming software through the CTP and beta programs, as has been the case with Office 2010 over the last year.

The Gold level now has a requirement for completion of an independently run Customer Satisfaction survey, and thanks very much to everyone who completed this. The scoring and comments are much appreciated, and it was good to see the consistently high scores. We’re of course reviewing the areas where we didn’t score perfectly as we strive to further improve the service we offer.

The survey also highlighted a number of questions which appeared a few times:

1) Relational Database support

Our prime focus is on cube based reporting and analytics, but we do also support querying of relational databases in both Excel and on the Web, and have a number of customers using the product purely for relational reporting.

We extend the native Microsoft functionality in Excel, and add support for this in the Web product. The connection string and query string can both be formula-driven, so you can construct parameter-driven reports with ease.

– Search for ‘relational database access’ in the help file for an overview.

2) Writeback & Planning applications

XLCubed supports both grid and formula based writeback against Analysis Services, in Excel and on the Web. As such it lends itself well to planning and budgeting applications, and it’s an arena in which we have a lot of experience, from the straightforward through to the highly complex.

3) Documentation and User Guides

Documentation is now online and regularly updated. You can access it here: http://www.xlcubed.com/help

4) YouTube videos

We had two streams of comment here, broadly summarised as:

a) They’re really useful – thanks!

b) YouTube access is blocked by our corporate internet policy!

If your access is blocked, you can download the videos as mp4 from

http://www.xlcubed.com/downloads/xlcubedv6_youtube.zip