Tag Archives: OLAP

Solve order shenanigans

Today I’m going to blog about a problem we recently solved in a client’s cube, an error in the Mdx script that’s very easy to make if you aren’t careful.

We’ll run a simple example in AdventureWorks (what else?) to demonstrate the issue.

The client had already added a calculation to their cube to show year-on-year growth. The formula is:

Create Member CurrentCube.[Measures].[Delta to PrevYear] as
(
    ([Measures].[Internet Sales Amount])
    -
    ([Measures].[Internet Sales Amount],
        ParallelPeriod(
            [Date].[Calendar].[Calendar Year],
            1,
            [Date].[Calendar].CurrentMember
        )
    )
)
/
    ([Measures].[Internet Sales Amount],
        ParallelPeriod(
            [Date].[Calendar].[Calendar Year],
            1,
            [Date].[Calendar].CurrentMember
        )
    )
, Format_String = "0.00%";

(some error checking removed for clarity)

This screenshot shows a couple of simple XLCubed Grids showing the real value, and below the percentage change. I have added in an Excel calculation to show the results are as expected.

Later during the cube development, the client added a calculated member in their Product dimension, one that gives a total excluding one of the product categories.

To replicate this I’ll add a calculation for “All Ex Bikes”:

Create Member 
CurrentCube.[Product].[Product Model Categories].[All Products].[All Ex Bikes]
as
(
    ([Product].[Product Model Categories].[All Products])
    -
    ([Product].[Product Model Categories].[Category].&[1])
);

And if we run the report again we get the following.

Notice the cell I’ve highlighted. The “All Ex Bikes” calculation works fine on the normal measure, but it gives totally the wrong number for the percentage calculation. What’s going on?

The problem is that in the cell highlighted Analysis Services has two calculations to think about when working out the result.

  • Compare this year to last year
  • Get the “Grand Total”, and subtract “Bikes”

As the number returned is 1.85% we can see that Analysis Services has chosen the second option, “Grand Total” – “Bikes”.

What we really want is for the calculation to be done by getting the subtotal, and then doing the percentage change based on that.

Fortunately the fix was a simple one. Analysis Services will run the calculations in the order they are found in the Mdx Script, so to fix the issue we simply moved the new “All Ex Bikes” definition up above the percentage calculation.

Now the number returned matches our expectations.

Pass/Solve Order can be a complex topic, so you may need to be careful.

In this case the number is totally wrong, so it was easy to spot, but some bugs will be much more subtle, so watch out!

Ranking, Sorting and Filtering

Once we have returned cube members into a grid report we often need to exclude or change the order of the result set to provide more meaningful information. MDX (Multidimensional Expressions) language includes some very useful operators to provide filtering (FILTER), sorting (ORDER) and ranking (TOPCOUNT/BOTTOMCOUNT) of dimension members. These can be quite overwhelming even for power users of XLCubed.  So, in V6, we have introduced a new feature “Advanced Member Selections” to provide easy access to this powerful part of Microsoft Analysis Services.

Using this new functionality we can nest and combine these operations to answer complex business questions (for simpler operations you can right-click on a member in the grid and use the “Apply” menu to perform simple ranking, filters and sorting).

Filtering

So let’s go through a simple filtering example.  Say, for example, that we want to find the products at Product Key level that sold more than 25 units in 2003, Quarter 1 and show the sales figures for those subcategories during 2003 and its quarters.

  1. Start by clicking the Grid ribbon item (or the XLCubed > Design Grid menu item in Excel 2003 and below), and selecting the Internet Sales cube file
  2. Drag Calendar Period to Columns and Product to Rows. You can also drag any other hierarchies to Headers. In the example image below, Measures and Customer have been added there.

  1. Click on the Product hierarchy so that its details appear in the bottom-right panel.
  2. Drag the Product key level over to the right of the dialog. You can switch between the members view and levels view by clicking on the Show Levels icon ().
  3. Click the Advanced tab to show the advanced selection pane:

  1. Click the Members drop down and choose Filter result:


  1. Click the Calendar Period edit control in the grid to change its selection to the desired member (2003, Quarter 1):

  1. Select the This measure radio button, and select Order Quantity as the desired measure.
  2. Change the Operation to >, and type 25 in the edit field on the right:

  1. Click OK. The new filter is displayed in the advanced selections tab:

  1. Click OK again to run the Report – the Grid shows the members that fit our criteria:

 

So we can see the results, filtering by 2003 Q1, but displaying the values for All Time (or any other period we wish to use). We could have also used the Range selector:    to drive the period selecting from an Excel Range and our grid would automatically refresh whenever the driving value changes.

Ranking

Now let’s add a ranking to find the bottom 8 selling products at the Product Key level that have sold more than 25 units inQ1:

  1. Display the Product Hierarchy Editor dialog
  2. Click the Rank result icon () on the advanced selections tab to display the Edit Ranking dialog
  3. Select the Bottom radio button, and type 8 into the edit field
  4. Select 2003, Quarter 1 for the Calendar Period hierarchy in the grid below:

We now have the filter, following by the ranking:

 

Run the Grid: only the lowest 8 members are returned

 

Sorting

Now let’s sort the report on a different dimension – for example, descending order of the Q1 sales.

  1. Display the Hierarchy Editor for the Product hierarchy by double-clicking on the Product label in the Grid
  2. If it’s not already visible, select the Advanced tab
  3. Click the Sort result toolbar button ()
  4. Change the Calendar Period selection to 2003, Quarter 1:

  1. Click the Sort Descending (9-1) radio button
  2. Click OK. The new sort is displayed in the advanced selections tab
Click OK again to run the Report

 

Joining Results

It’s also possible to join different results together: combining both sets (UNION), excluding members (EXCEPT) and returning common members (INTERSECT).

So we could also add the top 10 products  along side the bottom 8 products to the grid. Begin by adding another member selection using the “Add Member List” tool-bar button:

As before, we select the list of members to rank (in this case the Product Key level) and then select the operation we want to perform, a Top 10:

There are various options to decide how to combine the lists, we’ll stick with Add:

 

 

And we get both results combined:


So the “Advanced Member Selections” feature provides lots of the power of Analysis Services in a simplified way  – to try this feature for yourself you can begin by downloading XLCubed.

Sql Server “Denali” CTP3 – first impressions…

Microsoft recently released their third CTP of Denali the upcoming SQL Server release (SQL Server 2011), so here are some initial thoughts now it’s more widely available.:

The first thing to look at is the new Tabular mode for Analysis Services (as opposed to the traditional multi-dimensional mode, which is still available). This is the server version of the VertiPaq engine first seen in the PowerPivot add-in, and moves the engine from being a personal/team tool to an organisation/enterprise level affair.

This means IT are going to get involved (and people can disagree about how they feel about that!), but that report sharing should be easier as data is held centrally. In the past the report contained all the data, which could make for very large workbooks, or you published to SharePoint, which not everyone was set up to do.

Cubes can be queried using MDX, which is great for a front-end vendor like us, and XLCubed works out of the box against the CTP. Existing functionality is working smoothly, and as Microsoft Gold Partners we’re working closely with the releases to utilise all the functionality for the RTM.

We have ported a few existing cubes to the new architecture and one first impression is that removing columns or using perspectives is going to be needed to keep things sensible for end-users, you can quickly end up with hundreds of attributes.

The ability to create hierarchies was something that was often asked for in PowerPivot, and thankfully that’s there now. This should simplify many cubes.

Attribute-tastic

 

The intricacies of MDX put most business users off trying to use it directly, whereas DAX’s similarities with Excel functions means there is more scope to have users create formulae on the fly. Examining how best to expose that to users is something we’ll be spending some time on in the coming months..

Easier distinct counts and the built in date calculations are the obvious candidates, but there are a number of others which we feel we can make more accessible for the majority of users.

It’s certainly an interesting move, and thinking in Tables and Columns instead of the Multidimensional model takes some getting used to, conversely for some people its more natural.

It’ll also be interesting to see how MDX and DAX are integrated. The Tabular server supports both languages for query. Currently using MDX you can use the “With Member” syntax to create members sent to the Tabular server, could you declare a DAX calculation in a similar manner?

Parent-Child Dimensions in Analysis Services – Performance Walkthrough

Parent-child hierarchies are a good fit for many data structures such as accounts or employees, and while they can speed development in some cases, they can also cause performance problems in large cubes.

We often see customers with these type of performance issues, and thought it worth sharing a simple technique for altering the dimension structure to improve query speed.

The problem

Often parent-child hierarchies are created as this is the structure used in the relational source, so they seem a good fit to model the members. In many cases though data is only at the leaf level of the hierachy, meaning parent-child isn’t really needed.

Performance problems occur because no aggregates are created for parent-child dimensions, as detailed in the Analysis Services performance guide:

Parent-child hierarchies

Parent-child hierarchies are hierarchies with a variable number of levels, as determined by a recursive relationship between a child attribute and a parent attribute. Parent-child hierarchies are typically used to represent a financial chart of accounts or an organizational chart. In parent-child hierarchies, aggregations are created only for the key attribute and the top attribute, i.e., the All attribute unless it is disabled. As such, refrain from using parent-child hierarchies that contain large numbers of members at intermediate levels of the hierarchy. Additionally, you should limit the number of parent-child hierarchies in your cube.

If you are in a design scenario with a large parent-child hierarchy (greater than 250,000 members), you may want to consider altering the source schema to re-organize part or all of the hierarchy into a user hierarchy with a fixed number of levels. Once the data has been reorganized into the user hierarchy, you can use the Hide Member If property of each level to hide the redundant or missing members.

 

The performance guide hints at re-organizing the hierarchy to improve perfomance, but doesn’t say how.

The solution

This article will walkthrough the steps needed to change your parent-child hierarchy structure to have real levels, so that aggregations work, and your performance is as good as you expect with normal hierarchies.

This process is known as flattening or normalizing the parent-child hierarchy.

Firstly, let’s look at the data in our relational source.

Code: Sql Create ScriptSelectShow

Not a large dimension, but enough to demonstrate the technique. As you can see my real products are all at the leaf level.

The strategy is quite simple:

  • Create a view to seperate the members into different levels.
  • Create a new dimension using these real levels.
  • Configure the dimension to appear like the original parent-child dimension, but with the performance of a normal dimension.

Create the view

We want to create a denormalised view of the data. To do this we join the Product to itself once for each level. This does mean we need to know the maximum depth of the hierarchy, but often this is fixed, and we’ll build in some extra levels for safety.

The tricks here are:

  • Use coalesce() so that we always get the lowest level ID below the leaves, never a NULL. This allows us to join to the fact table at the bottom level of our hierarchy.
  • Leave Name columns null below the leaves, this will allow us to stop the hierarchy at the correct leaf level in each part of the hierarchy.

Code: Sql View ScriptSelectShow

Running this we get:

Obviously we can update this view to create more levels as required, but 5 are enough for now.

The Dimension

Next we go to BIDS, and add the view to our Data Source View, and then add a new Dimension based on the view.

The key steps to creating the dimension correctly are:

  • Set the key attribute to Level5ID, and the name to Level5Name.
  • Create an attribute for each Level ID, and on each set the Name Column appropriately.
  • Create a hierarchy using these attributes in order.
  • On each attribute set AttributeHierarchyVisible to False.
  • On each level of the hierarchy set HideMemberIf to NoName.
  • Set up the Attribute Relationships between the levels.

You should end up with the following:

Dimension Structure

 

Attribute Relationships

 

If you browse the dimension you’ll see that it never goes as far as level 5, even though it exists. This is becuase we set up the member hiding option, and returned NULLs in our view.

Conclusion

And that’s it done, you can now join to your fact tables at the lowest level, build your cube as normal and get the performance benefits of aggregation!

See also

A tool to achieve the same result is available from Codeplex, we’ve not personally tried it but may well be a timesaver. This works in a similar way to the example above, but it’s often useful to understand how something works, even if you choose to automate it.

So, what’s an OLAP Cube, anyway?

As "An Excel User in a Cubed Kingdom" I’m starting my exploration of this new found land with a simple question: what is an OLAP cube? In plain English, please…

I like this simple, non-technical definition: a cube is a set of predefined answers. It’s up to you to select the right questions.

OK, let’s detail this.

Imagine that you have a very large database with the usual business data: orders, customers, sales representatives… Now you want to know how much a customer category ordered over the last year. You query the database and you get the answer. Then you want monthly sales. Query it again. Dig a little deeper to see what products that category ordered. Query it once more.

What is happening behind the curtain? Each time you enter a new query the system looks at each transaction (or a subset) and performs the necessary calculations to answer your query. You’ll get your answers, but it will be painfully slow: depending on your query and the database size, it may take hours. It is not an option.

But you don’t really need to see each individual order, do you? If you only need to know monthly sales, why should your system go through each transaction? If you pre-aggregate that data, you’ll get your answers much, much, faster, because there aren’t five million records, just 100.000. You’ll be able to actually work, instead of staring at your monitor, waiting for an answer.

This is what a cube does. It provides faster answers by eliminating the unnecessary detail for the task at hand. You shouldn’t look at a cube as an unique, condensed version of the database. While you have a virtually infinite number of questions that the database can answer, a cube focus on providing answers to a small set of questions. That’s why you can have different cubes (marketing, fin, sales), all of them getting data from the same source. They all answer different sets of questions.

When designing a cube (it’s your job, not IT’s), resist the temptation of a one-size-fits-all cube. Clearly define a coherent set of questions for your fundamental business needs and make sure they are answered once the cube becomes available.

If you are exploring your data you will not want to wait one hour each time you make a change. On the other end, a fast cube with no data to explore is useless. There is  a fine balance between maximum flexibility and maximum performance.

An OLAP cube not only ensures that you retrieve the right data from the database but also allows you to explore it efficiently. Two good reasons to add OLAP cubes to your toolbox.

Next time we’ll see how plain English can describe the structure of a basic OLAP cube.