Solve order shenanigans

Today I’m going to blog about a problem we recently solved in a client’s cube: an error in the MDX Script that’s very easy to make if you aren’t careful.

We’ll run a simple example in AdventureWorks (what else?) to demonstrate the issue.

The client had already added a calculation to their cube to show year-on-year growth. The formula is:

Create Member CurrentCube.[Measures].[Delta to PrevYear] as
// (this year - last year) / last year
(
    ([Measures].[Internet Sales Amount])
    -
    ([Measures].[Internet Sales Amount],
        ParallelPeriod(
            [Date].[Calendar].[Calendar Year],
            1,
            [Date].[Calendar].CurrentMember
        )
    )
)
/
    ([Measures].[Internet Sales Amount],
        ParallelPeriod(
            [Date].[Calendar].[Calendar Year],
            1,
            [Date].[Calendar].CurrentMember
        )
    )
, Format_String = "0.00%";

(some error checking removed for clarity)

This screenshot shows a couple of simple XLCubed Grids: the real value, and below it the percentage change. I have added an Excel calculation to show the results are as expected.

Later during the cube development, the client added a calculated member in their Product dimension, one that gives a total excluding one of the product categories.

To replicate this I’ll add a calculation for “All Ex Bikes”:

Create Member 
CurrentCube.[Product].[Product Model Categories].[All Products].[All Ex Bikes]
as
(
    // Grand Total minus the Bikes category (key &[1])
    ([Product].[Product Model Categories].[All Products])
    -
    ([Product].[Product Model Categories].[Category].&[1])
);

And if we run the report again we get the following.

Notice the cell I’ve highlighted. The “All Ex Bikes” calculation works fine on the normal measure, but it gives totally the wrong number for the percentage calculation. What’s going on?

The problem is that, in the highlighted cell, Analysis Services has two calculations to consider when working out the result:

  • Compare this year to last year
  • Get the “Grand Total”, and subtract “Bikes”

As the number returned is 1.85% we can see that Analysis Services has chosen the second option, “Grand Total” – “Bikes”.

What we really want is for the calculation to be done by getting the subtotal, and then doing the percentage change based on that.

Fortunately the fix was a simple one. Analysis Services runs the calculations in the order they appear in the MDX Script, so to fix the issue we simply moved the new “All Ex Bikes” definition up above the percentage calculation.
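In outline, the corrected script now reads like this (a sketch – where two script calculations overlap in a cell, the one declared later wins):

// Declared first: evaluated at an earlier pass
Create Member 
CurrentCube.[Product].[Product Model Categories].[All Products].[All Ex Bikes]
as
(
    ([Product].[Product Model Categories].[All Products])
    -
    ([Product].[Product Model Categories].[Category].&[1])
);

// Declared second: in the overlapping cells the percentage change
// is now calculated on top of the "All Ex Bikes" subtotal
Create Member CurrentCube.[Measures].[Delta to PrevYear] as
(
    ([Measures].[Internet Sales Amount])
    -
    ([Measures].[Internet Sales Amount],
        ParallelPeriod([Date].[Calendar].[Calendar Year], 1,
            [Date].[Calendar].CurrentMember))
)
/
    ([Measures].[Internet Sales Amount],
        ParallelPeriod([Date].[Calendar].[Calendar Year], 1,
            [Date].[Calendar].CurrentMember))
, Format_String = "0.00%";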

Now the number returned matches our expectations.

Pass and solve order can be a complex topic, so take care when adding calculations to the MDX Script.

In this case the number is totally wrong, so it was easy to spot, but some bugs will be much more subtle, so watch out!

Ranking, Sorting and Filtering

Once we have returned cube members into a grid report, we often need to exclude members or change the order of the result set to provide more meaningful information. The MDX (Multidimensional Expressions) language includes some very useful operators for filtering (FILTER), sorting (ORDER) and ranking (TOPCOUNT/BOTTOMCOUNT) dimension members. These can be quite overwhelming even for power users of XLCubed, so in V6 we have introduced a new feature, “Advanced Member Selections”, to provide easy access to this powerful part of Microsoft Analysis Services.

Using this new functionality we can nest and combine these operations to answer complex business questions. (For simpler operations you can right-click on a member in the grid and use the “Apply” menu to perform basic ranking, filtering and sorting.)

Filtering

So let’s go through a simple filtering example. Say, for example, that we want to find the products at the Product Key level that sold more than 25 units in 2003, Quarter 1, and show the sales figures for those products during 2003 and its quarters.

  1. Start by clicking the Grid ribbon item (or the XLCubed > Design Grid menu item in Excel 2003 and below), and selecting the Internet Sales cube file
  2. Drag Calendar Period to Columns and Product to Rows. You can also drag any other hierarchies to Headers. In the example image below, Measures and Customer have been added there.

  3. Click on the Product hierarchy so that its details appear in the bottom-right panel.
  4. Drag the Product Key level over to the right of the dialog. You can switch between the members view and the levels view by clicking on the Show Levels icon.
  5. Click the Advanced tab to show the advanced selection pane:

  6. Click the Members drop-down and choose Filter result:


  7. Click the Calendar Period edit control in the grid to change its selection to the desired member (2003, Quarter 1):

  8. Select the This measure radio button, and select Order Quantity as the desired measure.
  9. Change the Operation to >, and type 25 in the edit field on the right:

  10. Click OK. The new filter is displayed in the advanced selections tab:

  11. Click OK again to run the Report – the Grid shows the members that fit our criteria:

 

So we can see the results: filtered by 2003 Q1, but displaying the values for All Time (or any other period we wish to use). We could also have used the Range selector to drive the period selection from an Excel range, in which case our grid would automatically refresh whenever the driving value changes.
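Behind the scenes, a selection like this translates into an MDX set expression along the following lines. This is only a sketch – XLCubed generates the exact query for you, and the hierarchy and member keys shown here are assumptions based on the AdventureWorks sample:

SELECT
    [Measures].[Internet Sales Amount] ON COLUMNS,
    FILTER(
        [Product].[Product Categories].[Product].Members,
        ( [Measures].[Order Quantity],
          [Date].[Calendar].[Calendar Quarter].&[2003]&[1]  // assumed key for 2003 Q1
        ) > 25
    ) ON ROWS
FROM [Adventure Works]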

Ranking

Now let’s add a ranking to find the bottom 8 selling products at the Product Key level that have sold more than 25 units in Q1:

  1. Display the Product Hierarchy Editor dialog
  2. Click the Rank result icon on the advanced selections tab to display the Edit Ranking dialog
  3. Select the Bottom radio button, and type 8 into the edit field
  4. Select 2003, Quarter 1 for the Calendar Period hierarchy in the grid below:

We now have the filter, followed by the ranking:

 

Run the Grid: only the lowest 8 members are returned
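In MDX terms, the ranking simply wraps the filtered set in a BOTTOMCOUNT call (again a sketch with assumed names; the ranking measure here is taken to be the sales amount):

BOTTOMCOUNT(
    FILTER(
        [Product].[Product Categories].[Product].Members,
        ( [Measures].[Order Quantity],
          [Date].[Calendar].[Calendar Quarter].&[2003]&[1] ) > 25
    ),
    8,  // keep the bottom 8 members
    ( [Measures].[Internet Sales Amount],
      [Date].[Calendar].[Calendar Quarter].&[2003]&[1] )
)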

 

Sorting

Now let’s sort the report on a different dimension – for example, descending order of the Q1 sales.

  1. Display the Hierarchy Editor for the Product hierarchy by double-clicking on the Product label in the Grid
  2. If it’s not already visible, select the Advanced tab
  3. Click the Sort result toolbar button
  4. Change the Calendar Period selection to 2003, Quarter 1:

  5. Click the Sort Descending (9-1) radio button
  6. Click OK. The new sort is displayed in the advanced selections tab
  7. Click OK again to run the Report
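The equivalent MDX wraps the set in an ORDER call. In this sketch the placeholder stands for the ranked set built in the previous step; BDESC sorts descending while ignoring the hierarchy’s natural order:

ORDER(
    <ranked set from the previous step>,
    ( [Measures].[Internet Sales Amount],
      [Date].[Calendar].[Calendar Quarter].&[2003]&[1] ),
    BDESC  // "B" = break hierarchy, DESC = descending
)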

 

Joining Results

It’s also possible to join different results together: combining both sets (UNION), excluding members (EXCEPT) and returning common members (INTERSECT).
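These correspond directly to the MDX set functions of the same names. For example, a top 10 and a bottom 8 can be combined like this (a sketch with assumed names):

UNION(
    TOPCOUNT( [Product].[Product Categories].[Product].Members, 10,
        [Measures].[Internet Sales Amount] ),
    BOTTOMCOUNT( [Product].[Product Categories].[Product].Members, 8,
        [Measures].[Internet Sales Amount] )
)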

So we could also add the top 10 products alongside the bottom 8 products in the grid. Begin by adding another member selection using the “Add Member List” toolbar button:

As before, we select the list of members to rank (in this case the Product Key level) and then select the operation we want to perform, a Top 10:

There are various options for how to combine the lists; we’ll stick with Add:


And we get both results combined:


So the “Advanced Member Selections” feature provides much of the power of Analysis Services in a simplified way – to try it for yourself, you can begin by downloading XLCubed.

Sql Server “Denali” CTP3 – first impressions…

Microsoft recently released the third CTP of Denali, the upcoming SQL Server release (SQL Server 2011), so here are some initial thoughts now that it’s more widely available.

The first thing to look at is the new Tabular mode for Analysis Services (as opposed to the traditional multidimensional mode, which is still available). This is the server version of the VertiPaq engine first seen in the PowerPivot add-in, and it moves the engine from being a personal/team tool to an organisation/enterprise-level affair.

This means IT are going to get involved (and people can disagree about how they feel about that!), but it also means report sharing should be easier, as data is held centrally. In the past the report contained all the data, which could make for very large workbooks, or you published to SharePoint, which not everyone was set up to do.

Cubes can be queried using MDX, which is great for a front-end vendor like us, and XLCubed works out of the box against the CTP. Existing functionality is working smoothly, and as Microsoft Gold Partners we’re working closely with the releases to utilise all the functionality for the RTM.

We have ported a few existing cubes to the new architecture, and one first impression is that removing columns or using perspectives is going to be needed to keep things sensible for end users – you can quickly end up with hundreds of attributes.

The ability to create hierarchies was something that was often asked for in PowerPivot, and thankfully that’s there now. This should simplify many cubes.

Attribute-tastic

 

The intricacies of MDX put most business users off trying to use it directly, whereas DAX’s similarity to Excel functions means there is more scope for users to create formulae on the fly. Examining how best to expose that to users is something we’ll be spending some time on in the coming months.

Easier distinct counts and the built in date calculations are the obvious candidates, but there are a number of others which we feel we can make more accessible for the majority of users.

It’s certainly an interesting move, and thinking in tables and columns instead of the multidimensional model takes some getting used to; conversely, for some people it’s more natural.

It’ll also be interesting to see how MDX and DAX are integrated. The Tabular server supports both languages for queries. Currently, using MDX, you can use the “With Member” syntax to create calculated members in queries sent to the Tabular server – could you declare a DAX calculation in a similar manner?
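For reference, this is the kind of query-scoped calculation we mean – a standard MDX sketch, with measure names assuming the AdventureWorks sample:

WITH MEMBER [Measures].[Margin] AS
    [Measures].[Internet Sales Amount] - [Measures].[Internet Total Product Cost],
    FORMAT_STRING = 'Currency'
SELECT
    { [Measures].[Internet Sales Amount], [Measures].[Margin] } ON COLUMNS
FROM [Adventure Works]

If the engine allowed an analogous “WITH … AS <DAX expression>” declaration, ad-hoc DAX calculations could piggyback on existing MDX clients in the same way.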

Parent-Child Dimensions in Analysis Services – Performance Walkthrough

Parent-child hierarchies are a good fit for many data structures such as accounts or employees, and while they can speed development in some cases, they can also cause performance problems in large cubes.

We often see customers with this type of performance issue, and thought it worth sharing a simple technique for altering the dimension structure to improve query speed.

The problem

Often parent-child hierarchies are created because this is the structure used in the relational source, so they seem a good fit to model the members. In many cases, though, data exists only at the leaf level of the hierarchy, meaning a parent-child structure isn’t really needed.

Performance problems occur because no aggregates are created for parent-child dimensions, as detailed in the Analysis Services performance guide:

Parent-child hierarchies

Parent-child hierarchies are hierarchies with a variable number of levels, as determined by a recursive relationship between a child attribute and a parent attribute. Parent-child hierarchies are typically used to represent a financial chart of accounts or an organizational chart. In parent-child hierarchies, aggregations are created only for the key attribute and the top attribute, i.e., the All attribute unless it is disabled. As such, refrain from using parent-child hierarchies that contain large numbers of members at intermediate levels of the hierarchy. Additionally, you should limit the number of parent-child hierarchies in your cube.

If you are in a design scenario with a large parent-child hierarchy (greater than 250,000 members), you may want to consider altering the source schema to re-organize part or all of the hierarchy into a user hierarchy with a fixed number of levels. Once the data has been reorganized into the user hierarchy, you can use the Hide Member If property of each level to hide the redundant or missing members.

 

The performance guide hints at re-organizing the hierarchy to improve performance, but doesn’t say how.

The solution

This article walks through the steps needed to change your parent-child hierarchy structure to one with real levels, so that aggregations work and your performance is as good as you’d expect from normal hierarchies.

This process is known as flattening or normalizing the parent-child hierarchy.

Firstly, let’s look at the data in our relational source.

SQL create script:

CREATE TABLE [dbo].[Products](
    [ProductID] [int] NOT NULL,
    [ParentID] [int] NULL,
    [Name] [varchar](50) NOT NULL,
    CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED ([ProductID] ASC)
)

GO

insert into Products(ProductID, ParentID, Name) values(1, NULL, 'All')
insert into Products(ProductID, ParentID, Name) values(2, 1, 'Fruit')
insert into Products(ProductID, ParentID, Name) values(3, 2, 'Red')
insert into Products(ProductID, ParentID, Name) values(4, 3, 'Cherry')
insert into Products(ProductID, ParentID, Name) values(5, 3, 'Strawberry')
insert into Products(ProductID, ParentID, Name) values(6, 2, 'Yellow')
insert into Products(ProductID, ParentID, Name) values(7, 6, 'Banana')
insert into Products(ProductID, ParentID, Name) values(8, 6, 'Lemon')
insert into Products(ProductID, ParentID, Name) values(9, 1, 'Meat')
insert into Products(ProductID, ParentID, Name) values(10, 9, 'Beef')
insert into Products(ProductID, ParentID, Name) values(11, 9, 'Pork')


Not a large dimension, but enough to demonstrate the technique. As you can see, the real products are all at the leaf level.

The strategy is quite simple:

  • Create a view to separate the members into different levels.
  • Create a new dimension using these real levels.
  • Configure the dimension to appear like the original parent-child dimension, but with the performance of a normal dimension.

Create the view

We want to create a denormalised view of the data. To do this we join the Products table to itself once for each level. This does mean we need to know the maximum depth of the hierarchy, but often this is fixed, and we’ll build in some extra levels for safety.

The tricks here are:

  • Use coalesce() so that we always get the lowest-level ID below the leaves, never a NULL. This allows us to join to the fact table at the bottom level of our hierarchy.
  • Leave the Name columns NULL below the leaves; this allows us to stop the hierarchy at the correct leaf level in each branch of the hierarchy.

SQL view script:

create view dbo.ProductsFlattened
as
select
    -- P1..P5 are successive generations below the root member P0
    P1.ProductID as Level1ID,
    P1.Name as Level1Name,
    -- coalesce() carries the lowest real ID down below the leaves,
    -- so the fact table can always join at Level5ID
    coalesce(P2.ProductID, P1.ProductID) as Level2ID,
    P2.Name as Level2Name,   -- Name stays NULL below a leaf (used by HideMemberIf)
    coalesce(P3.ProductID, P2.ProductID, P1.ProductID) as Level3ID,
    P3.Name as Level3Name,
    coalesce(P4.ProductID, P3.ProductID, P2.ProductID, P1.ProductID) as Level4ID,
    P4.Name as Level4Name,
    coalesce(P5.ProductID, P4.ProductID, P3.ProductID, P2.ProductID, P1.ProductID) as Level5ID,
    P5.Name as Level5Name
from dbo.Products P0
left join dbo.Products P1 on P0.ProductID = P1.ParentID
left join dbo.Products P2 on P1.ProductID = P2.ParentID
left join dbo.Products P3 on P2.ProductID = P3.ParentID
left join dbo.Products P4 on P3.ProductID = P4.ParentID
left join dbo.Products P5 on P4.ProductID = P5.ParentID
where P0.ParentID is null   -- anchor the query at the root member


Running this we get:
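With the sample data above, the view returns one row per leaf path – the NULL names below each leaf are what will drive the member hiding later (ID columns other than Level5ID omitted here for readability):

Level1Name | Level2Name | Level3Name | Level4Name | Level5Name | Level5ID
Fruit      | Red        | Cherry     | NULL       | NULL       | 4
Fruit      | Red        | Strawberry | NULL       | NULL       | 5
Fruit      | Yellow     | Banana     | NULL       | NULL       | 7
Fruit      | Yellow     | Lemon      | NULL       | NULL       | 8
Meat       | Beef       | NULL       | NULL       | NULL       | 10
Meat       | Pork       | NULL       | NULL       | NULL       | 11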

Obviously we can update this view to create more levels as required, but 5 are enough for now.

The Dimension

Next we go to BIDS, add the view to our Data Source View, and then add a new dimension based on the view.

The key steps to creating the dimension correctly are:

  • Set the key attribute to Level5ID, and the name to Level5Name.
  • Create an attribute for each Level ID, and on each set the Name Column appropriately.
  • Create a hierarchy using these attributes in order.
  • On each attribute set AttributeHierarchyVisible to False.
  • On each level of the hierarchy set HideMemberIf to NoName.
  • Set up the Attribute Relationships between the levels.

You should end up with the following:

Dimension Structure

 

Attribute Relationships

 

If you browse the dimension you’ll see that it never goes as far as level 5, even though it exists. This is because we set up the member-hiding option and returned NULLs in our view.

Conclusion

And that’s it done: you can now join to your fact tables at the lowest level, build your cube as normal, and get the performance benefits of aggregation!

See also

A tool to achieve the same result is available from CodePlex; we’ve not personally tried it, but it may well be a timesaver. It works in a similar way to the example above – and it’s often useful to understand how something works, even if you choose to automate it.

Cube Design – meeting the business needs

Following on from our previous blog post on a couple of the common cube performance issues we’ve seen this last month, I thought I’d mention some of the non-technical issues we see quite often. In one case, once we’d made a few tweaks and sorted out the cube performance issues, we had to ask: is the cube doing what it needs to? (Of course we did ask this first, but the priority was sorting out the current cube performance!) Does it meet the business requirement? There’s no point in having the most complex cube that uses all the greatest features if it can’t answer the users’ queries.

In reports, we’ve seen examples where clients have nested four or five attributes to build up the effect of a hierarchy, run huge queries and then VLOOKUPs on them to get the data they need, brought back 12 columns of data and manually worked out year-to-date figures, gone without hierarchies that reflect commonly used groupings of members, or had member names that aren’t formatted in the way the business needs. To us this just isn’t right.

The users might not seem to care too much if they don’t know how the cube could work, or if it runs fast enough to bring back huge result sets they can manipulate themselves – but doesn’t that negate the point of having a cube, and your investment in it? Consumers of the cube should have fast, timely, accurate and, importantly, appropriate data made available to them in a manner that makes sense.

Cube design and build is about understanding the business and users’ needs and then building the cube and associated processes – and that’s before even starting to build the reports and convey the information using good data visualisation practices.

All too often we’re seeing a drive to use the latest tech, the flashiest widgets, and cool-looking 3D and shading effects on reports, through to cubes and databases with every conceivable hierarchy or type of measure thought possible – but not bearing much resemblance to what the users need to see.

I won’t hide the fact that we’re very proud of our skills and experience in ensuring our clients get not just a technically excellent system but also one that fits their needs. If you want to talk to one of the team about how they can help, you can find our contact details here.

Common Analysis Services Performance Issues

A quick blog post from the Services team here at XLCubed on some performance problems with SSAS that we’ve seen again recently. With the processing power and memory available, it’s pretty easy to build a fast cube – both for query performance and processing time. It is also easy to be lax in cube design, ignore the warnings and best-practice guidelines, and end up with a cube that looks concise, neat and clever but performs terribly for end users.

We’ve come across a couple of examples of this at client sites in the last month, and there are some common issues that always seem to jump out – rectifying these normally has a very positive impact. The three most common culprits we see are:

Parent-Child dimensions – Parent-child dimensions are nice and easy to build and use. However, as you can’t build aggregations that include a parent-child dimension, they can make for a badly performing cube! Try to flatten dimensions out and evaluate exactly why a parent-child dimension is required and being used. They are not the only option.

Unary operators, custom roll-ups – we’ve seen cases where these have been included in every dimension in a cube by default. If there isn’t a need for them, leave them out! If you can get around using a custom rollup or unary operator with some simple work in the ETL process, it may be better to do that first.

If your query performance is bad, try removing all unary operators and custom rollups, then re-test the cube. How’s the performance now? It should be significantly faster – evaluate and review the need for the unary operators and custom rollups, and see if the same effect can be achieved differently (e.g. in the ETL layer).

Cache vs. non-cache data – basically, is the cube recalculating and re-querying numbers over and over again, or can it re-use results? Use Profiler to check for cache or non-cache data while your queries are running. Many times we’ve seen queries making no use of the cache because Analysis Services hasn’t been given enough available memory, or because volatile functions such as Now() have been used in MDX calculations.
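As an illustration, a calculation along these lines can never be served from cache, because Now() returns a different value on every evaluation (a hypothetical sketch – the [Last Transaction Date] measure is invented for the example):

// Hypothetical: Now() is volatile, so Analysis Services cannot
// re-use previously calculated results for this measure
Create Member CurrentCube.[Measures].[Days Since Transaction] as
    DateDiff("d", [Measures].[Last Transaction Date], Now());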

Resolving the issues above had a massive impact – reports taking up to 3 minutes to run were down to a few seconds, and users could begin to use the application properly for the first time. However, fixing the performance may be only part of the task: the cube of course needs to have been designed to meet the business requirements, but that’s another blog.

So, what’s an OLAP Cube, anyway?

As "An Excel User in a Cubed Kingdom" I’m starting my exploration of this new found land with a simple question: what is an OLAP cube? In plain English, please…

I like this simple, non-technical definition: a cube is a set of predefined answers. It’s up to you to select the right questions.

OK, let’s detail this.

Imagine that you have a very large database with the usual business data: orders, customers, sales representatives… Now you want to know how much a customer category ordered over the last year. You query the database and you get the answer. Then you want monthly sales. Query it again. Dig a little deeper to see what products that category ordered. Query it once more.

What is happening behind the curtain? Each time you enter a new query, the system looks at each transaction (or a subset) and performs the necessary calculations to answer your query. You’ll get your answers, but it will be painfully slow: depending on your query and the database size, it may take hours. That is not an option.

But you don’t really need to see each individual order, do you? If you only need to know monthly sales, why should your system go through each transaction? If you pre-aggregate that data, you’ll get your answers much, much faster, because there aren’t five million records, just 100,000. You’ll be able to actually work, instead of staring at your monitor, waiting for an answer.

This is what a cube does. It provides faster answers by eliminating the unnecessary detail for the task at hand. You shouldn’t look at a cube as a unique, condensed version of the database. While there is a virtually infinite number of questions the database can answer, a cube focuses on providing answers to a small set of questions. That’s why you can have different cubes (marketing, finance, sales), all of them getting data from the same source: they all answer different sets of questions.

When designing a cube (it’s your job, not IT’s), resist the temptation of a one-size-fits-all cube. Clearly define a coherent set of questions for your fundamental business needs and make sure they are answered once the cube becomes available.

If you are exploring your data you will not want to wait an hour each time you make a change. On the other hand, a fast cube with no data to explore is useless. There is a fine balance between maximum flexibility and maximum performance.

An OLAP cube not only ensures that you retrieve the right data from the database but also allows you to explore it efficiently. Two good reasons to add OLAP cubes to your toolbox.

Next time we’ll see how plain English can describe the structure of a basic OLAP cube.