DAX Performance tips – lessons from the field

XLCubed has supported a drag-and-drop interface for creating reports against Tabular Analysis Services since the first release of the new engine. It lets users easily create reports which run DAX queries against the cube, and we’ve often seen very good performance at customer sites where MDX against Tabular had been the cause of long-running reports.

So when some attendees approached us at SQL Pass in Seattle with an SSAS Tabular performance issue, we were optimistic we’d be able to help.

In this case the business wanted to retrieve thousands of rows from the cube at the transactional level, and the first approach had been to use PivotTables in Excel. To get to the lowest level they cross-joined the lowest levels of all the hierarchies on the rows section, which gave the right result, but performance was terrible, with several queries taking 20 minutes or more and others not returning at all.

We hoped an XLCubed table running DAX would be the solution and created the same report in the designer. Sadly, while performance was a little better, it was still far from acceptable; the model was large, and the number of columns combined with their cardinality meant that a lot of work was being done on the server.

XLCubed’s DAX generator was trying to cross-join all the values from each column, which had worked well for our other customers. But when there are a dozen columns, including the transaction ID, things do not go so well. DAX in itself is not a magic bullet, and SSAS Tabular models can hit performance problems on low-level data – we needed a new approach.

After some investigation we discussed the issue and our thinking with our friends at SQLBI, and determined that we wanted an option to use Summarize() rather than a cross-join: Summarize() only returns combinations that actually exist in the data, and it can also access columns related to the summarized table, which were required for the report.

Because the customer’s report included the transaction ID, the result wasn’t aggregated even though we were using Summarize(). We also wanted to add true transactional reporting, using the Related() function.
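As a flavour of the transactional approach, a query can iterate the fact table itself and use Related() to pull dimension columns alongside every fact row. This is a minimal sketch against the same model as the examples further down – ADDCOLUMNS returns every column of 'Internet Sales' (including the transaction ID) plus the two related columns:

EVALUATE
ADDCOLUMNS (
    'Internet Sales',
    -- RELATED works here because ADDCOLUMNS provides a row context on the fact table
    "Customer Education", RELATED ( 'Customer'[Education] ),
    "Product Color", RELATED ( 'Product'[Color] )
)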

Finally, SQL 2016 adds a couple of new functions, SummarizeColumns() and SelectColumns(), both of which are useful for this type of reporting and offer better performance than the older equivalents.

The end result in XLCubed is a new option for DAX tables to allow users to set the type of report they want to run, and some internal changes so that XLCubed will automatically use the most efficient DAX function where they are available.

A beta was sent to the business users and the results were fantastic. The report which had run for several minutes now completed in a few seconds, and the 20-minute queries were down to 15 seconds – we had some very happy users!

The changes will be in the next release of XLCubed so that all our customers can benefit from the improvements. It’s always nice when a customer request helps improve the product for everyone.

A sample of the syntax change is included below.

Before:

 

EVALUATE
FILTER (
    ADDCOLUMNS (
        KEEPFILTERS (
            CROSSJOIN ( VALUES ( 'Customer'[Education] ), VALUES ( 'Product'[Color] ) )
        ),
        "Internet Total Units", 'Internet Sales'[Internet Total Units],
        "Internet Total Sales", 'Internet Sales'[Internet Total Sales]
    ),
    NOT ISBLANK ( [Internet Total Units] )
)
ORDER BY
    'Customer'[Education],
    'Product'[Color]

After:

 

EVALUATE
FILTER (
    ADDCOLUMNS (
        KEEPFILTERS (
            SUMMARIZE ( 'Internet Sales', 'Customer'[Education], 'Product'[Color] )
        ),
        "Internet Total Units", 'Internet Sales'[Internet Total Units],
        "Internet Total Sales", 'Internet Sales'[Internet Total Sales]
    ),
    NOT ISBLANK ( [Internet Total Units] ) || NOT ISBLANK ( [Internet Total Sales] )
)
ORDER BY
    'Customer'[Education],
    'Product'[Color]
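
On SQL Server 2016 and later, the SummarizeColumns() route mentioned earlier can express the same report more directly. This is a sketch only, against the same model as above; SUMMARIZECOLUMNS skips rows where every expression is blank, so the explicit ISBLANK filter is no longer needed:

EVALUATE
SUMMARIZECOLUMNS (
    'Customer'[Education],
    'Product'[Color],
    "Internet Total Units", 'Internet Sales'[Internet Total Units],
    "Internet Total Sales", 'Internet Sales'[Internet Total Sales]
)
ORDER BY
    'Customer'[Education],
    'Product'[Color]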

XLCubed V7 & SQL Server 2012

SQL Server 2012 has recently been released to manufacturing, and at XLCubed we’re well placed to take advantage of everything that is new in 2012.

SQL 2012 delivers Business Intelligence under the ‘BISM’ umbrella (Business Intelligence Semantic Model). BISM comes in different flavours though:

  • BISM Multi-dimensional
    • (Latest version of Analysis Services as we know it)
  • BISM Tabular
    • In-Memory VertiPaq
    • Direct Query

For client tools, BISM Multi-dimensional is largely the same as connecting to existing versions of Analysis Services, with MDX being the query language. For XLCubed we can leverage what we already have in that respect, and the transition is seamless.

BISM Tabular is different though. If you choose to deploy in-memory to VertiPaq, client tools can still use MDX, and as such don’t need significant change, other than to handle the tabular rather than hierarchical data environment. However, if the deployment is Direct Query (for example for real-time BI), the only available query language is DAX.

There are best use cases for the different deployment options, but it’s fair to say there is a degree of confusion in the space at the moment about the relative merits of each. We’ll try to shed some light and guidance here over the next weeks and months. As a product though, it’s important for us to support and extend the full range of 2012 BI deployment options, and to make these available and accessible to our customers. That’s exactly what we’ve done for version 7.

XLCubed v7, which releases next month, is a client for both MDX and DAX, and as such provides one consistent client interface in Excel and on the Web which can access any of the SQL 2012 deployment models for BI. We are also adding a much richer relational SQL reporting environment.

We are really pleased with some of the beta feedback we’ve had to date, and if you’d like to trial the beta version contact us at beta@xlcubed.com.

We’re looking forward to releasing the product next month, and will be previewing it at SQL Server Connections next week in Vegas.

 

Solve order shenanigans

Today I’m going to blog about a problem we recently solved in a client’s cube: an error in the MDX script that’s very easy to make if you aren’t careful.

We’ll run a simple example in AdventureWorks (what else?) to demonstrate the issue.

The client had already added a calculation to their cube to show year-on-year growth. The formula is:

Create Member CurrentCube.[Measures].[Delta to PrevYear] as
(
    ([Measures].[Internet Sales Amount])
    -
    ([Measures].[Internet Sales Amount],
        ParallelPeriod(
            [Date].[Calendar].[Calendar Year],
            1,
            [Date].[Calendar].CurrentMember
        )
    )
)
/
    ([Measures].[Internet Sales Amount],
        ParallelPeriod(
            [Date].[Calendar].[Calendar Year],
            1,
            [Date].[Calendar].CurrentMember
        )
    )
, Format_String = "0.00%";

(some error checking removed for clarity)

This screenshot shows a couple of simple XLCubed Grids with the real value and, below it, the percentage change. I have added an Excel calculation to show the results are as expected.

Later during the cube development, the client added a calculated member in their Product dimension, one that gives a total excluding one of the product categories.

To replicate this I’ll add a calculation for “All Ex Bikes”:

Create Member 
CurrentCube.[Product].[Product Model Categories].[All Products].[All Ex Bikes]
as
(
    ([Product].[Product Model Categories].[All Products])
    -
    ([Product].[Product Model Categories].[Category].&[1])
);

And if we run the report again we get the following.

Notice the cell I’ve highlighted. The “All Ex Bikes” calculation works fine on the normal measure, but it gives totally the wrong number for the percentage calculation. What’s going on?

The problem is that in the cell highlighted Analysis Services has two calculations to think about when working out the result.

  • Compare this year to last year
  • Get the “Grand Total”, and subtract “Bikes”

As the number returned is 1.85% we can see that Analysis Services has chosen the second option, “Grand Total” – “Bikes”.

What we really want is for the calculation to be done by getting the subtotal, and then doing the percentage change based on that.

Fortunately the fix was a simple one. Analysis Services applies the calculations in the order they appear in the MDX script, so to fix the issue we simply moved the new “All Ex Bikes” definition up above the percentage calculation.
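Concretely, the relevant part of the corrected script looks like this – the same two definitions as above, just with the product calculation declared first (formatting compressed):

Create Member
CurrentCube.[Product].[Product Model Categories].[All Products].[All Ex Bikes]
as
(
    ([Product].[Product Model Categories].[All Products])
    -
    ([Product].[Product Model Categories].[Category].&[1])
);

Create Member CurrentCube.[Measures].[Delta to PrevYear] as
(
    ([Measures].[Internet Sales Amount])
    -
    ([Measures].[Internet Sales Amount],
        ParallelPeriod([Date].[Calendar].[Calendar Year], 1, [Date].[Calendar].CurrentMember))
)
/
    ([Measures].[Internet Sales Amount],
        ParallelPeriod([Date].[Calendar].[Calendar Year], 1, [Date].[Calendar].CurrentMember))
, Format_String = "0.00%";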

Now the number returned matches our expectations.

Pass and solve order can be a complex topic, so take care when adding calculations to the script.

In this case the number is totally wrong, so it was easy to spot, but some bugs will be much more subtle, so watch out!

Ranking, Sorting and Filtering

Once we have returned cube members into a grid report we often need to exclude members or change the order of the result set to provide more meaningful information. The MDX (Multidimensional Expressions) language includes some very useful operators for filtering (FILTER), sorting (ORDER) and ranking (TOPCOUNT/BOTTOMCOUNT) dimension members. These can be quite overwhelming even for power users of XLCubed, so in V6 we have introduced a new feature, “Advanced Member Selections”, to provide easy access to this powerful part of Microsoft Analysis Services.

Using this new functionality we can nest and combine these operations to answer complex business questions (for simpler operations you can right-click on a member in the grid and use the “Apply” menu to perform simple ranking, filtering and sorting).
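To give a flavour of what the feature assembles for you, the example worked through below (products that sold more than 25 units in 2003 Quarter 1, ranked bottom 8, sorted descending) corresponds roughly to a nested set expression like this. It is a sketch only – the hierarchy and member names are illustrative rather than taken from a real cube:

ORDER (
    BOTTOMCOUNT (
        FILTER (
            [Product].[Product].[Product Key].MEMBERS,
            ( [Measures].[Order Quantity], [Calendar Period].[2003].[Quarter 1] ) > 25
        ),
        8,
        ( [Measures].[Order Quantity], [Calendar Period].[2003].[Quarter 1] )
    ),
    ( [Measures].[Order Quantity], [Calendar Period].[2003].[Quarter 1] ),
    BDESC
)

The point of the feature is that you never need to write this by hand – the nesting is built up from the selections described below.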

Filtering

So let’s go through a simple filtering example. Say, for example, that we want to find the products at Product Key level that sold more than 25 units in 2003, Quarter 1, and show the sales figures for those products during 2003 and its quarters.

  1. Start by clicking the Grid ribbon item (or the XLCubed > Design Grid menu item in Excel 2003 and below), and selecting the Internet Sales cube file
  2. Drag Calendar Period to Columns and Product to Rows. You can also drag any other hierarchies to Headers. In the example image below, Measures and Customer have been added there.

  3. Click on the Product hierarchy so that its details appear in the bottom-right panel.
  4. Drag the Product key level over to the right of the dialog. You can switch between the members view and levels view by clicking on the Show Levels icon.
  5. Click the Advanced tab to show the advanced selection pane:

  6. Click the Members drop down and choose Filter result:


  7. Click the Calendar Period edit control in the grid to change its selection to the desired member (2003, Quarter 1):

  8. Select the This measure radio button, and select Order Quantity as the desired measure.
  9. Change the Operation to >, and type 25 in the edit field on the right:

  10. Click OK. The new filter is displayed in the advanced selections tab:

  11. Click OK again to run the Report – the Grid shows the members that fit our criteria:

 

So we can see the results, filtering by 2003 Q1 but displaying the values for All Time (or any other period we wish to use). We could also have used the Range selector to drive the period selection from an Excel range, and our grid would automatically refresh whenever the driving value changes.

Ranking

Now let’s add a ranking to find the bottom 8 selling products at the Product Key level that have sold more than 25 units in Q1:

  1. Display the Product Hierarchy Editor dialog
  2. Click the Rank result icon on the advanced selections tab to display the Edit Ranking dialog
  3. Select the Bottom radio button, and type 8 into the edit field
  4. Select 2003, Quarter 1 for the Calendar Period hierarchy in the grid below:

We now have the filter, followed by the ranking:

 

Run the Grid: only the lowest 8 members are returned.

 

Sorting

Now let’s sort the report on a different dimension – for example, descending order of the Q1 sales.

  1. Display the Hierarchy Editor for the Product hierarchy by double-clicking on the Product label in the Grid
  2. If it’s not already visible, select the Advanced tab
  3. Click the Sort result toolbar button
  4. Change the Calendar Period selection to 2003, Quarter 1:

  5. Click the Sort Descending (9-1) radio button
  6. Click OK. The new sort is displayed in the advanced selections tab
  7. Click OK again to run the Report

 

Joining Results

It’s also possible to join different results together: combining both sets (UNION), excluding members (EXCEPT) and returning common members (INTERSECT).

So we could also add the top 10 products alongside the bottom 8 products in the grid. Begin by adding another member selection using the “Add Member List” toolbar button:

As before, we select the list of members to rank (in this case the Product Key level) and then select the operation we want to perform, a Top 10:

There are various options to decide how to combine the lists; we’ll stick with Add:

 

 

And we get both results combined:


So the “Advanced Member Selections” feature provides much of the power of Analysis Services in a simplified way – to try this feature for yourself you can begin by downloading XLCubed.

SQL Server “Denali” CTP3 – first impressions…

Microsoft recently released the third CTP of Denali, the upcoming SQL Server release (which became SQL Server 2012), so here are some initial thoughts now that it’s more widely available:

The first thing to look at is the new Tabular mode for Analysis Services (as opposed to the traditional multi-dimensional mode, which is still available). This is the server version of the VertiPaq engine first seen in the PowerPivot add-in, and moves the engine from being a personal/team tool to an organisation/enterprise level affair.

This means IT are going to get involved (and people can disagree about how they feel about that!), but that report sharing should be easier as data is held centrally. In the past the report contained all the data, which could make for very large workbooks, or you published to SharePoint, which not everyone was set up to do.

Cubes can be queried using MDX, which is great for a front-end vendor like us, and XLCubed works out of the box against the CTP. Existing functionality is working smoothly, and as Microsoft Gold Partners we’re working closely with the releases to utilise all the functionality for the RTM.

We have ported a few existing cubes to the new architecture, and one first impression is that removing columns or using perspectives is going to be needed to keep things sensible for end users; you can quickly end up with hundreds of attributes.

The ability to create hierarchies was something that was often asked for in PowerPivot, and thankfully that’s there now. This should simplify many cubes.

Attribute-tastic

 

The intricacies of MDX put most business users off trying to use it directly, whereas DAX’s similarity to Excel functions means there is more scope to have users create formulae on the fly. Examining how best to expose that to users is something we’ll be spending some time on in the coming months.

Easier distinct counts and the built-in date calculations are the obvious candidates, but there are a number of others which we feel we can make more accessible for the majority of users.
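For example – the table and column names here are assumed, purely for illustration – a distinct count (which needs its own measure group in a multidimensional cube) and a year-to-date figure are each a one-line DAX measure:

Distinct Customers := DISTINCTCOUNT ( 'Internet Sales'[CustomerKey] )
Sales YTD := TOTALYTD ( SUM ( 'Internet Sales'[Sales Amount] ), 'Date'[Date] )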

It’s certainly an interesting move, and thinking in tables and columns instead of the multidimensional model takes some getting used to; conversely, for some people it’s more natural.

It’ll also be interesting to see how MDX and DAX are integrated. The Tabular server supports both languages for queries. Currently, using MDX you can use the “With Member” syntax to create members sent to the Tabular server – could you declare a DAX calculation in a similar manner?

Parent-Child Dimensions in Analysis Services – Performance Walkthrough

Parent-child hierarchies are a good fit for many data structures such as accounts or employees, and while they can speed development in some cases, they can also cause performance problems in large cubes.

We often see customers with this type of performance issue, and thought it worth sharing a simple technique for altering the dimension structure to improve query speed.

The problem

Often parent-child hierarchies are created because this is the structure used in the relational source, so they seem a good fit to model the members. In many cases, though, data is only at the leaf level of the hierarchy, meaning parent-child isn’t really needed.

Performance problems occur because no aggregates are created for parent-child dimensions, as detailed in the Analysis Services performance guide:

Parent-child hierarchies

Parent-child hierarchies are hierarchies with a variable number of levels, as determined by a recursive relationship between a child attribute and a parent attribute. Parent-child hierarchies are typically used to represent a financial chart of accounts or an organizational chart. In parent-child hierarchies, aggregations are created only for the key attribute and the top attribute, i.e., the All attribute unless it is disabled. As such, refrain from using parent-child hierarchies that contain large numbers of members at intermediate levels of the hierarchy. Additionally, you should limit the number of parent-child hierarchies in your cube.

If you are in a design scenario with a large parent-child hierarchy (greater than 250,000 members), you may want to consider altering the source schema to re-organize part or all of the hierarchy into a user hierarchy with a fixed number of levels. Once the data has been reorganized into the user hierarchy, you can use the Hide Member If property of each level to hide the redundant or missing members.

 

The performance guide hints at re-organizing the hierarchy to improve performance, but doesn’t say how.

The solution

This article will walkthrough the steps needed to change your parent-child hierarchy structure to have real levels, so that aggregations work, and your performance is as good as you expect with normal hierarchies.

This process is known as flattening or normalizing the parent-child hierarchy.

Firstly, let’s look at the data in our relational source.

SQL create script:

CREATE TABLE [dbo].[Products](
 [ProductID] [int] NOT NULL,
 [ParentID] [int] NULL,
 [Name] [varchar](50) NOT NULL,
 CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED ([ProductID] ASC)
)

GO

insert into Products(ProductID, ParentID, Name) values(1, NULL, 'All')
insert into Products(ProductID, ParentID, Name) values(2, 1, 'Fruit')
insert into Products(ProductID, ParentID, Name) values(3, 2, 'Red')
insert into Products(ProductID, ParentID, Name) values(4, 3, 'Cherry')
insert into Products(ProductID, ParentID, Name) values(5, 3, 'Strawberry')
insert into Products(ProductID, ParentID, Name) values(6, 2, 'Yellow')
insert into Products(ProductID, ParentID, Name) values(7, 6, 'Banana')
insert into Products(ProductID, ParentID, Name) values(8, 6, 'Lemon')
insert into Products(ProductID, ParentID, Name) values(9, 1, 'Meat')
insert into Products(ProductID, ParentID, Name) values(10, 9, 'Beef')
insert into Products(ProductID, ParentID, Name) values(11, 9, 'Pork')


Not a large dimension, but enough to demonstrate the technique. As you can see my real products are all at the leaf level.

The strategy is quite simple:

  • Create a view to separate the members into different levels.
  • Create a new dimension using these real levels.
  • Configure the dimension to appear like the original parent-child dimension, but with the performance of a normal dimension.

Create the view

We want to create a denormalised view of the data. To do this we join the Products table to itself once for each level. This does mean we need to know the maximum depth of the hierarchy, but often this is fixed, and we’ll build in some extra levels for safety.

The tricks here are:

  • Use coalesce() so that below each branch’s leaf we repeat the leaf’s ID rather than returning NULL. This allows us to join to the fact table at the bottom level of our hierarchy.
  • Leave the Name columns NULL below the leaves; this allows us to stop the hierarchy at the correct leaf level in each part of the tree.

SQL view script:

create view dbo.ProductsFlattened
as
select
    P1.ProductID as Level1ID,
    P1.Name as Level1Name,
    coalesce(P2.ProductID, P1.ProductID) as Level2ID,
    P2.Name as Level2Name,
    coalesce(P3.ProductID, P2.ProductID, P1.ProductID) as Level3ID,
    P3.Name as Level3Name,
    coalesce(P4.ProductID, P3.ProductID, P2.ProductID, P1.ProductID) as Level4ID,
    P4.Name as Level4Name,
    coalesce(P5.ProductID, P4.ProductID, P3.ProductID, P2.ProductID, P1.ProductID) as Level5ID,
    P5.Name as Level5Name
from dbo.Products P0
left join dbo.Products P1 on P0.ProductID = P1.ParentID
left join dbo.Products P2 on P1.ProductID = P2.ParentID
left join dbo.Products P3 on P2.ProductID = P3.ParentID
left join dbo.Products P4 on P3.ProductID = P4.ParentID
left join dbo.Products P5 on P4.ProductID = P5.ParentID
where P0.ParentID is null


Running this we get:

Obviously we can update this view to create more levels as required, but 5 are enough for now.

The Dimension

Next we go to BIDS, and add the view to our Data Source View, and then add a new Dimension based on the view.

The key steps to creating the dimension correctly are:

  • Set the key attribute to Level5ID, and the name to Level5Name.
  • Create an attribute for each Level ID, and on each set the Name Column appropriately.
  • Create a hierarchy using these attributes in order.
  • On each attribute set AttributeHierarchyVisible to False.
  • On each level of the hierarchy set HideMemberIf to NoName.
  • Set up the Attribute Relationships between the levels.

You should end up with the following:

Dimension Structure

 

Attribute Relationships

 

If you browse the dimension you’ll see that it never goes as far as level 5, even though it exists. This is because we set up the member-hiding option and returned NULLs in our view.

Conclusion

And that’s it. You can now join to your fact tables at the lowest level, build your cube as normal, and get the performance benefits of aggregations!
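To illustrate that last point (the fact table name and columns are assumed here, purely for the example), the fact data joins to the flattened view on Level5ID, which thanks to the coalesce() trick always holds the leaf ProductID:

select  F.SalesAmount,
        PF.Level1Name,
        PF.Level2Name,
        PF.Level3Name
from dbo.FactSales F
inner join dbo.ProductsFlattened PF on F.ProductID = PF.Level5ID

In the cube itself this is simply the relationship you define in the Data Source View between the fact table’s ProductID and the view’s Level5ID column.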

See also

A tool to achieve the same result is available from CodePlex; we’ve not personally tried it, but it may well be a time-saver. It works in a similar way to the example above, but it’s often useful to understand how something works, even if you choose to automate it.

PowerPivot, SQL Server 2008 R2, SharePoint 2010, Office 2010

So we’ve been using PowerPivot for a while now, and Office 2010 has been part of our lives for some time. I’ll use this blog to answer some of the questions that keep cropping up in conversation with our customers:

1. Does XLCubed work with Excel 2010 (Office 14)?

a. Yes, we’ve been using it since the first CTP release and each release since then.

2. Can I use XLCubed Web with SharePoint 2010?

a. Yes, publishing to the web and embedding the reports within your SharePoint site works in exactly the same way as with previous versions.

3. Does XLCubed connect to PowerPivot?

a. Yes, XLCubed connects to the PowerPivot published cubes, and our client tools can be used to build reports and dashboards from them.

4. Can I build reports from SQL Server R2 using XLCubed?

a. Yes, this will work just fine, just as you can build reports from previous versions of SQL Server or other relational sources (here is an example).

PowerPivot in the real world

The services team have been working on migrating some of our internal models and sample databases across to a PowerPivot environment – looking at the pros and cons, and using DAX rather than MDX to perform some calculations. Results have been varied; it’s been interesting to see some features that we’ve had for a while (like cube formulas, slicers and web parameters) appear in a similar way in PowerPivot.

Quite clearly PowerPivot isn’t the be-all and end-all, or anything like a replacement for Analysis Services, but it certainly has a role for tactical solutions and some power-user analysis, and we think likely also for RAD prototypes of larger-scale AS implementations. It doesn’t venture into the gap left by PerformancePoint Planning (as many thought it would in early 2009) – we’ve moved to address this area with the XLCubed PM suite, which uses in-memory OLAP cubes and/or Analysis Services.

Trying out some of the tools

Here are a few downloads for you to try; take careful note of the hardware spec and requirements for the Microsoft ones though:

The 2010 Information Worker Virtual machine

Register and Download Office 2010

PowerPivot 32Bit, 64Bit

XLCubed Evaluation

If you would like to evaluate against your own data, contact the XLCubed product team for evaluation editions; if you want to try a no-risk proof of concept or prototype, contact the XLCubed consulting team.

Cube Design – meeting the business needs

 

Following on from our previous blog post on a couple of the common cube performance issues we’ve seen this last month, I thought I’d mention some of the non-technical issues we see quite often. In one case, once we’d made a few tweaks and sorted out the cube performance issues, we had to ask: is the cube doing what it needs to? (Of course we did ask this first, but the priority was sorting out the current cube performance!) Does it meet the business requirement? There’s no point in having the most complex cube that uses all the greatest features if it can’t answer the users’ queries.

In reports, we’ve seen examples where clients have nested four or five attributes to build up the effect of a hierarchy, run huge queries and then VLOOKUPs on them to get the data they need, brought back 12 columns of data to manually work out year-to-date, had no hierarchies reflecting commonly used groupings of members, or had member names not formatted in the way the business needs. To us this just isn’t right.

The users might not seem to care too much if they don’t know how the cube could work, or if it runs fast enough to bring back huge result sets they can manipulate themselves – but doesn’t that negate the point of having a cube, and your investment in it? Consumers of the cube should have fast, timely, accurate and, importantly, appropriate data made available to them in a manner that makes sense.

Cube design and build is about understanding the business and the users’ needs, and then building the cube and associated processes – and that’s before even starting to build the reports and convey the information using good data visualisation practices.

All too often we see a drive to use the latest tech, the flashiest widgets, and cool-looking 3D and shading effects on reports, through to cubes and databases with every conceivable hierarchy or type of measure, but not bearing much resemblance to what the users need to see.

I won’t hide the fact that we’re very proud of our skills and experience in ensuring our clients get not just a technically excellent system but also one that fits their needs. If you want to talk to one of the team about how they can help, you can find our contact details here.

Common Analysis Services Performance Issues

A quick blog post from the Services team here at XLCubed on some performance problems with SSAS that we’ve seen again recently. With the processing power and memory available, it’s pretty easy to build a fast cube – both for query performance and processing time. It is also easy to be lax in cube design, ignore the warnings and best-practice guidelines, and end up with a cube that looks concise, is neat and clever, but performs terribly for end users.

We’ve come across a couple of examples of this at client sites in the last month, and there are some common issues that always seem to jump out – rectifying these normally has a very positive impact. The three most common culprits we see are:

Parent-Child dimensions – parent-child dimensions are nice and easy to build and use. However, as Analysis Services won’t build aggregations for the intermediate levels of a parent-child dimension, they can make for a badly performing cube! Try to flatten dimensions out and evaluate exactly why a parent-child dimension is required and being used. They are not the only option.

Unary operators and custom rollups – we’ve seen cases where these have been included in every dimension in a cube by default. If there isn’t a need for them, leave them out! If you can avoid a custom rollup or unary operator by some simple work in the ETL process, it may be better to do that first.

If your query performance is bad, try removing all unary operators and custom rollups and then re-test the cube. How’s the performance now? It should be significantly faster – evaluate and review the need for the unary operators and custom rollups, and see if the same effect can be achieved differently (e.g. in the ETL layer).

Cache vs. Non-Cache Data – basically, is the cube recalculating and re-querying numbers over and over again, or can it re-use results? Use SQL Server Profiler to check for cache or non-cache data events while your queries are running. Many times we’ve seen queries not using the cache because Analysis Services hasn’t been given enough available memory, or because volatile operators such as Now() have been used in MDX calculations.
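As an illustration of the volatility point (the member names and date-key format here are assumptions, not taken from a specific cube), a calculation along these lines defeats caching because Now() changes on every execution:

Create Member CurrentCube.[Measures].[Sales Today] as
(
    [Measures].[Internet Sales Amount],
    StrToMember("[Date].[Date].&[" + Format(Now(), "yyyyMMdd") + "]")
);

Resolving the “current” date during ETL or cube processing instead – for example by flagging it with a dimension attribute the calculation can reference – keeps the expression deterministic between processes, so results can be cached and re-used.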

Resolving the issues above had a massive impact – reports taking up to 3 minutes to run were down to a few seconds, and users could begin to use the application properly for the first time. However, fixing the performance may be only part of the task: the cube of course needs to have been designed to meet the business requirements, but that’s another blog post.