Parent-child hierarchies are a good fit for many data structures such as accounts or employees, and while they can speed development in some cases, they can also cause performance problems in large cubes.
We often see customers with these type of performance issues, and thought it worth sharing a simple technique for altering the dimension structure to improve query speed.
Often parent-child hierarchies are created as this is the structure used in the relational source, so they seem a good fit to model the members. In many cases though data is only at the leaf level of the hierachy, meaning parent-child isn’t really needed.
Performance problems occur because no aggregates are created for parent-child dimensions, as detailed in the Analysis Services performance guide:
Parent-child hierarchies are hierarchies with a variable number of levels, as determined by a recursive relationship between a child attribute and a parent attribute. Parent-child hierarchies are typically used to represent a financial chart of accounts or an organizational chart. In parent-child hierarchies, aggregations are created only for the key attribute and the top attribute, i.e., the All attribute unless it is disabled. As such, refrain from using parent-child hierarchies that contain large numbers of members at intermediate levels of the hierarchy. Additionally, you should limit the number of parent-child hierarchies in your cube.
If you are in a design scenario with a large parent-child hierarchy (greater than 250,000 members), you may want to consider altering the source schema to re-organize part or all of the hierarchy into a user hierarchy with a fixed number of levels. Once the data has been reorganized into the user hierarchy, you can use the Hide Member If property of each level to hide the redundant or missing members.
The performance guide hints at re-organizing the hierarchy to improve perfomance, but doesn’t say how.
This article will walkthrough the steps needed to change your parent-child hierarchy structure to have real levels, so that aggregations work, and your performance is as good as you expect with normal hierarchies.
This process is known as flattening or normalizing the parent-child hierarchy.
Firstly, let’s look at the data in our relational source.
[spoiler intro=”Code” title=”Sql Create Script”]
CREATE TABLE [dbo].[Products]( [ProductID] [int] NOT NULL, [ParentID] [int] NULL, [Name] [varchar](50) NOT NULL, CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED ([ProductID] ASC) ) GO insert into Products(ProductID, ParentID, Name) values(1, NULL, 'All') insert into Products(ProductID, ParentID, Name) values(2, 1, 'Fruit') insert into Products(ProductID, ParentID, Name) values(3, 2, 'Red') insert into Products(ProductID, ParentID, Name) values(4, 3, 'Cherry') insert into Products(ProductID, ParentID, Name) values(5, 3, 'Strawberry') insert into Products(ProductID, ParentID, Name) values(6, 2, 'Yellow') insert into Products(ProductID, ParentID, Name) values(7, 6, 'Banana') insert into Products(ProductID, ParentID, Name) values(8, 6, 'Lemon') insert into Products(ProductID, ParentID, Name) values(9, 1, 'Meat') insert into Products(ProductID, ParentID, Name) values(10, 9, 'Beef') insert into Products(ProductID, ParentID, Name) values(11, 9, 'Pork') [/spoiler]
Not a large dimension, but enough to demonstrate the technique. As you can see my real products are all at the leaf level.
The strategy is quite simple:
- Create a view to seperate the members into different levels.
- Create a new dimension using these real levels.
- Configure the dimension to appear like the original parent-child dimension, but with the performance of a normal dimension.
Create the view
We want to create a denormalised view of the data. To do this we join the Product to itself once for each level. This does mean we need to know the maximum depth of the hierarchy, but often this is fixed, and we’ll build in some extra levels for safety.
The tricks here are:
- Use coalesce() so that we always get the lowest level ID below the leaves, never a NULL. This allows us to join to the fact table at the bottom level of our hierarchy.
- Leave Name columns null below the leaves, this will allow us to stop the hierarchy at the correct leaf level in each part of the hierarchy.
[spoiler intro=”Code” title=”Sql View Script”]
create view dbo.ProductsFlattened as select P1.ProductID as Level1ID, P1.Name as Level1Name, coalesce(P2.ProductID, P1.ProductID) as Level2ID, P2.Name as Level2Name, coalesce(P3.ProductID, P2.ProductID, P1.ProductID) as Level3ID, P3.Name as Level3Name, coalesce(P4.ProductID, P3.ProductID, P2.ProductID, P1.ProductID) as Level4ID, P4.Name as Level4Name, coalesce(P5.ProductID, P4.ProductID, P3.ProductID, P2.ProductID, P1.ProductID) as Level5ID, P5.Name as Level5Name from dbo.Products P0 left join dbo.Products P1 on P0.ProductID = P1.ParentID left join dbo.Products P2 on P1.ProductID = P2.ParentID left join dbo.Products P3 on P2.ProductID = P3.ParentID left join dbo.Products P4 on P3.ProductID = P4.ParentID left join dbo.Products P5 on P4.ProductID = P5.ParentID where P0.ParentID is null
Running this we get:
Obviously we can update this view to create more levels as required, but 5 are enough for now.
Next we go to BIDS, and add the view to our Data Source View, and then add a new Dimension based on the view.
The key steps to creating the dimension correctly are:
- Set the key attribute to Level5ID, and the name to Level5Name.
- Create an attribute for each Level ID, and on each set the Name Column appropriately.
- Create a hierarchy using these attributes in order.
- On each attribute set AttributeHierarchyVisible to False.
- On each level of the hierarchy set HideMemberIf to NoName.
- Set up the Attribute Relationships between the levels.
You should end up with the following:
If you browse the dimension you’ll see that it never goes as far as level 5, even though it exists. This is becuase we set up the member hiding option, and returned NULLs in our view.
And that’s it done, you can now join to your fact tables at the lowest level, build your cube as normal and get the performance benefits of aggregation!
A tool to achieve the same result is available from Codeplex, we’ve not personally tried it but may well be a timesaver. This works in a similar way to the example above, but it’s often useful to understand how something works, even if you choose to automate it.
13 Replies to “Parent-Child Dimensions in Analysis Services – Performance Walkthrough”
When MDXing the flattened dimension, then, you have to specify the correct level of the child?
[Products Flattened].[Level 2].[Yellow]
in a classic Parent-Child?
You can control the unique name style by changing the “Captions” & “Keys” unique properties, assuming uniqueness you could achieve a unique name along the lines of:
Thanks for the reply. I really appreciate you blogging this.
Another question: Looking at ParentChild2.png, will you not get a duplicate attribute key error for Level2ID (2 = Red and Yellow) and (9 = Beef and Pork)?
No, since the lower IDs are different, Analysis Services knows that it’s not a duplicate, just that the parent member is the same.
Really nice article ! Just one question, what in the case when I have parent-child hierarchy where data are also at the non leaf level of the hierarchy?
In that case you could create a “Dummy” member at the lowest level and just load the data against that. You could give it the description: (Parent Caption) to differentiate it.
The other approach is to use a different measure group to store the values at the parent levels and then bring them together in one calculation that combines the measures from each group.
Hope that helps!
Great post! I have one question; Creating the hierarchy this way, won’t you encounter errors if you loaded fact data for an intermediate level? Say your fact data was loaded for Level2 Red: if you select level1 as the measures used dimension, all of the children would show the data for the parent, which isn’t accurate, but if you showed the leaf level of the hierarchy, the cube would error on processing because no fact data is tied to it. How do you specify a level of granularity that will allow for differentiating the parent/child data without throwing an error at process time?
There is an option on the measure group “Ignore Unrelated Dimensions”, setting this to False will not display the parent’s data and the children’s level.
The other approach is to create a dummy child to hold the parent’s data.
If you need implementing dynamic dimension data security on different level, say, some users can see data at level1, some users can see some members of level2 and/or some members at level3, how would you implement ‘Allowed member set’ in a user role editor?
Not sure what you are asking – it’s still possible to control security in the same way as with a normal parent/child dimension. Is there a particular problem you are having?
I am seeing some issues with the cube when I browse the measures with filter dimension hierarchy filters at multiple level (say DimTime, Year and Semester), whereas there is no issues (slowness) if I browse at same level like (Semister1, Semester 4) etc. I tried adding the aggregations of the attributes which are used in hierarchy. Any recommendations are highly appreciated.
BTW, I would like to ask again about dynamic security. If I want to give access to a user A to Fruit, a user B only to Red and a user C only to Lemon, how could I implement that using the bridge table technique. Usually you create a bridge table with the user key and the dimension attribute key you want to allow. Then you create a factless fact table using this bridge table and then create a Count measure. Then for the current user you proof if this count is empty or not in a given dimension. I cannot see how can I add members from other levels in this bridge table.
Any comment will be appreciated.
Thank you for your Post! My question is: How could Rollup (+, -, ~ would be required) be implemented?