Parent-child hierarchies are a good fit for many data structures such as accounts or employees, and while they can speed development in some cases, they can also cause performance problems in large cubes.
We often see customers with these type of performance issues, and thought it worth sharing a simple technique for altering the dimension structure to improve query speed.
Often parent-child hierarchies are created as this is the structure used in the relational source, so they seem a good fit to model the members. In many cases though data is only at the leaf level of the hierachy, meaning parent-child isn’t really needed.
Performance problems occur because no aggregates are created for parent-child dimensions, as detailed in the Analysis Services performance guide:
Parent-child hierarchies are hierarchies with a variable number of levels, as determined by a recursive relationship between a child attribute and a parent attribute. Parent-child hierarchies are typically used to represent a financial chart of accounts or an organizational chart. In parent-child hierarchies, aggregations are created only for the key attribute and the top attribute, i.e., the All attribute unless it is disabled. As such, refrain from using parent-child hierarchies that contain large numbers of members at intermediate levels of the hierarchy. Additionally, you should limit the number of parent-child hierarchies in your cube.
If you are in a design scenario with a large parent-child hierarchy (greater than 250,000 members), you may want to consider altering the source schema to re-organize part or all of the hierarchy into a user hierarchy with a fixed number of levels. Once the data has been reorganized into the user hierarchy, you can use the Hide Member If property of each level to hide the redundant or missing members.
The performance guide hints at re-organizing the hierarchy to improve perfomance, but doesn’t say how.
This article will walkthrough the steps needed to change your parent-child hierarchy structure to have real levels, so that aggregations work, and your performance is as good as you expect with normal hierarchies.
This process is known as flattening or normalizing the parent-child hierarchy.
Firstly, let’s look at the data in our relational source.
[spoiler intro=”Code” title=”Sql Create Script”]
CREATE TABLE [dbo].[Products]( [ProductID] [int] NOT NULL, [ParentID] [int] NULL, [Name] [varchar](50) NOT NULL, CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED ([ProductID] ASC) ) GO insert into Products(ProductID, ParentID, Name) values(1, NULL, 'All') insert into Products(ProductID, ParentID, Name) values(2, 1, 'Fruit') insert into Products(ProductID, ParentID, Name) values(3, 2, 'Red') insert into Products(ProductID, ParentID, Name) values(4, 3, 'Cherry') insert into Products(ProductID, ParentID, Name) values(5, 3, 'Strawberry') insert into Products(ProductID, ParentID, Name) values(6, 2, 'Yellow') insert into Products(ProductID, ParentID, Name) values(7, 6, 'Banana') insert into Products(ProductID, ParentID, Name) values(8, 6, 'Lemon') insert into Products(ProductID, ParentID, Name) values(9, 1, 'Meat') insert into Products(ProductID, ParentID, Name) values(10, 9, 'Beef') insert into Products(ProductID, ParentID, Name) values(11, 9, 'Pork') [/spoiler]
Not a large dimension, but enough to demonstrate the technique. As you can see my real products are all at the leaf level.
The strategy is quite simple:
- Create a view to seperate the members into different levels.
- Create a new dimension using these real levels.
- Configure the dimension to appear like the original parent-child dimension, but with the performance of a normal dimension.
Create the view
We want to create a denormalised view of the data. To do this we join the Product to itself once for each level. This does mean we need to know the maximum depth of the hierarchy, but often this is fixed, and we’ll build in some extra levels for safety.
The tricks here are:
- Use coalesce() so that we always get the lowest level ID below the leaves, never a NULL. This allows us to join to the fact table at the bottom level of our hierarchy.
- Leave Name columns null below the leaves, this will allow us to stop the hierarchy at the correct leaf level in each part of the hierarchy.
[spoiler intro=”Code” title=”Sql View Script”]
create view dbo.ProductsFlattened as select P1.ProductID as Level1ID, P1.Name as Level1Name, coalesce(P2.ProductID, P1.ProductID) as Level2ID, P2.Name as Level2Name, coalesce(P3.ProductID, P2.ProductID, P1.ProductID) as Level3ID, P3.Name as Level3Name, coalesce(P4.ProductID, P3.ProductID, P2.ProductID, P1.ProductID) as Level4ID, P4.Name as Level4Name, coalesce(P5.ProductID, P4.ProductID, P3.ProductID, P2.ProductID, P1.ProductID) as Level5ID, P5.Name as Level5Name from dbo.Products P0 left join dbo.Products P1 on P0.ProductID = P1.ParentID left join dbo.Products P2 on P1.ProductID = P2.ParentID left join dbo.Products P3 on P2.ProductID = P3.ParentID left join dbo.Products P4 on P3.ProductID = P4.ParentID left join dbo.Products P5 on P4.ProductID = P5.ParentID where P0.ParentID is null
Running this we get:
Obviously we can update this view to create more levels as required, but 5 are enough for now.
Next we go to BIDS, and add the view to our Data Source View, and then add a new Dimension based on the view.
The key steps to creating the dimension correctly are:
- Set the key attribute to Level5ID, and the name to Level5Name.
- Create an attribute for each Level ID, and on each set the Name Column appropriately.
- Create a hierarchy using these attributes in order.
- On each attribute set AttributeHierarchyVisible to False.
- On each level of the hierarchy set HideMemberIf to NoName.
- Set up the Attribute Relationships between the levels.
You should end up with the following:
If you browse the dimension you’ll see that it never goes as far as level 5, even though it exists. This is becuase we set up the member hiding option, and returned NULLs in our view.
And that’s it done, you can now join to your fact tables at the lowest level, build your cube as normal and get the performance benefits of aggregation!
A tool to achieve the same result is available from Codeplex, we’ve not personally tried it but may well be a timesaver. This works in a similar way to the example above, but it’s often useful to understand how something works, even if you choose to automate it.