Data Mashups 101

In an ideal world for report designers, all the data required for a report is available in one data source, and the structure and hierarchies perfectly match the reporting needs. Sadly, the world isn’t always ideal (just ask Theresa May…).

We often see scenarios where users are reporting numerical data from Analysis Services but want to include descriptive or textual information held elsewhere. People have approached this in different ways, but XLCubed’s mashup capability can make it a much more streamlined and maintainable process.


Auto refresh in XLCubed Pivot-views and Power BI

Version 9 of XLCubed introduced Pivot-views, which allow our powerful grids, small multiples and slicers to be driven directly from a wide range of relational sources.

We also added support for Power BI as a data source, and many of our customers are now using XLCubed on Power BI data for a better in-Excel experience and slice and dice.

Many of you will already be familiar with the data refresh options in Power BI, essentially either a scheduled refresh process, or through the Enterprise Gateway.

With Pivot-views we wanted to ensure that the data refresh process is very straightforward, and to enable business users to benefit from them without worrying about scheduling data flows or getting involved in discussions about gateway configuration. Fundamentally, the data will automatically update when the report is opened or if any of the query parameters change (either by updating a slicer or if an Excel cell-based parameter value changes).

This means the report builder doesn’t have to worry about presenting the latest view of the data, as that is all handled automatically without any custom code, manual intervention or IT assistance.

Here is how it works in detail.

First insert a grid and select a relational database as the source.

We are presented with the query designer, where we can pick a table and set up a parameter by entering “@productgroup”:
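A parameterised query in the designer might look something like this (a sketch – the table and column names are illustrative, only the @productgroup parameter comes from this example):

SELECT   Product, SalesQuarter, Revenue
FROM     dbo.ProductSales
WHERE    ProductGroup = @productgroup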

This gives us an option to link that parameter to a cell or directly to a slicer:

We can now design the grid we want to see:

And view the result.

Now, if we update the driving cell:

The query is re-run, and the grid (and anything else driven from the data) updates to reflect the new query results:

This automatic handling of the data refresh continues once the report is published to XLCubedWeb. Whether accessing from Excel, a browser or the XLCubed Mobile App, the report is based on current data without the need for any additional handling or refresh processing… One less thing for the report designer to worry about, one less thing to go wrong!

DAX Performance tips – lessons from the field

XLCubed has supported a drag/drop interface for creating reports against Tabular Analysis Services since the first release of the new engine. It lets users easily create reports which run DAX queries on the cube, and we’ve often seen very good performance at customers when MDX against Tabular was a cause of long running reports.

So when we were approached at SQL PASS in Seattle by some attendees who had an SSAS Tabular performance issue, we were optimistic we’d be able to help.

In this case the business wanted to retrieve thousands of rows from the cube at the transactional level, and the first approach had been to use PivotTables in Excel. To get to the lowest level they cross-joined the lowest levels of all the hierarchies on the rows axis, which gave the right result, but performance was terrible, with several queries taking 20 minutes or more and others not returning at all.

We hoped an XLCubed table running DAX would be the solution and created the same report in the designer. Sadly, while performance was a little better it was still far from acceptable; the model was large, and the number of columns combined with their cardinality meant that a lot of work was being done on the server.

XLCubed’s DAX generator was trying to cross-join all the values from each column, which had worked well for our other customers. But when there are a dozen columns, including the transaction ID, things do not go so well. DAX in itself is not a magic bullet, and SSAS Tabular models can hit performance problems on low-level data – we needed a new approach.

After some investigation we discussed the issue and our thinking with our friends at SQLBI, and determined that instead of a cross-join we wanted an option to use SUMMARIZE(), as this only considers combinations of values which actually exist in the database, and it can access columns related to the summarized table which were required for the report.

As the customer’s report included the transaction ID, the result wasn’t aggregated even though we were using SUMMARIZE(). We also wanted to add true transactional reporting, using the RELATED() function.

Finally, SQL Server 2016 adds a couple of new functions, SUMMARIZECOLUMNS() and SELECTCOLUMNS(), both of which are useful for this type of reporting and offer better performance than the older equivalents.

The end result in XLCubed is a new option for DAX tables allowing users to set the type of report they want to run, plus some internal changes so that XLCubed automatically uses the most efficient DAX functions where available.

A beta was sent to the business users and the results were fantastic. The report which had run for several minutes now completed in a few seconds, and the 20-minute queries were down to 15 seconds – we had some very happy users!

The changes will be in the next release of XLCubed so that all our customers can benefit from the improvements. It’s always nice when a customer request helps improve the product for everyone.

A sample of the syntax change is included below:

Before:


EVALUATE
FILTER (
    ADDCOLUMNS (
        KEEPFILTERS (
            CROSSJOIN ( VALUES ( 'Customer'[Education] ), VALUES ( 'Product'[Color] ) )
        ),
        "Internet Total Units", 'Internet Sales'[Internet Total Units],
        "Internet Total Sales", 'Internet Sales'[Internet Total Sales]
    ),
    NOT ISBLANK ( [Internet Total Units] )
)
ORDER BY
    'Customer'[Education],
    'Product'[Color]

After:


EVALUATE
FILTER (
    ADDCOLUMNS (
        KEEPFILTERS (
            SUMMARIZE ( 'Internet Sales', 'Customer'[Education], 'Product'[Color] )
        ),
        "Internet Total Units", 'Internet Sales'[Internet Total Units],
        "Internet Total Sales", 'Internet Sales'[Internet Total Sales]
    ),
    NOT ISBLANK ( [Internet Total Units] ) || NOT ISBLANK ( [Internet Total Sales] )
)
ORDER BY
    'Customer'[Education],
    'Product'[Color]
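For completeness, on SQL Server 2016 the same query could be expressed more compactly with SUMMARIZECOLUMNS(), which removes rows where all the added columns are blank, so the explicit FILTER disappears (a sketch, not necessarily the exact DAX XLCubed generates):

EVALUATE
SUMMARIZECOLUMNS (
    'Customer'[Education],
    'Product'[Color],
    "Internet Total Units", 'Internet Sales'[Internet Total Units],
    "Internet Total Sales", 'Internet Sales'[Internet Total Sales]
)
ORDER BY
    'Customer'[Education],
    'Product'[Color]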

Some Excel BI myths debunked – #5 Real-time data exploration

#5: Lack of Real-time data exploration

The argument is often made that Excel is too inflexible to answer spur-of-the-moment questions quickly and effectively. The scenario given is that you’re in a meeting with your Excel workbook, and someone asks a related question not currently accounted for. How embarrassing to have to look at your backup folder of printed Excel workbooks… Really? It may escape the attention of some, but Excel is actually an electronic product too, so as a first point you wouldn’t need to dust off an enormous binder of printed reports.

That aside, the overall argument has some merit in specific cases. If Excel is acting as both the datastore and the presentation tool you have a problem: if the data you need isn’t in the workbook, you’re bang out of luck.

There are two key requirements to address the issue in Excel. Firstly, the data needs to be stored outside the workbook; in the case of XLCubed that’s in Analysis Services cubes or Tabular models. This means that when the data isn’t currently visible in the workbook it can still easily be queried and brought into play.

Secondly, while having the data in a cube is a huge step, that in itself isn’t enough. You need to be able to get it out quickly, easily and flexibly, and to display it as information rather than just data. There are significant limitations with pivot tables when used to report on a cube, and XLCubed addresses these while adding a lot more capability on top. The additional data you need to answer the question is readily available, and you have tools to do something meaningful with it: slice-and-dice tabular reporting, interactive charting and straightforward user calculations.

So when someone asks the question in a meeting, you can explore it interactively and on the spot. And in Excel.

Some Excel BI myths debunked

“Excel: Great hammer, lousy screwdriver”

When evaluating BI tools, many of our customers are hit by marketing messages about the limitations and woes of Excel. One white paper we were pointed to is Tableau’s ‘Excel: Great hammer, lousy screwdriver’. It contains 5 key points concerning Excel limitations for BI which we’ll take a look at over the next few weeks, along with a few others which we hear frequently.

“Don’t throw the baby out with the bathwater”

We fully appreciate that Excel isn’t perfect for all needs, but XLCubed chooses to address the weaknesses and embrace the very significant strengths, rather than throwing everything away.

XLCubed helps users get the most value and benefit from Microsoft’s Analysis Services platform by taking the best of Excel and extending it with an optimised query and reporting environment which lets them do more, and do it more quickly. Excel becomes a very flexible presentation layer, and Analysis Services removes the scalability and data integrity issues.

Let’s take a look at some of the key Excel objections raised, with #2 to follow next week:

#1) Limited data volumes

“Excel only handles 1 million rows – that’s not nearly enough for my business”

The advent of Big Data makes everyone think of huge data volumes. In reality, if you’re looking at core financial reporting a million rows may well be more than enough, but that’s not the point: for sales and operational reporting over several years a million rows won’t come close. Big Data is partly about volumes, but it also concerns the data structure. These days the challenge of Big Data isn’t the ability to store it, it’s the ability to do something useful with it. And doing something useful with it isn’t creating reports which run to a million rows.

We see Excel as a presentation layer, not as a database. While Power Pivot muddies that argument a little, very few people see Power Pivot models as a central repository for corporate data. XLCubed is a client front-end tool for SQL Server Analysis Services (which laughs at 1 million rows). 1 million new rows per day over several years starts to ramp up the volume, but the technology is designed to scale, and to scale on significantly less expensive hardware than in-memory technologies (of which Analysis Services 2012 of course now has its own player with xVelocity).

So while Excel, and hence XLCubed, can only display 1 million rows at a time, the underlying cubes can run to billions of rows. XLCubed gives the user flexible and fast filtering and ranking capabilities, simple ways to leverage the cube hierarchies, and effective data visualisation techniques to let you work with these large volumes of data.

Aside from that, if someone wants a report (a report!) which is a million rows long, our first question is always ‘and can you show me how you use that report?’. If you print it, at around 40 rows per page you’ll get some 25,000 pages of deforestation. By comparison, Tolstoy’s War and Peace is around 1,400 pages in most print editions… We believe there is a lot to be said for a combination of top-down reporting, ranking and filtering to make that type of data volume useful rather than burdensome.

So in summary, when you’re using XLCubed and the Microsoft BI stack, more than a million rows of data is really not a limitation (though if you put a million rows in the report you’re creating your own limitation in terms of its usefulness).

XLCubed as an alternative to ProClarity

With the launch of version 7.1 of XLCubed Excel Edition we introduced the ability to import ProClarity Briefing Books. With support for ProClarity ending this year and many customers looking for a replacement, now is a great time to show how the import works and help move users from ProClarity to an alternative solution.

Importing

Let’s start with importing from ProClarity. We’ve built a simple example briefing book based on the usual AdventureWorks sample cube; it includes a sample grid, a performance map and a chart.

To get to the import option we load Excel and select XLCubed -> Extras -> Import -> Import ProClarity Briefing book. After selecting the file to import we are given a summary of each item that is going to be imported:


At this point you can control the resulting worksheet name, as well as switch the type of XLCubed object you’ll end up with. Clicking “Import” will then give us one sheet for each briefing page:


You’ll notice that the import process has created any required slicers, so the report is good to go. You could now spend a bit more time adding extra XLCubed functionality to the report, such as in-cell charts or Excel calculations to leverage the power of Excel, or publish to XLCubedWeb for consumption by a wider audience.

The import process is very straightforward, and we have had some great feedback from our customers regarding the speed and ease with which they have been able to migrate users’ reports into XLCubed.

Look out for some more blogs showing other features of XLCubed that will help users transition from ProClarity!

Easy pivoting of SQL queries in Excel

So today’s blog is all about pivoting SQL query data columns.

Here we have a small sample of a SQL query report that shows actual revenue across different products over a number of quarters. There’s nothing wrong with the data being returned, but it is pretty difficult to do any comparison analysis.

So what if your task is to report actual revenue across the product categories over all the quarters in this report?

This would not be an easy query to write in SQL, as we don’t always know in advance which values a query will return – but that’s where XLCubed comes to the rescue!
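For context, this is roughly what a hand-written T-SQL pivot of that data looks like (a sketch with illustrative table and column names – only cDate comes from the report above). Every quarter has to be hard-coded in the IN list, so the query breaks as soon as a new quarter arrives:

SELECT  Product, [2011 Q1], [2011 Q2], [2011 Q3], [2011 Q4]
FROM    (
        SELECT Product, cDate, ActualRevenue
        FROM   dbo.RevenueReport
        ) AS src
PIVOT   (
        SUM(ActualRevenue)
        FOR cDate IN ([2011 Q1], [2011 Q2], [2011 Q3], [2011 Q4])
        ) AS pvt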

We just right-click on the column heading we wish to pivot – in our case cDate – and then select XLCubed and Pivot (XLCubed fills in the column heading), as below:


The report is now displayed with the quarters across the page as columns.


This format is so much easier to read, and we can quickly see how each of the products is contributing (or not!) to the company’s revenue across the time period.

So that’s how easy it is to pivot column data in XLCubed.

XLCubed V7 & SQL Server 2012

SQL Server 2012 has recently been released to manufacturing, and at XLCubed we’re well placed to take advantage of everything that is new in 2012.

SQL 2012 delivers Business Intelligence under the ‘BISM’ umbrella (Business Intelligence Semantic Model). BISM comes in different flavours though:

  • BISM Multi-dimensional
    • (the latest version of Analysis Services as we know it)
  • BISM Tabular
    • In-memory (VertiPaq)
    • DirectQuery

For client tools, BISM Multi-dimensional is largely the same as connecting to existing versions of Analysis Services, with MDX being the query language. For XLCubed we can leverage what we already have in that respect, and the transition is seamless.

BISM Tabular is different though. If you choose to deploy in-memory to VertiPaq, client tools can still use MDX, and as such don’t need significant change other than to handle the tabular rather than hierarchical data environment. However, if the deployment is DirectQuery (for example for real-time BI), the only available query language is DAX.
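For anyone who hasn’t seen DAX as a query language yet, a minimal query has a very different shape to MDX – a rough sketch against a hypothetical Sales model:

EVALUATE
SUMMARIZE (
    'Sales',
    'Product'[Category],
    "Total Sales", SUM ( 'Sales'[Amount] )
)
ORDER BY 'Product'[Category]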

There are best-use cases for the different deployment options, but it’s fair to say there is a degree of confusion in the space at the moment about the relative merits of each. We’ll try to shed some light and offer guidance here over the next weeks and months. As a product though, it’s important for us to support and extend the full range of 2012 BI deployment options, and to make these available and accessible to our customers. That’s exactly what we’ve done for version 7.

XLCubed v7, which releases next month, is a client for both MDX and DAX, and as such provides one consistent client interface in Excel and on the Web which can access any of the SQL 2012 deployment models for BI. We are also adding a much richer relational SQL reporting environment.

We are really pleased with some of the beta feedback we’ve had to date, and if you’d like to trial the beta version contact us at beta@xlcubed.com.

We’re looking forward to releasing the product next month, and will be previewing it at SQL Server Connections next week in Vegas.


SQL Server “Denali” CTP3 – first impressions…

Microsoft recently released the third CTP of Denali, the upcoming SQL Server release (SQL Server 2011), so here are some initial thoughts now that it’s more widely available:

The first thing to look at is the new Tabular mode for Analysis Services (as opposed to the traditional multi-dimensional mode, which is still available). This is the server version of the VertiPaq engine first seen in the PowerPivot add-in, and it moves the engine from being a personal/team tool to an organisation/enterprise-level affair.

This means IT are going to get involved (and people can disagree about how they feel about that!), but report sharing should be easier as the data is held centrally. In the past the report contained all the data, which could make for very large workbooks, or you published to SharePoint, which not everyone was set up to do.

Cubes can be queried using MDX, which is great for a front-end vendor like us, and XLCubed works out of the box against the CTP. Existing functionality is working smoothly, and as Microsoft Gold Partners we’re working closely with the releases to support the full functionality at RTM.

We have ported a few existing cubes to the new architecture, and one first impression is that removing columns or using perspectives is going to be needed to keep things sensible for end-users – you can quickly end up with hundreds of attributes.

The ability to create hierarchies was something that was often asked for in PowerPivot, and thankfully that’s there now. This should simplify many cubes.

Attribute-tastic


The intricacies of MDX put most business users off trying to use it directly, whereas DAX’s similarities with Excel functions mean there is more scope to have users create formulae on the fly. Examining how best to expose that to users is something we’ll be spending some time on in the coming months.

Easier distinct counts and the built-in date calculations are the obvious candidates, but there are a number of others which we feel we can make more accessible for the majority of users.
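To give a flavour, measure definitions like these read far more like Excel formulae than the MDX equivalents (a hypothetical model, purely for illustration):

Customer Count := DISTINCTCOUNT ( Sales[CustomerKey] )
Sales YTD := TOTALYTD ( SUM ( Sales[SalesAmount] ), 'Date'[Date] )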

It’s certainly an interesting move, and thinking in tables and columns instead of the multidimensional model takes some getting used to; conversely, for some people it’s more natural.

It’ll also be interesting to see how MDX and DAX are integrated. The Tabular server supports both languages for queries. Currently, using MDX you can use the “WITH MEMBER” syntax to create calculated members sent to the Tabular server – could you declare a DAX calculation in a similar manner?
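For example, a classic MDX calculated member looks like this (cube and measure names are illustrative, AdventureWorks-style):

WITH MEMBER [Measures].[Margin] AS
    [Measures].[Internet Sales Amount] - [Measures].[Internet Total Product Cost]
SELECT
    [Measures].[Margin] ON COLUMNS,
    [Product].[Category].[Category].Members ON ROWS
FROM [Adventure Works]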

Parent-Child Dimensions in Analysis Services – Performance Walkthrough

Parent-child hierarchies are a good fit for many data structures such as accounts or employees, and while they can speed development in some cases, they can also cause performance problems in large cubes.

We often see customers with this type of performance issue, and thought it worth sharing a simple technique for altering the dimension structure to improve query speed.

The problem

Often parent-child hierarchies are created because this is the structure used in the relational source, so they seem a good fit to model the members. In many cases though, data is only at the leaf level of the hierarchy, meaning parent-child isn’t really needed.

Performance problems occur because no aggregates are created for parent-child dimensions, as detailed in the Analysis Services performance guide:

Parent-child hierarchies

Parent-child hierarchies are hierarchies with a variable number of levels, as determined by a recursive relationship between a child attribute and a parent attribute. Parent-child hierarchies are typically used to represent a financial chart of accounts or an organizational chart. In parent-child hierarchies, aggregations are created only for the key attribute and the top attribute, i.e., the All attribute unless it is disabled. As such, refrain from using parent-child hierarchies that contain large numbers of members at intermediate levels of the hierarchy. Additionally, you should limit the number of parent-child hierarchies in your cube.

If you are in a design scenario with a large parent-child hierarchy (greater than 250,000 members), you may want to consider altering the source schema to re-organize part or all of the hierarchy into a user hierarchy with a fixed number of levels. Once the data has been reorganized into the user hierarchy, you can use the Hide Member If property of each level to hide the redundant or missing members.


The performance guide hints at re-organizing the hierarchy to improve performance, but doesn’t say how.

The solution

This article walks through the steps needed to change your parent-child hierarchy structure to use real levels, so that aggregations work and performance is as good as you would expect with normal hierarchies.

This process is known as flattening or normalizing the parent-child hierarchy.

Firstly, let’s look at the data in our relational source.

SQL Create Script:

CREATE TABLE [dbo].[Products](
 [ProductID] [int] NOT NULL,
 [ParentID] [int] NULL,
 [Name] [varchar](50) NOT NULL,
 CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED ([ProductID] ASC)
)

GO

insert into Products(ProductID, ParentID, Name) values(1, NULL, 'All')
insert into Products(ProductID, ParentID, Name) values(2, 1, 'Fruit')
insert into Products(ProductID, ParentID, Name) values(3, 2, 'Red')
insert into Products(ProductID, ParentID, Name) values(4, 3, 'Cherry')
insert into Products(ProductID, ParentID, Name) values(5, 3, 'Strawberry')
insert into Products(ProductID, ParentID, Name) values(6, 2, 'Yellow')
insert into Products(ProductID, ParentID, Name) values(7, 6, 'Banana')
insert into Products(ProductID, ParentID, Name) values(8, 6, 'Lemon')
insert into Products(ProductID, ParentID, Name) values(9, 1, 'Meat')
insert into Products(ProductID, ParentID, Name) values(10, 9, 'Beef')
insert into Products(ProductID, ParentID, Name) values(11, 9, 'Pork')


Not a large dimension, but enough to demonstrate the technique. As you can see, the real products are all at the leaf level.

The strategy is quite simple:

  • Create a view to separate the members into different levels.
  • Create a new dimension using these real levels.
  • Configure the dimension to appear like the original parent-child dimension, but with the performance of a normal dimension.

Create the view

We want to create a denormalised view of the data. To do this we join the Products table to itself once for each level. This does mean we need to know the maximum depth of the hierarchy, but often this is fixed, and we’ll build in some extra levels for safety.

The tricks here are:

  • Use coalesce() so that we always get the lowest-level ID below the leaves, never a NULL. This allows us to join to the fact table at the bottom level of our hierarchy.
  • Leave the Name columns NULL below the leaves; this allows us to stop the hierarchy at the correct leaf level in each part of the hierarchy.

SQL View Script:

create view dbo.ProductsFlattened

as

select    P1.ProductID as Level1ID,
 P1.Name as Level1Name,
 coalesce(P2.ProductID, P1.ProductID) as Level2ID,
 P2.Name as Level2Name,
 coalesce(P3.ProductID, P2.ProductID, P1.ProductID) as Level3ID,
 P3.Name as Level3Name,
 coalesce(P4.ProductID, P3.ProductID, P2.ProductID, P1.ProductID) as Level4ID,
 P4.Name as Level4Name,
 coalesce(P5.ProductID, P4.ProductID, P3.ProductID, P2.ProductID, P1.ProductID) as Level5ID,
 P5.Name as Level5Name

from    dbo.Products P0
left join    dbo.Products P1
 on        P0.ProductID = P1.ParentID
left join    dbo.Products P2
 on        P1.ProductID = P2.ParentID
left join    dbo.Products P3
 on        P2.ProductID = P3.ParentID
left join    dbo.Products P4
 on        P3.ProductID = P4.ParentID
left join    dbo.Products P5
 on        P4.ProductID = P5.ParentID

where P0.ParentID is null


Running this we get:
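Working through the sample data, that’s six rows – one per leaf product – with the name columns NULL below each leaf (ID columns omitted for brevity):

Level1Name  Level2Name  Level3Name  Level4Name  Level5Name
Fruit       Red         Cherry      NULL        NULL
Fruit       Red         Strawberry  NULL        NULL
Fruit       Yellow      Banana      NULL        NULL
Fruit       Yellow      Lemon       NULL        NULL
Meat        Beef        NULL        NULL        NULL
Meat        Pork        NULL        NULL        NULL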

Obviously we can update this view to create more levels as required, but 5 are enough for now.

The Dimension

Next we go to BIDS, add the view to our Data Source View, and then add a new Dimension based on the view.

The key steps to creating the dimension correctly are:

  • Set the key attribute to Level5ID, and the name to Level5Name.
  • Create an attribute for each Level ID, and on each set the Name Column appropriately.
  • Create a hierarchy using these attributes in order.
  • On each attribute set AttributeHierarchyVisible to False.
  • On each level of the hierarchy set HideMemberIf to NoName.
  • Set up the Attribute Relationships between the levels.

You should end up with the following:

Dimension Structure


Attribute Relationships


If you browse the dimension you’ll see that it never goes as far as level 5, even though it exists. This is because we set up the member-hiding option and returned NULLs in our view.

Conclusion

And that’s it. You can now join to your fact tables at the lowest level, build your cube as normal, and get the performance benefits of aggregation!
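For example, with a hypothetical FactSales table keyed on ProductID, the join lands on the flattened view’s lowest-level key:

SELECT   d.Level1Name, d.Level2Name, d.Level3Name, SUM(f.SalesAmount) as Sales
FROM     dbo.FactSales f
JOIN     dbo.ProductsFlattened d ON d.Level5ID = f.ProductID
GROUP BY d.Level1Name, d.Level2Name, d.Level3Name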

See also

A tool to achieve the same result is available on CodePlex. We’ve not personally tried it, but it may well be a timesaver. It works in a similar way to the example above, but it’s often useful to understand how something works, even if you choose to automate it.