As "An Excel User in a Cubed Kingdom" I’m starting my exploration of this new found land with a simple question: what is an OLAP cube? In plain English, please…
I like this simple, non-technical definition: a cube is a set of predefined answers. It’s up to you to select the right questions.
OK, let’s detail this.
Imagine that you have a very large database with the usual business data: orders, customers, sales representatives… Now you want to know how much a customer category ordered over the last year. You query the database and you get the answer. Then you want monthly sales. Query it again. Dig a little deeper to see what products that category ordered. Query it once more.
What is happening behind the curtain? Each time you enter a new query the system looks at each transaction (or a subset) and performs the necessary calculations to answer your query. You’ll get your answers, but it will be painfully slow: depending on your query and the database size, it may take hours. It is not an option.
But you don’t really need to see each individual order, do you? If you only need to know monthly sales, why should your system go through each transaction? If you pre-aggregate that data, you’ll get your answers much, much, faster, because there aren’t five million records, just 100.000. You’ll be able to actually work, instead of staring at your monitor, waiting for an answer.
This is what a cube does. It provides faster answers by eliminating the unnecessary detail for the task at hand. You shouldn’t look at a cube as an unique, condensed version of the database. While you have a virtually infinite number of questions that the database can answer, a cube focus on providing answers to a small set of questions. That’s why you can have different cubes (marketing, fin, sales), all of them getting data from the same source. They all answer different sets of questions.
When designing a cube (it’s your job, not IT’s), resist the temptation of a one-size-fits-all cube. Clearly define a coherent set of questions for your fundamental business needs and make sure they are answered once the cube becomes available.
If you are exploring your data you will not want to wait one hour each time you make a change. On the other end, a fast cube with no data to explore is useless. There is a fine balance between maximum flexibility and maximum performance.
An OLAP cube not only ensures that you retrieve the right data from the database but also allows you to explore it efficiently. Two good reasons to add OLAP cubes to your toolbox.
Next time we’ll see how plain English can describe the structure of a basic OLAP cube.
7 Replies to “So, what’s an OLAP Cube, anyway?”
I started this same journey 5 years ago and I was confused because I was coming from the world of databases and datasets that are suppose to be efficient by avoiding as much as possible redundancy and being design to be easily updateable. So from the prospective of someone that was trying to understand OLAP and hit some rocks I suggest you make sure to mention: OLAP is more efficient in analysis speed (hence the name) but in terms of data redundancy there are lots of data repetition. On the other hand its brother OLTP is efficient in terms of data entry and avoiding redundancy (is meant for transactions) but as you mentioned, is not optimized for analysis. I mention this because you said “condensed version of dataset” and sometimes, when exploring the underlying dataset of a cube you see a lot of repetition, which is counterintuitive.
I disagree with your comment regarding real-time queries on large datasets “it is not an option”. Cubes might have been an answer back in the 70’s where clien’ts hadn’t the processing power of today.
They also lack flexibility to modify/update (just ask anyone running SAP BW). Taking a step back, I ask the question: Do we need today to pre-define the answers in the OLAP fashion?
While today it is true that most enterprise-class offerings use OLAP cubes, it’s also know of the overhead that comes with them and why they typically turn into an IT project.
Excluding the fact that there is already viable technology WITHOUT OLAP cubes such as Qliktech / Pivot link (amongst others see Gartnter / AMR Research) paints an incomplete picture, in my opinion.
I believe that focusing less on the technologies (MOLAP/ROLAP/HOLAP…. and so on) and discussin HOW to enable collaboration between IT and the business, the first to provide infrastructure and access to business-readable data, and the later becoming capable on creating its own reports would be a much more interesting and applicable topic.
@Alvaro: the need for redundancy is something that I am aware of, but it is not easy to make the mental switch from my Excel background…
@Jorge: I understand your point. I believe that what XLCubed does is to put OLAP technology in the hands of the user, removing IT from the equation (they still need to provide and manage infrastructure), but I’m still learning and I need to see how everything fits together. Andreas will have a much more insightful comment on this.
You are right mentioning that there are already good in memory (OLAP like) technologies available like Qliktech, PALO, TM1 or the good old PivotTable which one to use, well, it all depends on the user requirements and how deep (if at all) IT is involved in the project.
SAP BW is a problem, because changing structures in SAP BW is hard, and also response times of SAP BW cubes are slow. However, this is not true for OLAP in general, in simple PivotTable / spreadmart scenario a Analysis Services local cubes also would do, its easy to build with the Excel build in wizard, and brings you all the advantages and flexibility of a real OLAP / MDX enabled cube. I will post the next week or so about local Analysis Services cubes, OLAP, MDX and XLCubed.
Another interesting in memory technology will be Gemini the upcoming Analysis Services release…
Gemini is all about the collaboration between IT and the departmental end users. As Nigel Pendse puts it Micosoft uses Gemini as a Trojan horse to bring for Analysis Services into the world of Excel power users:
“Gemini world is refreshingly free of the off-putting jargon like star/snowflake schemas, fact tables, cubes, measures, dimensions, hierarchies, levels, attributes, aggregations, partitions, MDX calculations and scripts typically encountered in OLAP server deployments. Instead, Excel power users with Gemini installed will be able to analyze and summarize vast amounts of data with absolutely no need to pre-define models or structures.”
I enjoyed your reply. And you took the next step bringing in Gemini. I’d like to know your thoughts about how XLCubed would position strategically once Gemini is available.
Hi This is an awesome explanation. I have bits and pieces knowledge on this. But now i am able to confirm on certain topics. I should say i got lot of clarity after going through this article.