I have started preparing for Microsoft Certifications on SQL Server 2012 and I am spending a fair amount of time with SSAS Tabular since it’s the newest addition to MS BI stack and the one which I am not very accoustomed to. As I work with it more and more, I am starting to think about how would you go around designing the database on which the tabular model will be based. In the multi-dimensional world, this was an easy answer. You would almost always design the DW in Star/Snow-flake schema or atleast expose the DW as Star/Snow-flake schema through SQL Server Views or Data Source View (DSV). In SSAS Tabular, it appears to me, that this not really necessary.
Do I really need to segregate data in facts and dimension?
I can create measure out of any column in any table in SSAS Tabular. It doesn’t have to be from a fact table so why separate out facts and dimensions? Assume that I have a ‘House’ dimension with ‘NumberOfRooms’ as attribute. In SSAS Multidimensional, if the users wanted to slice and dice ‘NumberOfRooms’ by other attributes of house dimension or any other dimension, you would have to some how surface this as a fact table. SSAS Tabular will happily allow you to create measure out of this column although it’s in a dimension table. In theory, you can point the tabular model to the normalized OLTP database and still be able to give users the ability to slice and dice the data just like SSAS Multidimensional.
Normalized Or Denormalized, that is the question..
There would be obvious issues with building the tabular model on a normalized schema.One, since the database is normalized, the sheer number of tables will be very large. Two, maintaining relationships between the tables would be painful.Three, the model itself can be confusing for end users. On the other hand, building a fully denormalized schema when you don’t really need it adds complexity to the project, especially in the ETL. I don’t think I need to reiterate that the complex part of DW project is ETL and around 60-70% effort are spent on this phase. So it appears like we need some middle ground.
Enter Entity Based design..
I have always perceived the DW as a collection of entities like product, sales, employee etc. which have relationship with each other. The star schema is similar but the dimensions don’t have relationship with each other directly. They are related to each other through the fact table. Plus, the dimensions are denormalized so as much as possible information is crammed into them. I guess what I am trying to say is we don’t need a fully denormalized structure and at the same time we don’t want the datawarehouse in 3NF. We would need to find a middle ground so that it is just about corretly denormalized and normalized at the same time. In my opinion, the way to get to this is to think of DW in terms of entities and how they interact with each other rather than classifying the data into dimension and facts.
Of course, these are all my prilimanary thoughts and are very much open to issues. I guess we would have to wait for wide spread adoption of the tabular model and learn from people who did it.