Month: April 2013

Building data model for SSAS Tabular

I have started preparing for the Microsoft certifications on SQL Server 2012 and I am spending a fair amount of time with SSAS Tabular, since it's the newest addition to the MS BI stack and the one I am not very accustomed to. As I work with it more and more, I am starting to think about how you would go about designing the database on which the tabular model will be based. In the multidimensional world, this was an easy answer: you would almost always design the DW as a star/snowflake schema, or at least expose the DW as a star/snowflake schema through SQL Server views or a Data Source View (DSV). In SSAS Tabular, it appears to me, this is not really necessary.
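As a quick sketch of the views approach: the OLTP table and column names below are made up for illustration, but this is the kind of view you would use to flatten a snowflaked product dimension into a single star-style table for the multidimensional model or the DSV.

    -- Hypothetical OLTP tables flattened into one star-style dimension view
    CREATE VIEW dw.DimProduct
    AS
    SELECT p.ProductID,
           p.ProductName,
           p.ListPrice,
           c.CategoryName   -- denormalized in from the category table
    FROM dbo.Product AS p
    JOIN dbo.ProductCategory AS c
        ON c.CategoryID = p.CategoryID;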

Do I really need to segregate data into facts and dimensions?
I can create a measure out of any column in any table in SSAS Tabular. It doesn't have to come from a fact table, so why separate out facts and dimensions? Assume that I have a ‘House’ dimension with ‘NumberOfRooms’ as an attribute. In SSAS Multidimensional, if the users wanted to slice and dice ‘NumberOfRooms’ by other attributes of the house dimension or any other dimension, you would have to somehow surface it through a fact table. SSAS Tabular will happily allow you to create a measure out of this column even though it sits in a dimension table. In theory, you can point the tabular model at the normalized OLTP database and still give users the ability to slice and dice the data just like SSAS Multidimensional.
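For instance, a measure on this column is a one-line DAX definition; ‘House’ and ‘NumberOfRooms’ are the hypothetical names from above:

    // A measure defined directly on a dimension table; no fact table involved
    Total Rooms := SUM ( House[NumberOfRooms] )

Once defined, ‘Total Rooms’ can be sliced by any related attribute, exactly as if it came from a fact table.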

Normalized Or Denormalized, that is the question..
There would be obvious issues with building the tabular model on a normalized schema. One, since the database is normalized, the sheer number of tables will be very large. Two, maintaining relationships between the tables would be painful. Three, the model itself can be confusing for end users. On the other hand, building a fully denormalized schema when you don't really need it adds complexity to the project, especially in the ETL. I don't think I need to reiterate that the complex part of a DW project is the ETL, and around 60-70% of the effort is spent on this phase. So it appears we need some middle ground.

Enter entity-based design..
I have always perceived the DW as a collection of entities like product, sales, employee etc. which have relationships with each other. The star schema is similar, but the dimensions don't have relationships with each other directly; they are related to each other through the fact table. Plus, the dimensions are denormalized so that as much information as possible is crammed into them. I guess what I am trying to say is that we don't need a fully denormalized structure, and at the same time we don't want the data warehouse in 3NF. We need to find a middle ground so that it is just about correctly denormalized and normalized at the same time. In my opinion, the way to get there is to think of the DW in terms of entities and how they interact with each other, rather than classifying the data into dimensions and facts.

Of course, these are all my preliminary thoughts and are very much open to issues. I guess we will have to wait for widespread adoption of the tabular model and learn from the people who have done it.


SSAS on Windows Azure Virtual Machines

Chris Webb's BI Blog

You may have already seen the announcement about Windows Azure Virtual Machines today; what isn’t immediately clear (thanks to Teo Lachev for the link) is that Analysis Services 2012 Multidimensional and Reporting Services are installed on the new SQL Server images. For more details, see:
http://msdn.microsoft.com/en-us/library/jj992719.aspx

SSAS 2012 Tabular is also supported but not initially installed.

View original post

GeoFlow Public Preview Available

Chris Webb's BI Blog

First big news from the PASS BA Conference: the public preview for GeoFlow is now available. You can download it here:
http://www.microsoft.com/en-us/download/details.aspx?id=38395

Here are the official announcements with all the details:
http://blogs.technet.com/b/dataplatforminsider/archive/2013/04/11/day-2-pass-business-analytics-conference-new-3d-mapping-analytics-tool-for-excel.aspx
http://blogs.office.com/b/microsoft-excel/archive/2013/04/11/dallas-utilities-electricity-seasonal-use-simulation-with-geoflow-preview-and-powerview.aspx

GeoFlow is an add-in for Excel 2013 that allows you to visualise your data on a 3D map, to zoom in and explore that data, and to record ‘tours’ of this data. It’s a lot of fun! As a taster, here’s a screenshot of a visualisation showing English secondary schools exam results data (average A-Level point score per pupil) broken down by school gender of entry:

[Screenshot: GeoFlow 3D map visualisation of the schools data described above]

UPDATE: one other thing I have to mention is that when this was announced in the keynote at the PASS BA Conference this morning, Amir Netz did an absolutely astounding demo showing GeoFlow’s touch-based capabilities running on a massive Perceptive Pixel screen (I think it was this one: http://www.perceptivepixel.com/products/82-lcd-multi-touch-display). It was…

View original post 51 more words

OLE DB data source in SSIS returning no records when using table variable

The other day I stumbled across a strange situation while working on an SSIS package. The package contained an OLE DB source using the ‘SQL Command’ data access mode. This SQL command contained a table variable. My intention was to populate the table variable and then select data from it to be processed further in the package. After writing my query, I clicked Preview and it nicely showed the rows I was looking for. However, when I executed the package, no rows were returned. Bugger!!! The package wasn’t failing, but it was not processing any records either. I checked the package configuration and ran the query in SSMS just to make sure the data was indeed there. Everything looked OK, but still no records when the package was executed. Puzzled, I googled and came across the solution here. All credit to the original poster; I would recommend everyone read it.

If you put SET NOCOUNT ON at the top of the SQL command, SQL Server will suppress sending DONE_IN_PROC messages to the client. DONE_IN_PROC messages are the messages you see in the Messages tab of the query result window in SSMS. Setting SET NOCOUNT ON can also provide a significant performance boost, as mentioned in the MSDN article here.
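To make it concrete, here is a minimal sketch of a fixed SQL command; the source table and columns (dbo.SalesOrders and so on) are made up for illustration:

    -- Suppress DONE_IN_PROC messages so the OLE DB source only sees real resultsets
    SET NOCOUNT ON;

    DECLARE @Orders TABLE (OrderID int, CustomerID int, OrderTotal money);

    -- Without SET NOCOUNT ON, the message generated by this INSERT is what
    -- seems to trip up the OLE DB source (see the hypothesis below)
    INSERT INTO @Orders (OrderID, CustomerID, OrderTotal)
    SELECT OrderID, CustomerID, OrderTotal
    FROM dbo.SalesOrders
    WHERE OrderDate >= '20130101';

    -- This is the resultset the package should actually process
    SELECT OrderID, CustomerID, OrderTotal
    FROM @Orders;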

Intrigued, I decided to dig a little further. To begin with, during package design it seems that if there are multiple statements in the SQL command, OLE DB will pick up the metadata from the first SELECT statement. At package run time, I think, the first resultset is processed by OLE DB. When the SQL command contains a table variable and NOCOUNT is not set to ON, the DONE_IN_PROC messages are returned as empty resultsets. The OLE DB source takes the first resultset and proceeds further. Hence the package succeeds but no records are returned.

Of course, I might be completely wrong here, but reading the OLE DB specifications to solve a trivial issue doesn't seem like a good idea. Anyhow, I at least know the solution.