SSAS 2012 Tabular

Large date dimensions in SSAS

I would like to share two simple tips on date dimension if the date range in your date dimension is large and uncertain.

A brief background

I was recently working on a project where I was faced with two situations.

1. The range of dates in the date dimension was large. Something from 1970’s to 2030 and beyond.

2. The source data can contain dates outside of this range i.e. before 1970 and after 2030 but we did not know for certain the exact range.

So here are the tips.

1. Add only required dates 

I could have just populated the date dimension with 200 years worth of dates i.e. 73000 records and be done with it. On closer inspection of source data, however, I found that data in fact table will be sparse for years outside of this range. There would be a few records which would have a date from 1955, a few from 1921 and so on. So why add those extra rows for year 1920 if the only date that is ever going to be used from this year is 01/02/1920.  Even for future dates, why bother adding all the dates from 2070 if ,now, the only date I need is, lets say, 23/09/2070.

To avoid fattening the date dimension, I created a static date range i.e. dates which are most often used. For dates outside of these I created a so-called ‘date sync’ mechanism. In a nutshell, all it does is at the end of dimension load and before the beginning of fact load, it goes through all the date fields in source tables (which are in staging or ODS by now) and makes sure that all dates are present in the date dimension. If they are not, it simply created a row for that particular day. It might seem a slow process but since the data is in relational engine by now, it is quite fast. Plus, it always makes sure that the date will always present in the date dimension so ETL won’t fail due to foreign key constraints.

2. Categories dates

So as mentioned before, our date range was wide so slicing and dicing using date hierarchy was painful because the year started from 1920′ till 2030 and beyond. To make browsing a little less problematic, we introduced a year category field. When we asked the business users, they were most interested in data from last 5 years to next 10 years. So we added a derived column which categorized the dates into various buckets like Pre-2008, 2008… and Post-2024.  We created an attribute based on this fields in date dimension and our date hierarchy looked like this.

DateCategory–>Year–>Month–>Date

Now, when the users dragged date hierarchy on rows in excel, they would see years before 2008 under Pre-2008 then all the years between 2008  and 2024 (which they were most interested in ) and then Post-2024. Nice and clean.

Hopefully these will be helpful to you in some way or might give you some better idea of handling large date dimension. If you have any suggestions, please feel free to drop me a line.

Advertisements

Building data model for SSAS Tabular

I have started preparing for Microsoft Certifications on SQL Server 2012 and I am spending a fair amount of time with SSAS Tabular since it’s the newest addition to MS BI stack and the one which I am not very accoustomed to. As I work with it more and more, I am starting to think about how would you go around designing the database on which the tabular model will be based. In the multi-dimensional world, this was an easy answer. You would almost always design the DW in Star/Snow-flake schema or atleast expose the DW as Star/Snow-flake schema through SQL Server Views or Data Source View (DSV). In SSAS Tabular, it appears to me, that this not really necessary.

Do I really need to segregate data in facts and dimension?
I can create measure out of any column in any table in SSAS Tabular. It doesn’t have to be from a fact table so why separate out facts and dimensions? Assume that I have a ‘House’ dimension with ‘NumberOfRooms’ as attribute. In SSAS Multidimensional, if the users wanted to slice and dice ‘NumberOfRooms’ by other attributes of house dimension or any other dimension, you would have to some how surface this as a fact table. SSAS Tabular will happily allow you to create measure out of this column although it’s in a dimension table. In theory, you can point the tabular model to the normalized OLTP database and still be able to give users the ability to slice and dice the data just like SSAS Multidimensional.

Normalized Or Denormalized, that is the question..
There would be obvious issues with building the tabular model on a normalized schema.One, since the database is normalized, the sheer number of tables will be very large. Two, maintaining relationships between the tables would be painful.Three, the model itself can be confusing for end users. On the other hand, building a fully denormalized schema when you don’t really need it adds complexity to the project, especially in the ETL. I don’t think I need to reiterate that the complex part of DW project is ETL and around 60-70% effort are spent on this phase. So it appears like we need some middle ground.

Enter Entity Based design..
I have always perceived the DW as a collection of entities like product, sales, employee etc.  which have relationship with each other. The star schema is similar but the dimensions don’t have relationship with each other directly. They are related to each other through the fact table. Plus, the dimensions are denormalized so as much as possible information is crammed into them. I guess what I am trying to say is we don’t need a fully denormalized structure and at the same time we don’t want the datawarehouse in 3NF. We would need to find a middle ground so that it is just about corretly denormalized and normalized at the same time. In my opinion, the way to get to this is to think of DW in terms of entities and how they interact with each other rather than classifying the data into dimension and facts.

Of course, these are all my prilimanary thoughts and are very much open to issues. I guess we would have to wait for wide spread adoption of the tabular model and learn from people who did it.

Changing workspace database name in SSAS 2012 Tabular

A word of caution. Before you try to implement any steps mentioned in this entry, I would like to point out that it is not recommend. Moreover, I do not take any responsibility if this causes any issues in your project. 

A tabular model workspace database is created automatically when you create a new  SQL Server Tabular Project in SQL Server Data Tools (SSDT). By default it is created on local SSAS Tabular server but you can choose a different server as well. The workspace database name consist of the project name, user name ,GUID and frankly it looks quite gibberish.  For example look at the screenshot of my SSMS on my test virtual machine. Yikes!! Now imagine this on a server with 10 concurrent users, each with 10 projects and you will get the idea.

So I was wondering if I can somehow rename them. You know something pretty. And as it turn out, I can. So here are the steps.

1. Starts SSTD (obviously!!) and create a new Analysis Services Tabular Project. At this point I would also recommend that you rename the default ‘Model.bim’ to something more meaningful to your project.

2. Right click on bim file and click properties.

3. Apart from all the other properties, you will find Workspace Database property in the bottom portion. The description of the properties is here. If you open SSMS and connect to SSAS Tabular instance, you should find a database with the same name as this property value. As you would notice, Workspace Database cannot be changed. It is read-only and that’s the one we want to change.

4. Now right click on project ,click Open Folder in Windows Explorer and close SSDT.

5. In windows explorer, there would be *.settings file. The format of this file seems to be ModelName.bim_username.settings. It might be hidden so change the folder settings to show hidden files.

6Open this file in notepad. It is an xml although everything will come on one line. I would also recommend that you don’t try to format the file.

7. Now look for <DatabaseName> </DatabaseName> tag. The value between this tag is name of the database. Change this value to something more meaningful, save and close notepad.

8. Open the project/solution in SSDT. Check the properties of bim file. The value of property Workspace Database would be what you set in Step 7.

9. You can also open SSMS and verify this.

Neat, isn’t it??