SSAS Multidimensional

Faster SSAS Processing by using Shared Memory Protocol

I came across this very handy tip to increase SSAS processing speed in the SQL Server 2008 R2 Analysis Services Operations Guide and thought it was worth sharing.
The guide recommends using the Shared Memory protocol when getting data from the relational data source, provided the source is SQL Server and both SSAS and SQL Server are on the same physical box. Exchanging data over shared memory is much faster than over TCP/IP, as you probably already know. You need to perform two steps to force the SSAS data source to use shared memory when querying the underlying SQL Server.
1. Check that the Shared Memory protocol is enabled in SQL Server Configuration Manager.
2. Prefix the server name in the SSAS data source connection string with lpc: as shown below.

Provider=SQLNCLI10.1;Data Source=lpc:ThisServer\INST1;Integrated Security=SSPI;Initial Catalog=WordMatch

The guide reports achieving 112,000 rows per second using TCP/IP versus 180,000 rows per second using shared memory, which is impressive. In my own test, a slightly modified Adventure Works cube took 56 seconds to process using TCP/IP versus 47 seconds using shared memory; an improvement of about 16%.
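If you want to confirm that processing connections really are coming in over shared memory, you can check the sys.dm_exec_connections DMV on the relational instance while the cube is processing; connections using the shared memory protocol report 'Shared memory' in the net_transport column. A minimal check (run it on the SQL Server instance, not on SSAS):

-- List current connections and the transport protocol each one is using;
-- shared memory connections show 'Shared memory' in net_transport
SELECT session_id, net_transport, connect_time
FROM sys.dm_exec_connections
ORDER BY connect_time DESC;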

Calculated distinct count measures in SSAS

Distinct count measures are a fact of life for an SSAS developer. No matter how much we try to avoid them, they are always asked for and we have to deal with them somehow. Another painful fact is, as you might already know, that we cannot create multiple distinct count measures in the same measure group. Each distinct count measure has to sit in its own measure group, which, IMO, does not look right when browsing the cube. In this post, I want to show a method of creating distinct count calculated measures which I found on Richard Lees's blog (linked below), with a slight modification.
http://richardlees.blogspot.co.uk/2008/10/alternative-to-physical-distinct-count.html

Using Adventure Works, let's say the end users want to know the distinct count of customers who have placed orders on the internet. I can add a calculation like this in the cube:

CREATE MEMBER CURRENTCUBE.[Measures].[Unique Customers]
AS COUNT(NONEMPTY([Customer].[Customer].[Customer].members,
[Measures].[Internet Order Count])),
VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Internet Orders' ;

This is all fine and dandy; however, as soon as I added any attribute from the Customer dimension to the rows or filters, the results showed incorrect values, i.e. the same count of customers was repeated for every record.

The solution is to count the number of unique customers in the context of the current attributes of the Customer dimension. For example's sake, let's take the Customer City attribute. I tweaked the calculation as below to count customers only in the context of the current member of the City attribute, and it started working as expected when the City attribute is used on rows, columns or filters.

CREATE MEMBER CURRENTCUBE.[Measures].[Unique Customers]
AS COUNT(NONEMPTY(
CROSSJOIN(
[Customer].[Customer].[Customer].members,
[Customer].[City].CurrentMember
),[Measures].[Internet Order Count])),
VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Internet Orders' ;

Of course, you will have to add the current member of every Customer dimension attribute to the CROSSJOIN, but ultimately a calculated, though more complex, distinct count measure is better than having a number of physical distinct count measures, IMHO.

 

Displaying last cube processed time in SSAS

Recently, while browsing a cube in Excel, an end user asked me when it was last processed. I had no quick answer and had to connect to the SSAS server in Management Studio and check the properties of the cube. This got me thinking about how to expose the cube's last processed date and time to end users through Excel or similar client tools. It is very helpful to them because they can quickly check when the data was last added. Think about it like this: a user builds a pivot table report in Excel with a filter at the top which shows when the cube was processed. He then copies the pivot table data and sends it to a colleague. The colleague checks the numbers and instantly knows they are based on data from a certain date. Without the last processed date, the assumption could be made that the numbers are based on the latest set of data, which might not be true. So with that in mind, I decided to implement a simple solution to expose when the cube was last processed. Here are the steps.

1. Create a table in the data warehouse with just one column, storing the date and time when cube processing finished. This table will always have exactly one record.

2. Update the record in this table with the latest timestamp after the cube has been processed. This can be done in the ETL (see the T-SQL sketch after these steps).

3. Import this table into the DSV and create a dimension based on it. This dimension will have just one attribute and one member, which will be the last time the cube was processed. You might want to be careful with the naming of this dimension: it should appear either at the top or the bottom of the field list in the pivot table so that it does not interfere with the analysis.

4. Set the IsAggregatable property of the attribute to False. This removes the 'All' member, because there is only ever going to be one member in this attribute.

5. Add this dimension in the Dimension Usage matrix. You don't have to worry about attaching it to any measure group.

6. Add the new dimension to the perspectives, if you have any, and that's it. Deploy and process the cube.
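Here is a minimal T-SQL sketch of steps 1 and 2. The table and column names (dbo.CubeLastProcessed, LastProcessedDateTime) are just placeholders; use whatever fits your warehouse naming conventions.

-- One-row table holding the time the cube was last processed
CREATE TABLE dbo.CubeLastProcessed
(
    LastProcessedDateTime DATETIME NOT NULL
);

INSERT INTO dbo.CubeLastProcessed (LastProcessedDateTime)
VALUES (GETDATE());

-- Run at the end of the ETL, after cube processing has finished
UPDATE dbo.CubeLastProcessed
SET LastProcessedDateTime = GETDATE();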

To test, connect to the cube in Excel, drag the attribute into the report filter and it should show when the cube was last processed. Here's a little screenshot from when I implemented this for the Adventure Works 2012 cube. And yes, don't forget to process the new dimension while processing the SSAS database.

Filter shows when the cube was last processed

Hope that was helpful. Any suggestions are always welcome.

Large date dimensions in SSAS

I would like to share two simple tips for date dimensions where the range of dates is large and uncertain.

A brief background

I was recently working on a project where I was faced with two situations.

1. The range of dates in the date dimension was large: roughly from the 1970s to 2030 and beyond.

2. The source data could contain dates outside of this range, i.e. before 1970 and after 2030, but we did not know the exact range for certain.

So here are the tips.

1. Add only required dates 

I could have just populated the date dimension with 200 years' worth of dates, i.e. around 73,000 records, and been done with it. On closer inspection of the source data, however, I found that the fact data would be sparse for years outside the commonly used range. There would be a few records with a date from 1955, a few from 1921, and so on. So why add all the extra rows for 1920 if the only date that is ever going to be used from that year is 01/02/1920? Even for future dates, why bother adding every date in 2070 if, for now, the only date I need is, let's say, 23/09/2070?

To avoid bloating the date dimension, I populated it with a static range of the most often used dates. For dates outside of this range, I created a so-called 'date sync' mechanism. In a nutshell, at the end of the dimension load and before the start of the fact load, it goes through all the date columns in the source tables (which are in staging or the ODS by that point) and makes sure that every date is present in the date dimension; if one is not, it simply creates a row for that particular day. It might sound like a slow process but, since the data is already in the relational engine, it is quite fast. Plus, it guarantees that every date will be present in the date dimension, so the ETL won't fail due to foreign key constraints.
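Here is a minimal sketch of the date sync idea for a single staging column. The table and column names (staging.FactSales, OrderDate, dbo.DimDate, DateKey, FullDate) are hypothetical; a real implementation would cover every date column and populate the remaining date dimension attributes as well.

-- Add any dates found in staging that are missing from the date dimension
INSERT INTO dbo.DimDate (DateKey, FullDate)
SELECT DISTINCT
    CONVERT(INT, CONVERT(CHAR(8), s.OrderDate, 112)) AS DateKey,   -- yyyymmdd surrogate key
    CONVERT(DATE, s.OrderDate)                       AS FullDate
FROM staging.FactSales AS s
WHERE s.OrderDate IS NOT NULL
  AND NOT EXISTS (SELECT 1
                  FROM dbo.DimDate AS d
                  WHERE d.FullDate = CONVERT(DATE, s.OrderDate));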

2. Categorise dates

As mentioned before, our date range was wide, so slicing and dicing using the date hierarchy was painful because the years ran from 1920 to 2030 and beyond. To make browsing a little less problematic, we introduced a year category field. When we asked the business users, they were most interested in data from the last 5 years to the next 10 years. So we added a derived column which categorised the dates into buckets: Pre-2008, the individual years 2008 to 2024, and Post-2024. We created an attribute based on this field in the date dimension, and our date hierarchy looked like this:

DateCategory–>Year–>Month–>Date

Now, when the users dragged the date hierarchy onto rows in Excel, they would see all the years before 2008 under Pre-2008, then all the years between 2008 and 2024 (which they were most interested in), and then Post-2024. Nice and clean.
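A derived column along these lines could produce the category. This is just a sketch; the CalendarYear column name and the boundary years come from this example and would need adjusting to your own dimension.

-- DateCategory derived column for the date dimension view
SELECT
    d.DateKey,
    d.FullDate,
    d.CalendarYear,
    CASE
        WHEN d.CalendarYear < 2008 THEN 'Pre-2008'
        WHEN d.CalendarYear > 2024 THEN 'Post-2024'
        ELSE CONVERT(VARCHAR(4), d.CalendarYear)    -- individual years 2008 to 2024
    END AS DateCategory
FROM dbo.DimDate AS d;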

Hopefully these tips will be helpful to you in some way, or at least give you some ideas for handling a large date dimension. If you have any suggestions, please feel free to drop me a line.

Hide a specific attribute hierarchy in a role playing dimension – SSAS multidimensional

A role playing dimension is a dimension which plays different roles against a fact table. In SSAS multidimensional, a role playing dimension can be used across different cubes and measure groups. The date dimension is the most common example: a date might appear in various roles such as SaleDate, ReceivedDate, OrderDate, ShipDate etc. Creating a separate dimension for each of these would not be a good approach. Instead, a single date dimension is created and attached to the various measure groups using different keys. In fact, if you have defined relationships between multiple fact table columns and the same dimension key, SSAS multidimensional will add the role playing dimensions automatically, as shown below.

AdventureWorksRolePlayingDimension

In certain situations, however, not all the attributes of a role playing dimension are relevant to all measure groups or cubes. Let us take the example of an Organisation dimension. This dimension has four attribute hierarchies, namely OrganisationKey, Organisation Name, Customer Organisation Reference and Provider Organisation Reference. Let us also assume a fictional fact table with the columns CustomerOrganisationKey, ProviderOrganisationKey and TransactionAmount. This fact table references the Organisation dimension twice, i.e. CustomerOrganisationKey –> OrganisationKey and ProviderOrganisationKey –> OrganisationKey, which makes the Organisation dimension a role playing dimension. The table relationships and the role playing dimensions in SSAS are shown below.

Table Relationships and Role Playing
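In relational terms, the fictional fact table might be declared something like this (a hypothetical sketch; the dimension table name DimOrganisation is assumed):

-- Fictional fact table referencing the Organisation dimension twice
CREATE TABLE dbo.FactTransaction
(
    CustomerOrganisationKey INT NOT NULL
        REFERENCES dbo.DimOrganisation (OrganisationKey),
    ProviderOrganisationKey INT NOT NULL
        REFERENCES dbo.DimOrganisation (OrganisationKey),
    TransactionAmount       MONEY NOT NULL
);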

Now, when browsing the Customer Organisation dimension, the Provider Organisation Reference hierarchy doesn't make much sense. Likewise, when browsing the Provider Organisation dimension, the Customer Organisation Reference hierarchy doesn't convey any meaning. There is no apparent way of hiding the non-relevant hierarchy in each dimension: if you hide a hierarchy in the parent Organisation dimension, it gets hidden in both role playing dimensions. This is where I was stuck. As mentioned in one of my earlier posts, there are two places where the AttributeHierarchyVisible property can be set. For role playing dimensions, you can set it for each individual role playing dimension from the Cube Structure tab. Go to the Cube Structure tab and you should see the Dimensions panel below the Measures panel, which lists each role playing dimension and its attributes separately. From there you can hide, or even enable and disable, attribute hierarchies for each role playing dimension. A picture is worth a thousand words, so here's a screenshot.

Hiding Attribute Hierarchy in Role Playing Dimension

Some things to look for when you are building your next SSAS cube

There are a number of things you should be aware of when building an SSAS cube. Here I am listing some of the seemingly trivial ones that have been particularly painful for me. I am hoping this will be a living post and I will add more as and when I find them.

 1. Create views in the DW and then build your DSV from them

The biggest benefit is that views add an abstraction layer between the DW and SSAS. Metadata columns from the DW such as StartDate, EndDate, Source, DateAdded, ModifiedBy etc., which are not required in the cube, can easily be filtered out. Another benefit is that changes in the DW which you don't want to propagate to the cube can be absorbed in the views, for example minor data type changes, new columns or new relationships. Make sure that the views are simple SELECT statements over the DW tables with minimal joins.
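A typical cube-facing view might look like the sketch below. The view, table and column names are hypothetical; the point is simply that only the columns the cube needs are exposed and the metadata columns stay behind.

-- Cube-facing view exposing only the columns the dimension needs
CREATE VIEW dbo.vDimCustomer
AS
SELECT
    c.CustomerKey,
    c.CustomerName,
    c.City,
    c.Country
FROM dbo.DimCustomer AS c;
-- StartDate, EndDate, Source, DateAdded and ModifiedBy are deliberately left out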

2. Avoid NULLs

This is more of a DW thing than an SSAS one. SSAS behaves slightly strangely when it finds a NULL, especially when a column contains both NULL and an empty string (''). SSAS converts NULLs to blanks or zeros, and chances are that you will get a duplicate key error. You can avoid NULLs in the views mentioned above, as sketched below. If you do have NULLs, then consider setting the NullProcessing property to UnknownMember.
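One way of handling this in the view layer is to collapse NULLs and empty strings into a single explicit value. A small sketch with hypothetical column names:

-- Collapse NULL and empty string into one explicit member value
SELECT
    c.CustomerKey,
    ISNULL(NULLIF(LTRIM(RTRIM(c.City)), ''), 'Unknown') AS City
FROM dbo.DimCustomer AS c;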

3. Religiously define and maintain relationships between tables in DSV

This is very important, since SSAS can help you much more actively once relationships are defined. Take snowflaking, for example: when building a snowflaked dimension, SSAS can tell you which other tables in the DSV are related to it.

4. Beware of the datatypes and trimming

I have had my fair share of trouble because of these. If the data type of a column changes in the DSV, the change is not reflected automatically in the dimension attribute. This is especially troublesome for varchar, char or nvarchar data types because sooner or later you will start getting the error “The size specified for a binding was too small, resulting in one or more column values being truncated.” The Dimension Data Type Discrepancy Check from BIDS Helper is a life saver in this situation.

5. Use folders to categorise dimension attributes

For dimensions with a large number of attributes, use the AttributeHierarchyDisplayFolder property to display them in specific folders. This makes browsing the cube in Excel much simpler for end users. The same holds true if you have a large number of measures in a measure group.

6. Keep the attributes in a dimension to a bare minimum

Too many attributes in too many dimensions create too much confusion. Give users what they absolutely need and what will be most useful to them, and disable or hide the hierarchies which are not needed.

7. Create partitions or at least plan for them (Update: 03 August 2012)

This may not be an immediate concern, but thinking about partitioning at the beginning will help you in the long run. Partitioning can dramatically reduce cube processing time and improve query performance.
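Each partition is typically bound to a source query that selects only its slice of the fact table, most commonly by date range. A sketch against the Adventure Works fact table (the partition boundaries are obviously just examples):

-- Source query for a partition covering calendar year 2012
SELECT *
FROM dbo.FactInternetSales
WHERE OrderDateKey BETWEEN 20120101 AND 20121231;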

8. Partition slices are important, don't ignore them! (Update: 19 December 2012)

Without slices, chances are that your MDX queries are scanning all the partitions. There are many blogs out there whose authors confess to this, and recently I had a very similar experience. This post from Mosha is a good indication of the importance of setting slices.