
Faster SSAS Processing by using Shared Memory Protocol

I came across this very handy tip to increase SSAS processing speed in the SQL Server 2008 R2 Analysis Services Operations Guide and thought it was worth sharing.
The guide recommends using the Shared Memory protocol when getting data from the relational data source, provided the source is SQL Server and both SSAS and SQL Server are on the same physical box. Exchanging data over shared memory is much faster than over TCP/IP, as you probably already know. You need to perform two steps to force the SSAS data source to use shared memory while querying the underlying SQL Server.
1. Check that the Shared Memory protocol is enabled in SQL Server Configuration Manager.
2. Prefix the server name in the SSAS data source connection string with lpc: as shown below.

Provider=SQLNCLI10.1;Data Source=lpc:ThisServer\INST1;Integrated Security=SSPI;Initial Catalog=WordMatch

The guide claims to have achieved 112,000 rows per second using TCP/IP versus 180,000 rows per second using shared memory, which is impressive. In my own test, a slightly modified Adventure Works cube took 56 seconds to process using TCP/IP and 47 seconds using shared memory; an improvement of 16%.

Calculated distinct count measures in SSAS

Distinct count measures are a fact of life for an SSAS developer. No matter how much we try to avoid them, they are always asked for and we have to deal with them somehow. Another painful fact, as you might already know, is that we cannot create multiple distinct count measures in the same measure group. So each distinct count measure has to sit in its own measure group, which, IMO, does not look right when browsing the cube. In this post, I want to show a method of creating distinct count calculated measures which I found on Richard Lees' blog here, with a slight modification.
http://richardlees.blogspot.co.uk/2008/10/alternative-to-physical-distinct-count.html

Using Adventure Works, let’s say the end users want to know the distinct count of customers who have placed orders on the internet. I can add a calculation like this in the cube:

CREATE MEMBER CURRENTCUBE.[Measures].[Unique Customers]
AS COUNT(NONEMPTY([Customer].[Customer].[Customer].members,
[Measures].[Internet Order Count])),
VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Internet Orders' ;

This is all fine and dandy; however, as soon as I added any attribute from the customer dimension on the rows or filters, the results were incorrect, i.e. the same count of customers was repeated for every record.
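For reference, a query along these lines reproduces the problem, assuming the standard Adventure Works names for the Customer City attribute and the cube; every city row comes back with the same overall customer count.

-- Repro sketch: with the first version of the calculation, each city shows the grand-total count
SELECT
[Measures].[Unique Customers] ON COLUMNS,
[Customer].[City].[City].Members ON ROWS
FROM [Adventure Works];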

The solution is to count the number of unique customers in the context of the current attributes of the customer dimension. For example's sake, let's take the Customer City attribute. I tweaked the calculation as below to count customers only in the context of the current member of the City attribute, and it started working as expected when the Customer City attribute is used on rows, columns or filters.

CREATE MEMBER CURRENTCUBE.[Measures].[Unique Customers]
AS COUNT(NONEMPTY(
CROSSJOIN(
[Customer].[Customer].[Customer].members,
[Customer].[City].CurrentMember
),[Measures].[Internet Order Count])),
VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Internet Orders' ;

Of course, you will have to add all the customer dimension attributes to the CROSSJOIN (a sketch is below), but ultimately a calculated, though complex, distinct count measure is better than having a number of physical distinct count measures, IMHO.
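For illustration, here is a sketch of the same calculation extended with a couple more attributes; Country and Education are standard Adventure Works customer attributes, but the exact list depends on which attributes your users slice by.

-- Sketch only: replace the original definition and list every attribute users will slice by
CREATE MEMBER CURRENTCUBE.[Measures].[Unique Customers]
AS COUNT(NONEMPTY(
CROSSJOIN(
[Customer].[Customer].[Customer].members,
[Customer].[City].CurrentMember,
[Customer].[Country].CurrentMember,
[Customer].[Education].CurrentMember
),[Measures].[Internet Order Count])),
VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Internet Orders' ;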

 

Displaying last cube processed time in SSAS

Recently, while browsing a cube in Excel, an end user asked me when it was last processed. I had no quick answer and had to connect to the SSAS server in Management Studio and check the properties of the cube. This got me thinking about how to expose the cube’s last processed date and time to end users through Excel or similar client tools. This is very helpful to them in that they can quickly check when the data was last added. Think about it like this: a user builds a pivot table report in Excel with a filter at the top which shows when the cube was processed. He then copies the pivot table data and sends it to his colleague. His colleague checks the numbers and instantly knows that they are based on data from a certain date. Without the last processed date, the assumption could be made that the numbers are based on the latest set of data, which might not be true. So with that in mind, I decided to implement a simple solution to expose when the cube was last processed. Here are the steps.

1. Create a table in the data warehouse with just one column storing the date and time when cube processing finished. This table will always have one record.

2. The record in the table above will be updated with the latest timestamp after the cube has been processed. This can be done in the ETL.

3. Import this table into the DSV and create a dimension based on it. This dimension will have just one attribute and one member, which will be the last time the cube was processed. You might want to be careful with the naming of this dimension. It should appear either at the top or the bottom of the field list in the pivot table so that it does not interfere with the analysis.

4. Set the IsAggregatable property of the attribute to False. This removes the ‘All’ member, because there is only ever going to be one member in the dimension attribute.

5. Add this dimension in the Dimension Usage matrix. You don’t have to worry about attaching it to any measure group.

6. Add the new dimension to the perspectives if you have any, and that’s it. Deploy and process the cube.

To test, connect to the cube in Excel, drag the attribute into the report filter and it should show when the cube was last processed. Here’s a little screenshot from when I implemented this for the Adventure Works 2012 cube. And yes, don’t forget to process the dimension while processing the SSAS database.
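If you want to sanity-check it from Management Studio first, a query along these lines returns the single member; the dimension and attribute names below are hypothetical, so substitute whatever you called yours.

-- Returns the one and only member of the hypothetical [Cube Last Processed] dimension
SELECT
{} ON COLUMNS,
[Cube Last Processed].[Last Processed Date].Members ON ROWS
FROM [Adventure Works];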

Filter shows when the cube was last processed

Hope that was helpful. Any suggestions are always welcome.

Large date dimensions in SSAS

I would like to share two simple tips for when the date range in your date dimension is large and uncertain.

A brief background

I was recently working on a project where I was faced with two situations.

1. The range of dates in the date dimension was large; something from the 1970s to 2030 and beyond.

2. The source data can contain dates outside of this range, i.e. before 1970 and after 2030, but we did not know the exact range for certain.

So here are the tips.

1. Add only required dates 

I could have just populated the date dimension with 200 years' worth of dates, i.e. about 73,000 records, and been done with it. On closer inspection of the source data, however, I found that the fact table data would be sparse for years outside the commonly used range. There would be a few records with a date from 1955, a few from 1921 and so on. So why add all the extra rows for 1920 if the only date that is ever going to be used from that year is 01/02/1920? Even for future dates, why bother adding all the dates from 2070 if, for now, the only date I need is, let's say, 23/09/2070?

To avoid fattening the date dimension, I populated a static date range, i.e. the dates which are most often used. For dates outside of this range, I created a so-called ‘date sync’ mechanism. In a nutshell, at the end of the dimension load and before the fact load begins, it goes through all the date fields in the source tables (which are in staging or the ODS by now) and makes sure that every date is present in the date dimension. If one is not, it simply creates a row for that particular day. It might seem a slow process, but since the data is in the relational engine by now, it is quite fast. Plus, it guarantees that the date will always be present in the date dimension, so the ETL won't fail due to foreign key constraints.

2. Categorize dates

As mentioned before, our date range was wide, so slicing and dicing using the date hierarchy was painful because the years ran from the 1920s to 2030 and beyond. To make browsing a little less problematic, we introduced a year category field. When we asked the business users, they were most interested in data from the last 5 years to the next 10 years. So we added a derived column which categorized the dates into buckets: Pre-2008, the individual years from 2008 to 2024, and Post-2024. We created an attribute based on this field in the date dimension and our date hierarchy looked like this:

DateCategory–>Year–>Month–>Date

Now, when the users dragged the date hierarchy onto rows in Excel, they would see the years before 2008 under Pre-2008, then all the years between 2008 and 2024 (which they were most interested in) and then Post-2024. Nice and clean.
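Behind the scenes, the Excel pivot is essentially running something along these lines; the cube, measure and hierarchy names here are made up for illustration.

-- Years roll up under Pre-2008, the individual years 2008 to 2024, and Post-2024
SELECT
[Measures].[Transaction Count] ON COLUMNS,
HIERARCHIZE(
{ [Date].[Date Category Hierarchy].[DateCategory].Members,
[Date].[Date Category Hierarchy].[Year].Members }
) ON ROWS
FROM [MyCube];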

Hopefully these tips will be helpful to you in some way or might give you a better idea of handling a large date dimension. If you have any suggestions, please feel free to drop me a line.

Hide a specific attribute hierarchy in role playing dimension – SSAS multidimensional

A role playing dimension is a dimension which plays different roles in a fact table. In SSAS multidimensional, a role playing dimension can be used across different cubes and measure groups. The date dimension is the most common example. A date might appear in various forms such as SaleDate, ReceivedDate, OrderDate, ShipDate etc. Creating a separate dimension for each of these would not be a good approach. Instead, a single date dimension is created and attached to various measure groups using different keys. In fact, if you have defined a relationship between multiple fact table columns and the same dimension key, SSAS multidimensional will add the role playing dimensions automatically, as shown below.

Adventure Works role playing dimensions

In certain situations, however, not all the attributes of a role playing dimension are relevant to all measure groups or cubes. Let us take the example of an Organisation dimension. This dimension has four attribute hierarchies, namely OrganisationKey, Organisation Name, Customer Organisation Reference and Provider Organisation Reference. Let us also assume a fictional fact table with columns CustomerOrganisationKey, ProviderOrganisationKey and TransactionAmount. This fact table references the Organisation dimension twice, i.e. CustomerOrganisationKey –> OrganisationKey and ProviderOrganisationKey –> OrganisationKey. This means the Organisation dimension is a role playing dimension. The table relationships and role playing in SSAS are shown below.

Table Relationships and Role Playing

Now, when browsing the Customer Organisation dimension, the Provider Organisation Reference hierarchy doesn’t make much sense. At the same time, when browsing the Provider Organisation dimension, Customer Organisation Reference doesn’t convey any meaning. There is no apparent way of hiding the non-relevant hierarchy in each dimension. If you hide a hierarchy in the parent Organisation dimension, it is hidden in both role playing dimensions. This is where I was stuck.

As mentioned in one of my posts, there are two places where the ‘AttributeHierarchyVisible’ property can be set. For a role playing dimension, you can set it for each individual role playing dimension from the Cube Structure tab. So go to the Cube Structure tab and you should see the Dimensions panel below the Measures panel. This shows each dimension and its attributes separately. Here you can hide, or even enable and disable, attribute hierarchies for each role playing dimension. A picture is worth a thousand words, so here's a screenshot.

Hiding Attribute Hierarchy in Role Playing Dimension

Changing database ID in Analysis Services

Every object in an SSAS database, such as a Cube, a Measure Group, a Measure or a Dimension, has an ID which is generated automatically when the object is created. The ID, once generated, cannot be changed from the Properties window of the object; it is read-only. When an object is created, by default, the object name and its ID are the same; however, over time, objects get renamed, leaving the ID with the old value. In the majority of situations this does not create a problem, but there are a few cases when it can be problematic. Now, some people might say I have OCD (Obsessive Compulsive Disorder), but I like my object names and IDs to be consistent.

For most objects, the ID can be changed by opening the object code (right click –> View Code) and changing the value in the <ID> tag. A word of warning though: be extremely careful when changing IDs. It can create quite an issue in your project. For example, if you change the ID of a dimension, you would have to make the same change in the cube XML, else you would lose the relationship.

Anyway, that’s all fine and dandy. But how would you change the database ID? If you rename the database, the ID does not change, and there is no View Code option for the project. As it turns out, the ID lives in the *.database file of the project. In Solution Explorer, click on Show All Files. You will see a your_project_name.database file. Right click on it and select View Code. You will find the <ID> tag, which contains the ID of the database.

Are you setting ‘Slice’ property in SSAS partitions?

Are you setting the ‘Slice’ property on your SSAS partitions? If not, you should. Here’s what happened to me. We have a fairly large SSAS cube. Initially this cube had only two partitions: current data and historic data. The performance was degrading, so we partitioned the cube by month, since the majority of the queries are by month and below. This also helped us improve processing performance because we can now process only those partitions for which data has changed, two or three at the most. On the downside, there is a large number of partitions in the cube; a fact we decided we could live with.
So I created the partitions and got the cube processed. I wrote a simple MDX query to observe how the cube responds in Profiler. Here is the query (slightly changed for anonymity).

SELECT
[Measures].[Measure Dummy] ON COLUMNS
FROM [Playbox]
WHERE [Period].[YearMonthDaySettlementPeriod].[Calendar Year].&[2010].&[4]

And here’s what I was seeing in profiler.

Without Slice, all the partitions are scanned

Hang on! I am querying only April 2010 data, so why are there scans on all the other partitions? This defeated the whole purpose of partitioning by month, because queries were scanning all the partitions. As mentioned earlier, we had a large number of partitions and this is bound to have an impact. Even after creating the partitions, I was still not getting the benefit I wanted.

Now on to the crux of the matter. As it turns out (after spending hours on Google and MSDN), the automatic slice detection in SSAS 2008 is not that effective. Hence, even though the partitions clearly each hold a single month, SSAS still has to scan all of them to get the data. There are a number of blogs which elaborate on the issue. I will just point to this, this and this.

So I set the Slice property on each partition, ran the same query and, guess what, it hit only the required partition.
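For reference, the Slice property is just an MDX member (or tuple) expression describing what the partition contains. Reusing the member from the query above, the April 2010 partition's slice would look something like the line below; in practice you would typically point it at whichever month attribute your partitions are actually split on.

[Period].[YearMonthDaySettlementPeriod].[Calendar Year].&[2010].&[4]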

With Slice defined, only the required partition is scanned

The performance benefit was quite apparent as well. I selected a very common real-life query and noted the query timings using the AS Performance Workbench, which I mentioned in my previous blog post. As you can see in the images below, the response time improved dramatically.

Query Duration without Slice

Query duration with Slice

So the bottom line is: “Always define the slice when you create an SSAS partition”. This has to go on my list of things to look for when building an SSAS cube.