Calculated distinct count measures in SSAS

Distinct count measures are fact-of-life for an SSAS developer. No matter how much we try to avoid them they are always asked for and we have to deal with them somehow. Another painful fact is, as you might already know, we cannot create multiple distinct count measures in the same measure group. So each distinct count measure has to sit in its own measure group which,IMO, does not look right when browsing the cube. In this post, I want to show a method of creating distinct count calculated measures which I found on Richard Lees blog here with slight modification.
http://richardlees.blogspot.co.uk/2008/10/alternative-to-physical-distinct-count.html

Using Adventure Works, let’s say the end users want to know the distinct count of customers who have placed orders on the internet. I can add a calculation like this in the cube

CREATE MEMBER CURRENTCUBE.[Measures].[Unique Customers]
AS COUNT(NONEMPTY([Customer].[Customer].[Customer].members,
[Measures].[Internet Order Count])),
VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Internet Orders' ;

This is all fine and dandy, however, as soon as I added any attribute from customer dimension on the rows or filters, the results were showing incorrect values i.e. the same count of customers was repeated for all records.

The solution is to count the number of unique customers in the context of current attributes of customer dimension. For examples sake, lets take Customer City attribute. I tweaked the calculation like below to count customers only in the context of current members in City attributes and it started working as expected when Customer City attribute is used in rows, columns or filters.

CREATE MEMBER CURRENTCUBE.[Measures].[Unique Customers]
AS COUNT(NONEMPTY(
CROSSJOIN(
[Customer].[Customer].[Customer].members,
[Customer].[City].CurrentMember
),[Measures].[Internet Order Count])),
VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Internet Orders' ;

Of course, you will have to add all the dimension attributes in the CROSSJOIN but ultimately a calculated, though complex, distinct count measure is better than having a number of physical distinct count measures IMHO.

 

Advertisements

3 comments

  1. Just to add a few more options: In many cases, using Count(existing [Customer].[Customer].[Customer]) would do the trick (distinct count of customers). This would remove the need for the crossjoin operations. However if it comes to sub selects (for example when using Excel with more than one element on a filter for an attribute), currentmember and existing will not reflect the context of the subcube. In this case, you could use a dynamic set. I do have some examples about using existing and dynamic sets in an older blog post of mine: http://ms-olap.blogspot.de/2010/02/solution-von-ssas-2008-multi-selects-in.html. Both options (existing and existing with dynamic set) have the adventage, that you don’t need to refer to currentmember in your calculation.

  2. Hilmar, great post. I ran across it in doing research for a dilemma I have. I have a cube that takes data from our client’s demographic data table. This table stored millions of records. I built a cube to help simplify data analysis for our quality department. I am versed in programming .Net and also SQL, but I am not the DBA. Here is my dilemma. When they run a report for the current fiscal year end, the cube properly counts distinct client records. But when they add in a dimension such as Population Classification (CHILD, ADULT, SENIOR), I get double counts of records for those clients whose classification has changed during the year (like going from child to adult-it counts the client once, but counts child and adult once for that client). I know this is a slowly changing dimension, but I cannot change the actual data warehouse tables or create something in SSIS for the SCD. So I was looking at using a calculated measure. In your opinion, am I doing this correctly or would you be doing something else? By the way, the quality department did determine that they do not care about this historical count, but the count at the end of the year. So in the case listed above, it should count that client once as an adult and ignore the count for child.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s