Analysis Services

Faster SSAS processing by using the Shared Memory Protocol

I came across this very handy tip to increase SSAS processing speed in the SQL Server 2008 R2 Analysis Services Operations Guide and thought it was worth sharing.
The guide recommends using the Shared Memory protocol when getting data from the relational data source, if that source is SQL Server and both SSAS and SQL Server are on the same physical box. Exchanging data over shared memory is much faster than over TCP/IP, as you probably already know. You need to perform two steps to force the SSAS data source to use shared memory when querying the underlying SQL Server.
1. Check that the Shared Memory protocol is enabled in SQL Server Configuration Manager.
2. Prefix the server name in the SSAS data source connection string with lpc: as shown below.

Provider=SQLNCLI10.1;Data Source=lpc:ThisServer\INST1;Integrated Security=SSPI;Initial Catalog=WordMatch
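
If you want to confirm that processing really is going over shared memory, one quick check (my own habit, not something the guide prescribes) is to query sys.dm_exec_connections on the relational instance while processing is running; the net_transport column tells you how each connection arrived.

-- Run on the relational SQL Server instance while the cube is processing.
-- Shared memory connections show 'Shared memory', TCP/IP connections show 'TCP'.
SELECT session_id, net_transport, auth_scheme
FROM sys.dm_exec_connections;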

The guide claims to have achieved 112,000 rows per second using TCP/IP versus 180,000 rows per second using shared memory, which is impressive. In my own test, a slightly modified Adventure Works cube took 56 seconds to process using TCP/IP and 47 seconds using shared memory, an improvement of roughly 16%.


Calculated distinct count measures in SSAS

Distinct count measures are a fact of life for an SSAS developer. No matter how much we try to avoid them, they are always asked for and we have to deal with them somehow. Another painful fact is, as you might already know, that we cannot create multiple distinct count measures in the same measure group. So each distinct count measure has to sit in its own measure group, which, IMO, does not look right when browsing the cube. In this post, I want to show a method of creating distinct count calculated measures, which I found on Richard Lees' blog (linked below) and have slightly modified.
http://richardlees.blogspot.co.uk/2008/10/alternative-to-physical-distinct-count.html

Using Adventure Works, let’s say the end users want to know the distinct count of customers who have placed orders on the internet. I can add a calculation like this to the cube:

CREATE MEMBER CURRENTCUBE.[Measures].[Unique Customers]
AS COUNT(NONEMPTY([Customer].[Customer].[Customer].members,
[Measures].[Internet Order Count])),
VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Internet Orders' ;

This is all fine and dandy; however, as soon as I added any attribute from the customer dimension on rows or filters, the results showed incorrect values, i.e. the same count of customers was repeated for every record.

The solution is to count the number of unique customers in the context of the current members of the customer dimension’s attributes. For example’s sake, let’s take the Customer City attribute. I tweaked the calculation as below to count customers only in the context of the current member of the City attribute, and it started working as expected when the City attribute is used on rows, columns or filters.

CREATE MEMBER CURRENTCUBE.[Measures].[Unique Customers]
AS COUNT(NONEMPTY(
CROSSJOIN(
[Customer].[Customer].[Customer].members,
[Customer].[City].CurrentMember
),[Measures].[Internet Order Count])),
VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Internet Orders' ;

Of course, you will have to add all of the customer dimension’s attributes to the CROSSJOIN (a sketch of what that looks like follows), but ultimately a calculated, though complex, distinct count measure is better than a number of physical distinct count measures, IMHO.
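
As a rough sketch, here is the same measure with a couple of extra attributes added to the CROSSJOIN. The attribute names (Country, Gender) are just ones from the stock Adventure Works customer dimension; substitute whichever attributes your users actually slice by.

CREATE MEMBER CURRENTCUBE.[Measures].[Unique Customers]
AS COUNT(NONEMPTY(
CROSSJOIN(
[Customer].[Customer].[Customer].members,
[Customer].[City].CurrentMember,
[Customer].[Country].CurrentMember,
[Customer].[Gender].CurrentMember
),[Measures].[Internet Order Count])),
VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Internet Orders' ;

When an attribute is not placed on rows, columns or a filter, its CurrentMember is the All member, so the extra sets leave the result unchanged in that case.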


Building a data model for SSAS Tabular

I have started preparing for the Microsoft certifications on SQL Server 2012 and I am spending a fair amount of time with SSAS Tabular, since it’s the newest addition to the MS BI stack and the one I am least accustomed to. As I work with it more and more, I am starting to think about how you would go about designing the database on which the tabular model will be based. In the multidimensional world, this was an easy answer: you would almost always design the DW as a star/snowflake schema, or at least expose the DW as a star/snowflake schema through SQL Server views or the Data Source View (DSV). In SSAS Tabular, it appears to me that this is not really necessary.

Do I really need to segregate data into facts and dimensions?
I can create a measure out of any column in any table in SSAS Tabular. It doesn’t have to come from a fact table, so why separate out facts and dimensions? Assume that I have a ‘House’ dimension with ‘NumberOfRooms’ as an attribute. In SSAS Multidimensional, if the users wanted to slice and dice ‘NumberOfRooms’ by other attributes of the house dimension, or by any other dimension, you would have to somehow surface it in a fact table. SSAS Tabular will happily let you create a measure out of this column even though it sits in a dimension table. In theory, you can point the tabular model at the normalized OLTP database and still give users the ability to slice and dice the data just like in SSAS Multidimensional.
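
As a minimal sketch (the House table and NumberOfRooms column are the hypothetical ones from the paragraph above, not real Adventure Works objects), the measure in a tabular model is a one-line DAX definition:

Total Rooms := SUM ( House[NumberOfRooms] )

Once defined, it can be sliced by any column in the House table or any related table, which is exactly the behaviour that would have required a separate fact table in the multidimensional world.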

Normalized or denormalized, that is the question…
There would be obvious issues with building the tabular model on a normalized schema. One, since the database is normalized, the sheer number of tables will be very large. Two, maintaining relationships between the tables would be painful. Three, the model itself can be confusing for end users. On the other hand, building a fully denormalized schema when you don’t really need it adds complexity to the project, especially in the ETL. I don’t think I need to reiterate that the complex part of a DW project is the ETL, and around 60-70% of the effort is spent on this phase. So it appears we need some middle ground.

Enter entity-based design…
I have always perceived the DW as a collection of entities, like product, sales and employee, which have relationships with each other. The star schema is similar, but the dimensions don’t have relationships with each other directly; they are related through the fact table. Plus, the dimensions are denormalized so that as much information as possible is crammed into them. I guess what I am trying to say is that we don’t need a fully denormalized structure, and at the same time we don’t want the data warehouse in 3NF. We need to find a middle ground where the model is denormalized just enough. In my opinion, the way to get there is to think of the DW in terms of entities and how they interact with each other, rather than classifying the data into dimensions and facts.

Of course, these are all my preliminary thoughts and are very much open to issues. I guess we will have to wait for widespread adoption of the tabular model and learn from the people who have done it.

AttributeHierarchyVisible in SSAS and a small observation


In Analysis Services 2008, to hide an attribute hierarchy you set the AttributeHierarchyVisible property to False. The hierarchy will still be available but will be hidden from the user. You can set the flag back to True if you want to show it again, except there is a small catch you need to be aware of: there are two places where AttributeHierarchyVisible can be set.

The first place is the usual dimension properties window; the other is in the cube designer window. Open a cube in BIDS. If you haven’t changed the default view in the cube design window, you should have the Measures window on the right and the Dimensions window just below it. Expand any of the dimensions in the Dimensions window, right-click on an attribute and select Properties. You will find another AttributeHierarchyVisible property which you can set there.

The two seem to act as a logical AND: both need to be True for the attribute to be visible, and if either of them is False the attribute will not be visible. In my case, the property in the cube designer window had somehow been changed to False (I don’t know how), and then no matter what I did the attribute was not visible.
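
If you get stuck like this and want to see both values side by side, one option (my own suggestion, not part of the original observation) is to open the dimension and cube files as code in BIDS (right-click the file in Solution Explorer and choose View Code) and search for the attribute. Roughly, and from memory, the two settings live in places like the ones below; treat the exact element nesting as an assumption to verify against your own project files.

<!-- In the dimension definition (.dim file): the dimension-level setting -->
<Attribute>
  <ID>City</ID>
  <Name>City</Name>
  <AttributeHierarchyVisible>true</AttributeHierarchyVisible>
</Attribute>

<!-- In the cube definition (.cube file), under the cube dimension: the cube-level setting -->
<Attribute>
  <AttributeID>City</AttributeID>
  <AttributeHierarchyVisible>false</AttributeHierarchyVisible>
</Attribute>

In a case like mine, the second value is the one that needs flipping back to True.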

If you are, like me, confused as to why we need to set the same property in two places, I’m sorry, I don’t have the answer. But at least we know the solution.

Regards