Performance Testing

Knight’s Move Puzzle in TSQL

A colleague at work set us an interesting puzzle. It is a generic programming puzzle, but we were required to solve it with TSQL, and the person who came up with the fastest solution would win. Here’s the puzzle in question.

Puzzle


Imagine a keypad like below.

[Image: the keypad layout]

We want to find all of the unique 10-key sequences that can be entered on the keypad.

  • The initial key press can be any of the keys.
  • Each subsequent key press must be a knight move from the previous key press.
  • A key can be used any number of times.
  • There can be at most two vowels in each sequence.

A valid knight move is either

  • One step vertically and two steps horizontally
  • One step horizontally and two steps vertically

Examples of valid moves are shown below (from black to green).

[Image: examples of valid knight moves, from black to green]

The SQL Code must

  • Display the total number of possible sequences
  • Store all of the sequences in a Temp table named #Results
  • Not require any permissions other than public, and run against whatever database the query window is connected to

The person whose code runs fastest wins.

Full disclaimer: I did not win the challenge. A couple of co-workers developed really innovative solutions, but I thought it was worth putting my approach on the blog so that I can come back to it later.
I took a probably very simplistic approach and was fairly happy with the results. But that’s not the end of it. When I ran the same solution script on various SQL Server versions, the timings varied widely, and they kept getting better with newer versions of SQL Server.

Solution


The complete code below is available on my GitHub account.

Convert to table

Let’s get the obvious out of the way: this was meant to be solved using TSQL, so a procedural approach was out of the question. TSQL, as we all know, excels at relational operations, and we will get the fastest solution if we exploit that. Next up, we want to use the other good part about RDBMSs in general: the ability to index and do fast lookups.

With that in mind, let’s model the keypad as a table. Each key is unique, so that’s our primary key, and each has a set of properties: row location, column location and whether or not it is a vowel. The vowel column comes in handy later, as you will see. The keypad table looks like below.

[Image: the #keypad table contents]
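
Since the exact layout is only visible in the images above, here is a sketch of how the keypad table could be built. The positions follow the ones referenced later in the post (H at row 2 column 3, A at row 1 column 1, E at row 1 column 5, 1 at row 4 column 2); the placement of 2 and 3 and the column name isvowel are my assumptions, while val, rownum and colnum come from the queries shown later.

CREATE TABLE #keypad
(
    val     CHAR(1) NOT NULL PRIMARY KEY,  -- the key itself
    rownum  INT     NOT NULL,              -- row position on the keypad
    colnum  INT     NOT NULL,              -- column position on the keypad
    isvowel TINYINT NOT NULL               -- 1 if the key is a vowel, 0 otherwise
);

-- Layout assumed from the positions referenced in the text:
-- row 1: A B C D E, row 2: F G H I J, row 3: K L M N O, row 4: 1 2 3
INSERT INTO #keypad (val, rownum, colnum, isvowel)
VALUES ('A',1,1,1),('B',1,2,0),('C',1,3,0),('D',1,4,0),('E',1,5,1),
       ('F',2,1,0),('G',2,2,0),('H',2,3,0),('I',2,4,1),('J',2,5,0),
       ('K',3,1,0),('L',3,2,0),('M',3,3,0),('N',3,4,0),('O',3,5,1),
       ('1',4,2,0),('2',4,3,0),('3',4,4,0);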

Knight’s moves as SQL query

When a knight moves, it arrives at a new location on the keypad. The new location has a row number and a column number, and they can be derived from the row and column numbers of the original location and the direction of the move. There are essentially eight cases (or moves) from each character.

One step vertically and two steps horizontally
(1) One step up and two steps on left
(2) One step up and two steps on right
(3) One step down and two steps on left
(4) One step down and two steps on right

One step horizontally and two steps vertically
(5) One step left and two steps up
(6) One step left and two steps down
(7) One step right and two steps up
(8) One step right and two steps down

For each case above, we can join the keypad table to itself on row and column number to find all valid combinations. Let’s take case 1 above and apply it to the position of H. H is at row 2 and column 3; let’s denote that as H(2,3). One step above H is C, and two steps to the left of that is A, at A(1,1). So if we reduce the row number by 1 and the column number by 2 from H, we arrive at A using a valid knight’s move. That, essentially, is our self-join condition. In the same fashion, for case 2 above, to go from H(2,3) to E(1,5) we would reduce the row number by 1 but increase the column number by 2, since we are moving to the right. Here’s the sample query for the first case.

-- Case 1: one step up (rownum - 1) and two steps left (colnum - 2)
SELECT d1.val,
       d2.val AS c1_v1up_h2left,
       CASE
           WHEN d2.val IN ( 'A', 'E', 'I', 'O' ) THEN 1
           ELSE 0
       END AS isnextvowel
FROM #keypad d1
INNER JOIN #keypad d2
        ON d2.rownum = d1.rownum - 1
       AND d2.colnum = d1.colnum - 2

We can apply the same reasoning to all eight cases, which gives us eight different queries. The union of all eight queries is our set of all valid moves from any character on the keypad. Also note that the inner join automatically eliminates positions which do not exist. For example, for 1(4,2) there is no case 3 from above, since there is no row below 1. Had we used a left join, we would have received a record for this case, but nextval would have been empty. Below are the valid moves from 1.

[Image: valid moves from the key 1]
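
Rather than writing out all eight UNIONed queries here, the same result set can be produced more compactly by joining against a small derived table of the eight (row, column) offsets. This is a sketch of that alternative, not the exact code from the repository; the names #validcombination, val, nextval and isnextvowel follow the ones used in the queries below, and isvowel is the column assumed in the earlier #keypad sketch.

SELECT d1.val,
       d2.val     AS nextval,
       d2.isvowel AS isnextvowel
INTO #validcombination
FROM #keypad d1
-- the eight knight-move offsets (delta row, delta column)
CROSS JOIN (VALUES (-1, -2), (-1, 2), (1, -2), (1, 2),
                   (-2, -1), (-2, 1), (2, -1), (2, 1)) AS m (dr, dc)
INNER JOIN #keypad d2
        ON d2.rownum = d1.rownum + m.dr
       AND d2.colnum = d1.colnum + m.dc;

The inner join to d2 drops any landing position that falls off the keypad, just as described above.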

Ten-key sequence chain

Next, we want to find sequences of ten keys which contain no more than two vowels. This is where the vowel column from the keypad table comes in handy, but first let’s look at how we find the sequences.
We now have a table which contains every possible valid move from every character on the keypad. Let’s assume that we want to find only three-key sequences. We start with the keypad table and join it to the valid combinations table on the current value, so the next-value column from the valid combinations table gives us the second key of the sequence. If we join the valid combinations table to that result once more, we get the third key of the sequence; the join condition is the next value from the first join matched against the original value column of the second. We can repeat the process to get ten-key sequences.

Here’s a sample query which finds the first three keys.

-- Chain two valid moves onto the starting key to get a three-key sequence
SELECT d.val      AS s1,
       r1.nextval AS s2,
       r2.nextval AS s3
FROM #keypad d
INNER JOIN #validcombination r1
        ON d.val = r1.val
INNER JOIN #validcombination r2
        ON r1.nextval = r2.val

Removing vowels

In the ten-join query, each instance of the valid combinations table has the vowel column available. If we add up the values of this column across the ten positions, that tells us how many vowels there are in the sequence, so to filter we simply add a WHERE clause which checks that this sum is no more than two. Arithmetic is cheaper than string comparison, so adding up the flags is more efficient than checking how many of the characters are vowels. Here are a few records from the results; you can verify them yourself.

[Image: sample records from the results]
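
Putting the chained joins and the vowel filter together, the full ten-key query follows the pattern below. This is a sketch using the names assumed earlier, not the exact code from the repository; the last two statements cover the puzzle requirements of storing the sequences in #Results and displaying the total count.

SELECT d.val      AS s1,  r1.nextval AS s2,  r2.nextval AS s3,
       r3.nextval AS s4,  r4.nextval AS s5,  r5.nextval AS s6,
       r6.nextval AS s7,  r7.nextval AS s8,  r8.nextval AS s9,
       r9.nextval AS s10
INTO #Results
FROM #keypad d
INNER JOIN #validcombination r1 ON d.val      = r1.val
INNER JOIN #validcombination r2 ON r1.nextval = r2.val
INNER JOIN #validcombination r3 ON r2.nextval = r3.val
INNER JOIN #validcombination r4 ON r3.nextval = r4.val
INNER JOIN #validcombination r5 ON r4.nextval = r5.val
INNER JOIN #validcombination r6 ON r5.nextval = r6.val
INNER JOIN #validcombination r7 ON r6.nextval = r7.val
INNER JOIN #validcombination r8 ON r7.nextval = r8.val
INNER JOIN #validcombination r9 ON r8.nextval = r9.val
-- at most two vowels across the whole ten-key sequence
WHERE d.isvowel
      + r1.isnextvowel + r2.isnextvowel + r3.isnextvowel
      + r4.isnextvowel + r5.isnextvowel + r6.isnextvowel
      + r7.isnextvowel + r8.isnextvowel + r9.isnextvowel <= 2;

-- the puzzle also asks for the total number of sequences
SELECT COUNT(*) AS totalsequences FROM #Results;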

Timings

This is where it gets interesting. I ran the same script on four different versions of SQL Server: SQL Server 2008 R2 SP3, SQL Server 2012 SP1, SQL Server 2014 and SQL Server 2016 CTP 3.2. Admittedly, they ran on different hardware, and SQL Server 2014 and SQL Server 2016 CTP were running in virtual machines, but the hardware was similar: each had four processor cores, 8 GB of memory and an SSD. In the case of the VMs, the configuration was exactly the same and the VMs themselves were hosted on SSDs. I did expect the timings to change, but the change was pretty stark. The results are given below. The first run was on a cold cache and subsequent ones were on a warm cache. The average column at the end is the average of all five runs.

SQL Version          Run 1   Run 2   Run 3   Run 4   Run 5   Average
SQL Server 2008 R2    1316    1310    1313    1333    1303    1315
SQL Server 2012       1026    1066    1106    1120    1056    1074.8
SQL Server 2014        403     980     396     373     393    509
SQL Server 2016        470     453     400     390     370    416.6

My theory on the improvement in execution timings is that (a) the way temporary tables are handled has been improved in each version of SQL Server, and (b) the SQL Server query optimizer keeps improving with each version. The second point seems more likely, since we know the optimizer is improved with each release of SQL Server. I did not check the query plans for each version, but maybe the answer lies there.
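
For reference, timings like those above can be captured with a simple wrapper around the whole script. This is just a sketch of one way to do it, not necessarily how the numbers in the table were measured.

-- capture elapsed time for the whole solution script, in milliseconds
DECLARE @start datetime2(7) = SYSDATETIME();

-- ... run the full solution script here ...

SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS elapsed_ms;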

Further improvements to the solution

The code can certainly be improved, although I believe that unless the approach is radically altered, the gains will be marginal. It will follow the law of diminishing marginal performance improvements (I completely made that up), whereby the effort required to improve performance grows exponentially compared to the marginal benefit we get; in other words, we would have to put in a significant amount of effort to improve the performance by a little. Having said that:

  • Adding normal clustered and non-clustered indexes could have an impact (see the sketch after this list)
  • How would actual tables, instead of temporary tables, perform?
  • In SQL Server 2012 and onward we can use columnstore indexes; would they have any impact? Which would be better, a non-clustered columnstore or a clustered one?
  • What if we created the temporary tables as in-memory tables in SQL Server 2014 and 2016; would that improve performance?
  • Any combination of the above?
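
As a starting point for the first bullet, a plain index to support the chained joins might look like the sketch below. It uses the #validcombination column names assumed earlier; the columnstore and in-memory variants from the other bullets would follow the same idea but need version-specific syntax.

-- A covering index so each chained join can seek on val rather than scan
CREATE NONCLUSTERED INDEX ix_validcombination_val
    ON #validcombination (val)
    INCLUDE (nextval, isnextvowel);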

Faster SSAS Processing by using Shared Memory Protocol

I came across this very handy tip to increase SSAS processing speed in the SQL Server 2008 R2 Analysis Services Operations Guide and thought it was worth sharing.
The guide recommends using the Shared Memory Protocol when getting data from the relational data source, if the source is SQL Server and both SSAS and SQL Server are on the same physical box. Exchanging data over shared memory is much faster than over TCP/IP, as you probably already know. You need to perform two steps to force the SSAS data source to use shared memory when querying the underlying SQL Server.
1. Check that the Shared Memory Protocol is enabled in SQL Server Configuration Manager.
2. Prefix the server name in the SSAS data source connection string with lpc: as shown below.

Provider=SQLNCLI10.1;Data Source=lpc:ThisServer\INST1;Integrated Security=SSPI;Initial Catalog=WordMatch
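
A quick way to confirm the prefix is taking effect is to query the connection DMVs on the relational instance while the cube is processing; the SSAS connection should show 'Shared memory' in net_transport. This is a sketch and it requires VIEW SERVER STATE permission.

-- Shows the transport each current connection is using
SELECT c.session_id,
       s.program_name,
       c.net_transport
FROM sys.dm_exec_connections AS c
INNER JOIN sys.dm_exec_sessions AS s
        ON s.session_id = c.session_id;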

The guide claims to have achieved 112,000 rows per second using TCP/IP versus 180,000 rows per second using shared memory, which is impressive. In my own test, a slightly modified Adventure Works cube took 56 seconds to process using TCP/IP and 47 seconds using shared memory; an improvement of 16%.

Performance testing analysis services cubes

Recently I was asked to do some performance testing and benchmarking for a SQL Server Analysis Services 2008 (SSAS 2008) cube. As happens on most projects, I did not have much time and needed to find a quick way to do this. In this post, I would like to jot down the approach I took in the hope that one day it might be useful to somebody. So here goes.

Some background..

The cube in question was of average size: about 20 GB with 10 dimensions and 5 measure groups. At any time a maximum of 15 people would use it, exclusively through Excel 2007. This gave me a good starting point: I wanted to test the performance of the cube for 15 people with queries typical of those generated by Excel. I also came to the conclusion that a response time of around 300 ms per query is acceptable (of course this is debatable, but I leave that up to you).

Collecting the data..

Given that all the queries would come from Excel, I had two options for collecting test queries: using the OLAP PivotTable Extension, which is also mentioned on my Tools and Tips page, or using SQL Server Profiler.

Collecting queries from the OLAP PivotTable Extension is very straightforward: you build your pivot table and copy the query from the extension’s dialogue box. However, there are two main drawbacks I found with this approach. One, it is a one-query-at-a-time approach. Two, the pivot table would be built by me and not by the actual users; I don’t necessarily know which dimensions and measure groups they are most interested in, so the queries may not represent the actual workload. I also did not have the option of asking each end user to create a pivot table and send me the queries. I thought that was too much to ask.

Using SQL Server Profiler solves both issues. Since end users are continuously using the cube, I get a variety of real-life queries and can save them in a batch. As it turns out, there is a way to export the exact MDX queries from SQL Server Profiler. So I ran Profiler on the server (yes, I know, you should not run Profiler on a live server, but in this case I took the risk!) at peak time, while the largest number of users were using the cube, and exported the MDX queries. When Profiler exports the queries, it exports them in a batch, i.e. all the queries end up in one file. Here, I resorted to manually separating each query into a separate file.
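
If you save the trace to a .trc file, you can usually also load it into a table with sys.fn_trace_gettable and pull out the query events from there, instead of using the Profiler export dialog. This is only a sketch: the file path is hypothetical, and the EventClass value should be verified against your own trace before relying on it.

-- Load the saved SSAS trace file and keep only the query events
SELECT TextData   AS mdx_query,
       Duration,
       NTUserName,
       StartTime
FROM sys.fn_trace_gettable(N'C:\Traces\ssas_peak.trc', DEFAULT)  -- hypothetical path
WHERE EventClass = 10;  -- Query End in SSAS traces (verify against your trace)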

The toolset..

For the testing/benchmarking exercise, I used AS Performance Workbench. It is the simplest and most intuitive tool I could find. I am pretty sure there are other utilities and methods available, but this one was just what I was looking for. It is simple to use, the report it generates is pretty and easy to understand, it can simulate load and, most importantly, it is free.

The method..

I decided to create 25 query files. They covered all the dimensions and measures, and I selected the more complex queries on frequently used measures. I also decided to do three runs, with 10, 12 and 15 users respectively, to vary the number of users nicely. I did the first run on a cold cache, i.e. I cleared the cache before the first test, while for the second and third runs I left the cache as it was. I also made sure that nobody else was using the server in any way while I was testing, to get a realistic picture of the effect of the queries on CPU and IO: if any other services were running on the server, they would be using CPU and disk, which would skew the results.

The results..

There were two aspects to this exercise – benchmarking and testing.

The benchmarking part was easy. After each run I generated the report and saved it. I also saved all the queries, along with details of the hardware on which the testing was carried out and a brief write-up. The idea is that after a while, when the data grows, we can run the same set of tests again to see how performance has changed.

As for testing, I was most interested in three counters:

a. % Processor Time

b. Avg. duration for each query

c. Current Disk Queue Length

A small table for each run like below sufficed.

Counter                    Expected Result   Actual Result   Pass/Fail
% Processor Time           Below 60%         45.96%          Pass
Current Disk Queue Length  Less than 1       0               Pass
Avg. Query 1 duration      Below 300 ms      175 ms          Pass
Avg. Query 2 duration      Below 300 ms      390 ms          Fail

I hope this post will be useful to somebody, maybe in the future!