A colleague at work set us an interesting puzzle. The puzzle is a generic programming puzzle but we were required to solve it with TSQL. The person, who can come up with fastest solution, wins. Here’s the puzzle in question.
Imagine a keypad like below.
We want to find all of the unique 10 key sequences that can be entered into the keypad.
- The initial key press can be any of the keys.
- Each subsequent key press must be a knight move from the previous key press.
- A key can be used any number of times.
- There can be at most two vowels in each sequence.
A valid knight move is either
- One step vertically and two steps horizontally
- One step horizontally and two steps vertically
Examples below of valid moves, (from black to green)
The SQL Code must
- Display the total number of possible sequences
- Store all of the sequences in a Temp table named #Results
- Not require any permission other than Public and connect on the database that is connected to the window we run the script in
The person who’s code runs fastest wins.
Full disclaimer: I did not win the challenge. A couple of co-workers developed really innovative solutions but I thought it was worth to put my approach in the blog so that I can comeback to it later.
I took- probably – a very simplistic approach and was fairly happy with the results. But that’s not the end of it. When I ran the same solution script on various SQL Server editions, the timings varied widely. They were getting better with newer versions of SQL Server.
The complete code from below is available on my github account.
Convert to table
Let’s get the obvious out of the way – this was meant to be solved using TSQL so procedural approach was out of question. TSQL is, as we all know, excells at relational operations and we will get fastest solution if we exploit that. Next up, we want to use the other good part about RDBMS’ in general – the ability to index and do fast lookups.
With that in mind,lets model the keypad as a table. Each alphabet is unique – so that’s our primary key – and each has a set of properties: row location, column location and if it’s vowel or not. This vowel column comes handy later as you will see. The keypad table looks like below.
Knight’s moves as SQL query
When a knight moves, it arrives at a new location on keypad. This new location has a row number and column number and they can be derived from row and column number of original location and the directions of the move. There are essentially eight cases (or moves) from each character.
One step vertically and two steps horizontally
(1) One step up and two steps on left
(2) One step up and two steps on right
(3) One step down and two steps on left
(4) One step down and two steps on right
One step horizontally and two steps vertically
(5) One step left and two steps up
(6) One step left and two steps down
(7) One step right and two steps up
(8) One step right and two steps up
For each case above, we can join the keypad table to itself on row and column number to find all valid combinations. Lets take case 1 above and apply it to position of H. H is at row 2 and column 3. Lets denote it as H(2,3). One step above H is C and then two steps to the left is A which is A(1,1). So if we reduce row number by 1 and column number by 2 from H, we would arrive at A using a first valid knight’s move. That, essentially is our self join condition. In the same fashion,for case 2 above, to go from H(2,3)to E(1,5) we would reduce a row number but increase the column number by 2 since we are moving to the right. Here’s the sample query for first case
SELECT d1.val,d2.val AS c1_v1up_h2left,CASE
WHEN d2.val IN ( 'A', 'E', 'I', 'O' )
FROM #keypad d1
INNER JOIN #keypad d2
ON d1.rownum - 1 = d2.rownum
AND d2.colnum = d1.colnum - 2
We can apply the same reasoning to all eight cases and that give us eight different queries. The union of all eight queries will be our set of all combinations of valid moves from any character from keypad. Also note that the inner join will automatically eliminate positions which does not exists. For e.g. for 1(4,2) there is no case 3 from above since there isn’t any row below one. Have we used left join, we would have received a record for this case but next val would have been empty. Below are the valid moves from 1.
Ten sequence chain
Next, we want to find a sequence of ten moves which contain less than two vowels. This is where the vowel column from the keypad table comes handy. But first lets look at how we find the sequence.
We have table at hand which contains every possible valid move from every character in the keypad. Let’s assume that we want to find only three character sequence. We start with our keypad table and then join to valid combination table on current value. So next value column from the valid combination table will be our second key from sequence. If we now join the valid combinations table to results from above on we get the third value in sequence. The join condition would be on next valid character from first result to original value column from second. We can repeat the process to get 10 key sequence.
Here’s the sample query which finds the first three characters
SELECT d.val s1,r1.nextval s2,r2.nextval s3
FROM #keypad d
INNER JOIN #validcombination r1
ON d.val = r1.val
INNER JOIN #validcombination r2
ON r1.nextval = r2.val
In the ten joins query like above, each instance of the table has vowel column available. If we sum the values of this column in the result set, that tells us how many vowels are there in the sequence. So to filter them out, we simply put in a where clause which checks that this sum is less than two. Using mathematical operations is better than string comparison so SUM is more efficient than checking if more than two characters are vowels. Here are few records from results; you can verify yourself.
This is where it gets interesting. I ran the same script on four different versions of SQL Server: SQL Server 2008 R2 SP3, SQL Server 2012 SP1, SQL Server 2014 and SQL Server 2016 CTP 3.2. Admittedly, they ran on different hardware and SQL Server 2014 and SQL Server 2016 CTP was running in a virtual machine but hardware was similar; each had a four processor cores, 8 GB memory and SSD. In case of VMs the configuration exactly same and VM itself was hosted on SSD. I did expect the timings to change but the change was pretty stark. The results are given below. The first run was on a cold cache and subsequent ones are on warm cache. The average column at the end is the average of all.
|SQL Server 2008R2
|SQL Server 2012
|SQL Server 2014
|SQL Server 2016
My theory on improvements in execution timings is that a. The way temporary tables are used in TSQL has been improved in each version of SQL Server and b. The SQL Server query optimizer has been improving with each version. The second point is more relevant since we know that optimizer is improved which each version of SQL Server. I did not check the query plans for each version but may be the answer lies there.
Further improvements to the solution
The code can certainly be improved, although, I believe that unless the approach is radically altered, the gains in improvements will be marginal. It will follow the law of diminishing marginal performance improvements (I completely made that up) whereby the amount of effort to improve performance will grow exponentially as compared to marginal performance benefit we get or, in other words, we will have to put a significant amount of efforts to improve the performance by a littl e. Having said that
- Putting normal clustered and non clustered index can have an impact
- How would actual table, instead of temporary tables, perform?
- In SQL Server 2012 and onward, we can use columnstore index, would they have any impact? Which one would be better, non clustered columnstore or clustered?
- What if we create the temporary table as in-memory tables in SQL Server 2014 and 2016, would they improve performance?
- Any combination of the above?