Sieve of Eratosthenes in R and SQL

I just completed Computing for Data Analysis on Coursera. The course is brief introduction to R programming language. R has been around for years but is gaining much attention recently due to Big Data and Data Science trends. I had an idea about R and the course offered a wonderful opportunity to learn in a systematic manner. So for some more practice and a bit of fun (ok, I admit, more for fun than practice), I decided to implement ‘Sieve of Eratosthenes’ in R and SQL and see which one is faster (because that’s what you do on a lazy Saturday!!) This is a method to find primes up to a given number. The R code looks like this.


getPrimeNumTilln <- function(n) {

# a holds a sequence of numbers from 2 to n
a <- c(2:n)
# we start from 2 since it is the beginning of prime numbers,
# it is also the loop varibale
l <- 2
# r this vector holds the results
r <- c()
#while the square of loop variable is less than n
while (l*l < n) {
# the prime number is the first element in vector a
# for e.g. in first iteration it will be 2
r <- c(r,a[1])

# remove elements from a which are multiples of l
# for e.g. in first iteration it will remove 2,4,6,8…
a <- a[-(which(a %% l ==0))]

# the loop varibale if the first variable in remaining a
# for e.g after first iteration, it will be 3, then 5 (since 4 has been removed)…
l <- a[1]
}
# the result is r and all the remaining elements in a
c(r,a)
}

And the SQL code looks like this.

DECLARE @maxnum INT = 1000 /* The number under which you want to find primes*/

DECLARE @l INT = 2 /* Beginning of prime numbers */

DECLARE @table TABLE (a INT NOT NULL PRIMARY KEY) /* Holding table*/

;WITH ct /* Generate the sequence of numbers*/
AS (
SELECT 2 AS id

UNION ALL

SELECT id + 1
FROM ct
WHERE id < @maxnum
)

INSERT INTO @table
SELECT id
FROM ct
OPTION (MAXRECURSION 0)

WHILE (@l * @l < @maxnum)
BEGIN
/*remove records which are divisible by l*/
DELETE
FROM @table
WHERE a != @l
AND (a % @l) = 0

/* the first remaining number is the prime number */
SELECT @l = MIN(a)
FROM @table
WHERE a > @l
END

SELECT COUNT(*)
FROM @table
Now, I am no expert in either maths or algorithms but that looks neat. To validate that the results are right, I ran it to check how many prime numbers are under 1000000 and both returned 78948. Wolfram Alpha seems to agree.

For smaller up to  10k, the results are comparable; they are in milliseconds. Above that,however,R seems to have an upper hand.

n R SQL
100000 0.02 1.72
1000000 0.56 19.00
10000000 11.25 246.21

Please note that the time is in seconds. The difference is quite stark especially for large n. R is killing SQL.

A few observations on SQL side are

  1. A significant time is being spent on generating sequence. With SQL Server 2012 Sequences, we might be able improve time.
  2. The delete operation is quite slow as we all know. I tried replacing it with update but that made it worse.

I know that there are many improvements that can be made to this but I am happy with my little testing. As usual comments are always welcome.

Advertisements

One comment

  1. Hi Swanand,
    As you say, the heaviest weight tied to your SQL performance is the sequence generating part. Try this instead and maybe then re-compare your SQL and R solutions.
    WITH
    L0 AS(SELECT 1 AS c UNION ALL SELECT 1),
    L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
    L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
    L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
    L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
    L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
    SELECT TOP (@maxnum) n FROM Nums ORDER BY n;

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s