Click here to Skip to main content
15,867,984 members
Articles / Programming Languages / SQL

Median Calculation using TSQL (No Cursors)

Rate me:
Please Sign up or sign in to vote.
4.60/5 (3 votes)
22 Oct 2010CPOL2 min read 31.9K   6   3
Median Calculation using TSQL (no cursors)

Before we start, let's define what a Median is first, in layman's terms, the Median is the “middle number” in a sorted list of numbers.

So in a sample like:

3, 4, 5, 8, 10, 11, 100

The middle number which is the median as well is number 8 as you have 3 numbers before and after number 8, now what if there are two numbers in the middle and that will happen if there is an even amount of numbers, in that case we need to find the middle two numbers, then get their average or add them (the 2 middle numbers) together and divide them by 2 so on the example earlier let's try to add another number:

3, 4, 5, 8, 10, 11, 100, 101

So here the middle number is 8 and 10 and their average is 9, so that's the median number.

Now you know what median is, now we try to apply that in SQL, there are a lot of solutions but mostly, it uses cursors or complex TSQL queries as it's not directly supported by TSQL, and here are the only grouping functions available. Isn’t it good to add the median function in the next versions of SQL:

Thanks to CTE or common table expressions, things got easier.

So let's get started. Let's say you have this table and you want to calculate the median price for each product:

Now to view it easily, you can sort it by Price and get the middle numbers:

Now to achieve that on TSQL, here is the code:

SQL
With
MedianResults
as
(
Select
Name,Price ,   
Row_Number() OVER(Partition by Name Order by Price) as A,
Row_Number() OVER(Partition by Name Order by Price desc) as B
from ProductStats
)
Select Name, Avg(Price) as Median
From MedianResults
Where Abs(A-B)<=1
Group by Name

Doing that, the results will yield:

UPDATE!!!

One of the article readers DEK46656 found a bug on the above post and it happens if you have values that are “tied”, the ROW_NUMBER result is not “aligned” as you might want, and it produces an incorrect value. As he suggested I used a set with tied numbers such as

3, 4, 5, 8, 101, 101, 101, 101

and in that set it should output 54.5 but it did not and instead it's outputting 8 instead as the ROW_NUMBERS becomes unaligned when you use similar value digits. So his solution was this:

SQL
WITH    
ResultA 
AS 
(SELECT    Name, Price, ROW_NUMBER() _
 OVER ( PARTITION BY Name ORDER BY Price ) AS A FROM  ProductStats), 
ResultB
AS 
(SELECT    Name, Price, A, ROW_NUMBER() _
OVER ( PARTITION BY Name ORDER BY Price DESC, A DESC ) AS B FROM ResultA)
SELECT  
Name, 
Avg(Price) as Median
FROM ResultB
WHERE  Abs(A - B) <= 1
GROUP BY Name

and I totally agree! Thanks DEK46656.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Technical Lead
New Zealand New Zealand
http://nz.linkedin.com/in/macaalay
http://macaalay.com/

Comments and Discussions

 
GeneralGood one. Pin
The Manoj Kumar21-Oct-10 22:05
The Manoj Kumar21-Oct-10 22:05 
GeneralFound a bug dealing with tied values Pin
DEK4665620-Oct-10 11:36
DEK4665620-Oct-10 11:36 
Great approach, but I believe I found a bug. Playing with this and comparing results with Excel, I discovered that if you have values that are “tied”, the ROW_NUMBER result is not “aligned” as you might want, and it produces an incorrect value. Try the numbers 3, 4, 5, 8, 101, 101, 101, 101. You should get 54.5, but when run using your code, it produces 8. I’m testing this out on SQL Server 2008, but I don’t believe that matters in this case.

The issue is dealing with ties in the ROW_NUMBER, and ordering ties between A and B. My fix was the following change to the CTE:

WITH    
RsltA AS ( 
SELECT	Name, Price, 
		ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Price ) AS A
FROM     @MyTableVar
		), 
RsltB AS ( 
SELECT	Name, Price, A, 
		ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Price DESC, A DESC ) AS B
FROM     RsltA
		)

SELECT  Name, 
		Avg(Price) as Median
FROM    RsltB
WHERE   Abs(A - B) <= 1
GROUP BY Name


Let me know if you agree.

DEK46656
GeneralRe: Found a bug dealing with tied values Pin
Raymund Macaalay20-Oct-10 13:07
Raymund Macaalay20-Oct-10 13:07 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.