Click here to Skip to main content
15,881,810 members
Articles / Web Development / ASP.NET
Tip/Trick

Improving SQL query performance using JOIN

Rate me:
Please Sign up or sign in to vote.
4.57/5 (5 votes)
29 Nov 2015CPOL2 min read 12.1K   4   6
Faced slow performance in certain stored procedure, which caused timeout error. This was improved using WITH clause and JOINs

Introduction

We faced a performance issue with one of our web service methods and the root cause of the issue was traced to a poor performing stored procedure. I have described where the performance bottleneck was and how we removed it.

Description

We had a call to a webservice method from biztalk and it was taking a long time, around 2 hrs for processing around 6k records and caused a timeout issue. To investigate it, we made a replica of the production environment in a test machine. While debugging, we found that it took an unusually long time to execute a stored procedure. 

So we went and examined it and found it to take a very long time. It has used cursors and was taking around 2 hours to process a file of 11MB size. The slow performing queries and logic are given below.

Original Procedure

SQL
DECLARE @CommData TABLE (ID INT, CommodityIndicatorID INT, _
	CommodityID INT, ParentCommodityID INT, Date DATETIME) 
INSERT @CommData SELECT ID, CommodityIndicatorID, CommodityID, _
	ParentCommodityID, Date FROM CommodityRawData _
	WHERE CommodityIndicatorID = @CRD_IndicatorID AND IsSelected = 0 AND IsTreated = 0

DECLARE @ParentCommID INT
DECLARE @CommID INT
DECLARE @RawDate DATETIME
DECLARE @RawDataID INT
DECLARE @TempData TABLE (ID INT)
DECLARE RawData_Cursor CURSOR FOR SELECT ID FROM @CommData 
OPEN RawData_Cursor
-- Perform the first fetch.
FETCH NEXT FROM RawData_Cursor into @RawDataID
-- Check @@FETCH_STATUS to see if there are any more rows to fetch.
WHILE @@FETCH_STATUS = 0
BEGIN

SET @ParentCommID = (Select ParentCommodityID FROM @CommData where ID = @RawDataID)
SET @CommID = (Select CommodityID FROM @CommData where ID = @RawDataID)
SET @RawDate = (Select Date FROM @CommData where ID = @RawDataID)

IF(@ParentCommID = 0 OR @ParentCommID IS NULL)
	INSERT @TempData SELECT ID FROM CommodityRawData _
	WHERE CommodityIndicatorID = @CRD_IndicatorID _
	AND (CommodityID = @CommID OR ParentCommodityID = @CommID) 
		AND Convert(varchar(20), @RawDate, 101) = Convert(varchar(20), _
		CommodityRawData.Date, 101)
ELSE
	INSERT @TempData SELECT ID FROM CommodityRawData _
	WHERE CommodityIndicatorID = @CRD_IndicatorID _
	AND (CommodityID = @ParentCommID OR ParentCommodityID = @ParentCommID) 
		AND Convert(varchar(20), @RawDate, 101) = Convert(varchar(20), _
		CommodityRawData.Date, 101)

-- This is executed as long as the previous fetch succeeds.
FETCH NEXT FROM RawData_Cursor into @RawDataID

END
CLOSE RawData_Cursor
DEALLOCATE RawData_Cursor

UPDATE CommodityRawData SET IsSelected = 1, _
	IsTreated = 0 WHERE ID IN (SELECT ID FROM @TempData)

SELECT * FROM CommodityRawData WHERE ID IN (SELECT DISTINCT ID FROM @TempData)

As seen, there is a row by row processing using a certain business logic and applying certain flags for selected records. Since it uses cursors, it was very much slow. We wondered if there was any other way in which the same logic can be applied. Then, came the WITH clause and JOIN. Rather than identifying each and every row, we used joins to do batch querying and storing the list of ids in a temporary table. Once we have all the ids in a temp table, we joined it with the main table to update the flags in the main table. This turned out to be much faster than the cursor based approach and we didn't get any timeout expiry after that. Hopefully, this knowledge will help someone in a similar scenario.

Changed Procedure

SQL
Changed Queries:

DECLARE @TempData TABLE (ID INT);

WITH CommData(ID, CommodityIndicatorID, CommodityID, ParentCommodityID, Date) AS 
(
SELECT ID, CommodityIndicatorID, CommodityID, ParentCommodityID, Date FROM CommodityRawData 
WHERE CommodityIndicatorID = @CRD_IndicatorID AND IsSelected = 0 AND IsTreated = 0
)
INSERT @TempData  
SELECT c.ID FROM CommodityRawData c, CommData
WHERE c.CommodityIndicatorID=@CRD_IndicatorID and 
(c.CommodityID = CommData.ParentCommodityID OR c.ParentCommodityID = CommData.ParentCommodityID)
and Convert(varchar(20), CommData.Date, 101) = Convert(varchar(20), c.Date, 101)
and (CommData.ParentCommodityID is not null and CommData.ParentCommodityID <> 0)
UNION
SELECT c.ID FROM CommodityRawData c, CommData
WHERE c.CommodityIndicatorID=@CRD_IndicatorID and 
(c.CommodityID = CommData.CommodityID OR c.ParentCommodityID = CommData.CommodityID)
and Convert(varchar(20), CommData.Date, 101) = Convert(varchar(20), c.Date, 101)
and (CommData.ParentCommodityID is  null or CommData.ParentCommodityID = 0)

UPDATE CommodityRawData SET IsSelected = 1, IsTreated = 0 
FROM CommodityRawData c inner join @TempData t
ON c.ID = t.ID

SELECT c.* FROM CommodityRawData c inner join @TempData t ON c.ID = t.ID

Test Result

The original query with CURSOR took 2 hrs to process a set of rows in a 11 MB file. The changed query using WITH and JOIN processed the same data within 10 minutes. Hence, using JOIN gives much better performance than using CURSOR.  

Learning

Important learning is to use joins wherever possible instead of cursors. Cursors are recommended only when there is complete business logic which absolutely cannot be accomplished by joins. Also, we should ensure that the join based approach returns the exact same result set returned by cursors based approach.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Architect Aspire Systems India(Pvt) Ltd
India India
Technical expertise in working with microsoft technologies such as VC++, C#, ASP.NET, SQL Server, SharePoint, BizTalk. Currently working in middleware tools such as: Informatica Cloud, Mule Soft, Dell Boomi

Comments and Discussions

 
QuestionSome better performance with JOINs Pin
Christophe Van Olmen7-Dec-15 22:37
professionalChristophe Van Olmen7-Dec-15 22:37 
AnswerRe: Some better performance with JOINs Pin
Krishna P Seetharaman8-Dec-15 5:00
Krishna P Seetharaman8-Dec-15 5:00 
Praiseyou can use trigger options Pin
LOG.ETI1-Dec-15 4:58
LOG.ETI1-Dec-15 4:58 
GeneralRe: you can use trigger options Pin
Krishna P Seetharaman4-Dec-15 13:33
Krishna P Seetharaman4-Dec-15 13:33 
QuestionCursors vs Joins Pin
Sinisa Hajnal29-Nov-15 23:53
professionalSinisa Hajnal29-Nov-15 23:53 
I would argue that it is obvious you shouldn't use cursors if you can avoid them since databases are made for big data sets manipulations and not procedural logic. But, unfortunately, many developers by default use cursors due to their similarity with programming languages.

I had the same problem transferring raw table from file import to correctly formatted production table (I inherited the logic using cursors). It took 20 minutes for a file (several thousand rows)...after my rewrite it took 2.5 seconds.

AnswerRe: Cursors vs Joins Pin
Krishna P Seetharaman5-Dec-15 19:12
Krishna P Seetharaman5-Dec-15 19:12 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.