Click here to Skip to main content
14,931,113 members
Articles / Programming Languages / SQL
Tip/Trick
Posted 14 Aug 2017

Tagged as

Stats

12.4K views
69 downloads
10 bookmarked

Remove Duplicate Rows with CTE

Rate me:
Please Sign up or sign in to vote.
5.00/5 (9 votes)
14 Aug 2017CPOL2 min read
This article gives you quick tip about how you can remove duplicate rows easily using common table expression in SQL

Introduction

Many times, our database contains duplicate rows which can be filtered out using certain columns. This can happen for many reasons like bad data import, wrong SQL execution, bug in the user interface, etc. A developer had to remove these duplicate rows with some heroic query. Here, I am showing a simple way to remove duplicate rows.

Background

Suppose we have a table called EmployeeDesignation which holds employee ID and designation ID. The table may contain records like this:

EmployeeIDDesignationID
111
212
313
111
111
313

Notice this table has duplicate records like (1,11) 3 times and (3,13) 2 times. Now we want to keep only one distinct pair and remove the rest. Let's see how we can do that in the next section.

Using the Code

Let's quickly learn about Common Table Expression (CTE). CTE can be thought of a temporary result set. CTE gives you more readability and simplification of complex queries. A CTE looks like below:

SQL
WITH 
expression_name_1 (columns) AS
(CTE query definition 1)

[, expression_name_X AS
   (CTE query definition X)
 , etc ]

SELECT expression_A, expression_B, ... --User defined statement using CTE expressions
FROM expression_name_1 

CTE starts with the word "With" followed by an expression name and column names. Then the selection query that populates the expression. We can have multiple set of expression separated by comma followed by the final statement. We are not going into details of CTE as this is out of scope of this article. Let's look at how we can use CTE to remove duplicate rows. Take a look at the code below:

SQL
WITH EmpDesg (EMPID, DESGID, OCCURRENCE ) AS
(select EmployeeId,DesignationID, ROW_NUMBER() _
OVER(PARTITION BY EmployeeId, DesignationID ORDER BY EmployeeId) AS rowNumber from EmployeeDesignation)
select * from EmpDesg

In the above SQL, "EmpDesg" is the CTE expression which acts as a temporary view. It has three columns (EMPID, DESGID and OCCURRENCE). In the query definition, we are using Row_Number function and partitioning the table rows with EmployeeID and DesignationID to assign an occurrence number to each pair. The query will return a result like below:

EMPIDDESGIDOCCURRENCE
1111
1112
1113
2121
3131
3132

Now instead of the final select query, we can delete the rows from our temporary result set which has OCCURRENCE > 1.

SQL
WITH EmpDesg (EMPID, DESGID, OCCURRENCE ) AS
(select EmployeeId,DesignationID, ROW_NUMBER() 
 OVER(PARTITION BY EmployeeId, DesignationID ORDER BY EmployeeId) AS rowNumber from EmployeeDesignation)
delete from EmpDesg where OCCURRENCE >1

This will just keep only one occurrence of each Employee ID and Designation ID pair and remove the duplicates from EmployeeDesignation table.

We can think of CTE as a subquery. If we run delete statement against CTE, it will delete rows from the table until SQL can infer which table to update/delete based on CTE. Otherwise, it will give a error.

That's all!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Ehtesam Ahmed
Software Developer CIMSOLUTIONS
Netherlands Netherlands
I am working in the software development industry for more than 6 years. I have had the opportunity to work with clients mostly in UK, Ireland and Netherland. Have worked for several sports clients on their membership solution and also other clients with administrative solution. I have strong knowledge on Microsoft .Net technologies. Love to work with Javascript and have strong knowledge on it. Been working with several javascript libraries like jQuery, knockout.js, Angular, React, backbone.js etc. I was heavily involved with the development of GoMembership, one of the most promising membership solution in UK and expanding in other countries.
Always willing to learn and looking for new challenges. Most importantly I am a simple guy who believes in hard work.

Comments and Discussions

 
GeneralSuperb Tip Pin
Zihadul Islam Rajon26-Nov-17 23:45
MemberZihadul Islam Rajon26-Nov-17 23:45 
GeneralMy vote of 5 Pin
MarcusCole68331-Sep-17 4:12
professionalMarcusCole68331-Sep-17 4:12 
QuestionWhere is the OCCURRENCE column name in select ? Pin
Mou_kol15-Aug-17 22:33
MemberMou_kol15-Aug-17 22:33 
AnswerRe: Where is the OCCURRENCE column name in select ? Pin
Ehtesam Ahmed15-Aug-17 22:58
professionalEhtesam Ahmed15-Aug-17 22:58 
GeneralRe: Where is the OCCURRENCE column name in select ? Pin
Mou_kol16-Aug-17 22:31
MemberMou_kol16-Aug-17 22:31 
GeneralRe: Where is the OCCURRENCE column name in select ? Pin
Ehtesam Ahmed17-Aug-17 0:04
professionalEhtesam Ahmed17-Aug-17 0:04 
GeneralMy vote of 5 Pin
mr. Duan15-Aug-17 15:48
professionalmr. Duan15-Aug-17 15:48 
GeneralMy vote of 5 Pin
Hipolito Lopez15-Aug-17 8:03
MemberHipolito Lopez15-Aug-17 8:03 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.