Dark Mode
Image

SQL Tutorial

SQL Database

SQL Injection

PL/SQL Tutorial

Sql Interview Question

SQl Quiz

How to Delete Duplicate Rows in SQL?

In this section, we learn different ways to delete duplicate rows in MySQL and Oracle. If the SQL table contains duplicate rows, then we have to remove the duplicate rows.

Preparing sample data

The script creates the table named contacts.

DROP TABLE IF EXISTS contacts;  
CREATE TABLE contacts (  
id INT PRIMARY KEY AUTO_INCREMENT,  
first_name VARCHAR(30) NOT NULL,  
last_name VARCHAR(25) NOT NULL,   
    email VARCHAR(210) NOT NULL,  
    age VARCHAR(22) NOT NULL  
);  

In the above table, we have inserted the following data.

INSERT INTO contacts (first_name,last_name,email,age)   
VALUES ('Kavin','Peterson','kavin.peterson@verizon.net','21'),  
       ('Nick','Jonas','nick.jonas@me.com','18'),  
       ('Peter','Heaven','peter.heaven@google.com','23'),  
       ('Michal','Jackson','michal.jackson@aol.com','22'),  
       ('Sean','Bean','sean.bean@yahoo.com','23'),  
       ('Tom ','Baker','tom.baker@aol.com','20'),  
       ('Ben','Barnes','ben.barnes@comcast.net','17'),  
       ('Mischa ','Barton','mischa.barton@att.net','18'),  
       ('Sean','Bean','sean.bean@yahoo.com','16'),  
       ('Eliza','Bennett','eliza.bennett@yahoo.com','25'),  
       ('Michal','Krane','michal.Krane@me.com','25'),  
       ('Peter','Heaven','peter.heaven@google.com','20'),  
       ('Brian','Blessed','brian.blessed@yahoo.com','20');  
       ('Kavin','Peterson','kavin.peterson@verizon.net','30'),   

We execute the script to recreate test data after executing a DELETE statement.

The query returns data from the contacts table:

SELECT * FROM contacts  
ORDER BY email;  

 

id first_name last_name Email age
7 Ben Barnes ben.barnes@comcast.net 21
13 Brian Blessed brian.blessed@yahoo.com 18
10 Eliza Bennett eliza.bennett@yahoo.cm 23
1 Kavin Peterson kavin.peterson@verizon.net 22
14 Kavin Peterson kavin.peterson@verizon.net 23
8 Mischa Barton mischa.barton@att.net 20
11 Michal Krane michal.Krane@me.com 17
4 Michal Jackson Michal.jackson@aol.com 18
2 Nick Jonas nick.jonas@me.com 16
3 Peter Heaven Peter.heaven@google.com 25
12 Peter Heaven Peter.heaven@google.com 25
5 Sean Bean Sean.bean@yahoo.com 20
9 Sean Bean Sean.bean@yahoo.com 20
6 Tom Baker tom.baker@aol.com 30

The following SQL query returns the duplicate emails from the contact table:

SELECT  
    email, COUNT(email)  
FROM  
    contacts  
GROUP BY  
    email  
HAVING  
COUNT (email) > 1;  
email COUNT(email)
kavin.peterson@verizon.net 2
Peter.heaven@google.com 2
Sean.bean@yahoo.com 2

We have three rows with duplicate emails.

(A) Delete duplicate rows with the DELETE JOIN statement

DELETE t1 FROM contacts t1  
INNERJOIN contacts t2   
WHERE  
    t1.id < t2.id AND  
    t1.email = t2.email;

 Output:

Query OK, three rows affected (0.10 sec)

Three rows had been deleted. We execute the query, given below to finds the duplicate emails from the table.

SELECT  
    email,   
COUNT (email)  
FROM  
    contacts  
GROUP BY  
    email  
HAVING  
COUNT (email) > 1;   

The query returns the empty set. To verify the data from the contacts table, execute the following SQL query:

SELECT * FROM contacts;  

 

id first_name last_name Email age
7 Ben Barnes ben.barnes@comcast.net 21
13 Brian Blessed brian.blessed@yahoo.com 18
10 Eliza Bennett eliza.bennett@yahoo.cm 23
1 Kavin Peterson kavin.peterson@verizon.net 22
8 Mischa Barton mischa.barton@att.net 20
11 Micha Krane michal.Krane@me.com 17
4 Michal Jackson Michal.jackson@aol.com 18
2 Nick Jonas nick.jonas@me.com 16
3 Peter Heaven Peter.heaven@google.com 25
5 Sean Bean Sean.bean@yahoo.com 20
6 Tom Baker tom.baker@aol.com 30

The rows id's 9, 12, and 14 have been deleted. We use the below statement to delete the duplicate rows:

Execute the script for creating the contact.

DELETE c1 FROM contacts c1  
INNERJ OIN contacts c2   
WHERE  
    c1.id > c2.id AND  
    c1.email = c2.email;  

 

id first_name last_name email age
1 Ben Barnes ben.barnes@comcast.net 21
2 Kavin Peterson kavin.peterson@verizon.net 22
3 Brian Blessed brian.blessed@yahoo.com 18
4 Nick Jonas nick.jonas@me.com 16
5 Michal Krane michal.Krane@me.com 17
6 Eliza Bennett eliza.bennett@yahoo.cm 23
7 Michal Jackson Michal.jackson@aol.com 18
8 Sean Bean Sean.bean@yahoo.com 20
9 Mischa Barton mischa.barton@att.net 20
10 Peter Heaven Peter.heaven@google.com 25
11 Tom Baker tom.baker@aol.com 30

(B) Delete duplicate rows using an intermediate table

To delete a duplicate row by using the intermediate table, follow the steps given below:

Step 1. Create a new table structure, same as the real table:

CREATE TABLE source_copy LIKE source;  

Step 2. Insert the distinct rows from the original schedule of the database:

INSERT INTO source_copy  
SELECT * FROM source  
GROUP BY col;    

Step 3. Drop the original table and rename the immediate table to the original one.

DROP TABLE source;  
ALTER TABLE source_copy RENAME TO source;   

For example, the following statements delete the rows with duplicate emails from the contacts table:

-- step 1  
CREATE TABLE contacts_temp  
LIKE contacts;  
  
-- step 2  
INSERT INTO contacts_temp  
SELECT * FROM contacts   
GROUP BY email;  
  
-- step 3  
DROP TABLE contacts;  
  
ALTER TABLE contacts_temp  
RENAME TO contacts;   

(C) Delete duplicate rows using the ROW_NUMBER() Function

Note: The ROW_NUMBER() function has been supported since MySQL version 8.02, so we should check our MySQL version before using the function.

The following statement uses the ROW_NUMBER () to assign a sequential integer to every row. If the email is duplicate, the row will higher than one.

SELECT id, email, ROW_NUMBER()   
OVER (PARTITION BY email   
ORDER BY email  
    ) AS row_num  
FROM contacts;  

 The following SQL query returns id list of the duplicate rows:

SELECT id  
FROM (SELECT id,  
ROW_NUMBER() OVER (  
PARTITION BY email ORDER BY email) AS row_num  
FROM  
contacts  
) t  
WHERE  
row_num> 1;

Output:

id
9
12
14

Delete Duplicate Records in Oracle

When we found the duplicate records in the table, we had to delete the unwanted copies to keep our data clean and unique. If a table has duplicate rows, we can delete it by using the DELETE statement.

In the case, we have a column, which is not the part of group used to evaluate the duplicate records in the table.

Consider the table given below:

VEGETABLE_ID VEGETABLE_NAME COLOR
01 Potato Brown
02 Potato Brown
03 Onion Red
04 Onion Red
05 Onion Red
06 Pumpkin Green
07 Pumpkin Yellow
 -- create the vegetable table  
CREATE TABLE vegetables (  
VEGETABLE_ID NUMBER generated BY DEFAULT AS ID ENTITY,  
VEGETABLE_NAME VARCHAR2(100),  
color VARCHAR2(20),  
        PRIMARY KEY (VEGETABLE_ID)  
);  

 

-- insert sample rows  
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Potato','Brown');  
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Potato','Brown');  
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Onion','Red');  
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Onion','Red');  
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Onion','Red');  
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Pumpkin','Green');  
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Pumpkin','Yellow');   

 

-- query data from the vegetable table  
SELECT * FROM vegetables;   

Suppose, we want to keep the row with the highest VEGETABLE_ID and delete all other copies.

SELECT  
MAX (VEGETABLE_ID)  
FROM  
vegetables  
GROUP BY  
VEGETABLE_NAME,  
color  
ORDER BY  
MAX(VEGETABLE_ID);   
MAX(VEGETABLE_ID)
2
5
6
7

We use the DELETE statement to delete the rows whose values in the VEGETABLE_ID COLUMN are not the highest.

DELETE FROM  
vegetables  
WHERE  
VEGETABLE_IDNOTIN  
  (  
SELECT  
MAX(VEGETABLE_ID)  
FROM  
vegetables  
GROUP BY  
VEGETABLE_NAME,  
color  
);  

 Three rows have been deleted.

SELECT *FROM vegetables;  
VEGETABLE_ID VEGETABLE_NAME COLOR
02 Potato Brown
05 Onion Red
06 Pumpkin Green
07   Yellow

If we want to keep the row with the lowest id, use the MIN() function instead of the MAX() function.

DELETE FROM  
vegetables  
WHERE  
VEGETABLE_IDNOTIN  
  (  
SELECT  
MIN(VEGETABLE_ID)  
FROM  
vegetables  
GROUP BY  
VEGETABLE_NAME,  
color  
  );  

 

The above method works if we have a column that is not part of the group for evaluating duplicate. If all values in the columns have copies, then we cannot use the VEGETABLE_ID column.

Let's drop and create the vegetable table with a new structure.

DROP TABLE vegetables;  
CREATE TABLE vegetables (  
VEGETABLE_ID NUMBER,  
VEGETABLE_NAME VARCHAR2(100),  
Color VARCHAR2(20)  
);  
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(1,'Potato','Brown');  
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(1, 'Potato','Brown');  
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color)VALUES(2,'Onion','Red');  
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color)VALUES(2,'Onion','Red');  
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(2,'Onion','Red');  
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(3,'Pumpkin','Green');  
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES('4,Pumpkin','Yellow');  
  
SELECT * FROM vegetables;   
VEGETABLE_ID VEGETABLE_NAME COLOR
01 Potato Brown
01 Potato Brown
02 Onion Red
02 Onion Red
02 Onion Red
03 Pumpkin Green
04 Pumpkin Yellow

In the vegetable table, the values in all columns VEGETABLE_ID, VEGETABLE_NAME, and color have been copied.

We can use the rowid, a locator that specifies where Oracle stores the row. Because the rowid is unique so that we can use it to remove the duplicates rows.

DELETE  
FROM  
Vegetables  
WHERE  
rowed NOT IN  
(  
SELECT  
MIN(rowid)  
FROM  
vegetables  
GROUP BY  
VEGETABLE_ID,  
VEGETABLE_NAME,  
color  
  );  

 

The query verifies the deletion operation:

SELECT * FROM vegetables;  

 

VEGETABLE_ID VEGETABLE_NAME COLOR
01 Potato Brown
02 Onion Red
03 Pumpkin Green
04 Pumpkin Yellow

Comment / Reply From