SQL Tutorial
SQL Database
SQL Table
SQL Select
SQL Order By
SQL Insert
SQL Update
SQL Delete
Difference
SQL Injection
SQL String Functions
Miscl
- SQL Formatter
- SQL group by
- SQL add/drop/update column operation
- SQL CAST Function
- SQL Comments
- SQL CONCAT Function
- CTE (Common Table Expression)SQL
- How to use distinct in SQL?
- Joining Three or More Tables in SQL
- What is Web SQL?
- How to create functions in SQL?
- How to run SQL Script?
- How to Delete Duplicate Rows in SQL?
- Nth Highest salary
- 12 Codd's Rules
- SQL EXCEPT
- Types of SQL JOIN
- Change datatype of column in SQL
- SQL Auto Increment
- SQL Like
- Commit and Rollback in SQL
- SQL Concatenate
- SQL get month from the date
- Savepoint in SQL
- SQL ORDER BY DATE
- TIME Datatype in SQL
- SQL BETWEEN
- CRUD Operations in SQL
- SQL INDEX
- Scalar Functions in SQL
- SET Operators in SQL
- Types of SQL Commands
- TCL Commands in SQL
- SQL Subquery
- SQL View
- Constraints in SQL
- Pattern Matching in SQL
- SQL Date Functions
- DDL Commands in SQL
- DML Commands in SQL
- SQL CASE
- SQL Inner Join
- SQL IN Operator
- Check Constraint in SQL
- SQL CLAUSES
- SQL LOGICAL OPERATORS
- Delete Column from Table
- Add Column in the Table
- Delete one row in SQL
- Change the Column Value
- How to Add Foreign Key in SQL
- Add a Primary Key
- Insert One or More rows
- How to Use LIKE in SQL
- Cursor in SQL
- Difference Between DROP and Truncate
- SQL Comparison Operators
- SQL COUNT WHERE
- SQL SELECT MIN
- SQL Stored Procedure
- SQL SELECT AVG
- SQL SELECT MAX
- SQL ADD COLUMN
- How to use Auto-Increment in SQL
- SQL Languages
- SQL Arithmetic Operators
- How to Use GROUP BY in SQL
- How to Use ORDER BY in SQL
- Trigger in SQL
- What is Race Condition
- SQL COUNT DISTINCT
PL/SQL Tutorial
Sql Interview Question
SQl Quiz
How to Delete Duplicate Rows in SQL?
In this section, we learn different ways to delete duplicate rows in MySQL and Oracle. If the SQL table contains duplicate rows, then we have to remove the duplicate rows.
Preparing sample data
The script creates the table named contacts.
CREATE TABLE contacts (
id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(30) NOT NULL,
last_name VARCHAR(25) NOT NULL,
email VARCHAR(210) NOT NULL,
age VARCHAR(22) NOT NULL
);
In the above table, we have inserted the following data.
VALUES ('Kavin','Peterson','kavin.peterson@verizon.net','21'),
('Nick','Jonas','nick.jonas@me.com','18'),
('Peter','Heaven','peter.heaven@google.com','23'),
('Michal','Jackson','michal.jackson@aol.com','22'),
('Sean','Bean','sean.bean@yahoo.com','23'),
('Tom ','Baker','tom.baker@aol.com','20'),
('Ben','Barnes','ben.barnes@comcast.net','17'),
('Mischa ','Barton','mischa.barton@att.net','18'),
('Sean','Bean','sean.bean@yahoo.com','16'),
('Eliza','Bennett','eliza.bennett@yahoo.com','25'),
('Michal','Krane','michal.Krane@me.com','25'),
('Peter','Heaven','peter.heaven@google.com','20'),
('Brian','Blessed','brian.blessed@yahoo.com','20');
('Kavin','Peterson','kavin.peterson@verizon.net','30'),
We execute the script to recreate test data after executing a DELETE statement.
The query returns data from the contacts table:
ORDER BY email;
id | first_name | last_name | age | |
7 | Ben | Barnes | ben.barnes@comcast.net | 21 |
13 | Brian | Blessed | brian.blessed@yahoo.com | 18 |
10 | Eliza | Bennett | eliza.bennett@yahoo.cm | 23 |
1 | Kavin | Peterson | kavin.peterson@verizon.net | 22 |
14 | Kavin | Peterson | kavin.peterson@verizon.net | 23 |
8 | Mischa | Barton | mischa.barton@att.net | 20 |
11 | Michal | Krane | michal.Krane@me.com | 17 |
4 | Michal | Jackson | Michal.jackson@aol.com | 18 |
2 | Nick | Jonas | nick.jonas@me.com | 16 |
3 | Peter | Heaven | Peter.heaven@google.com | 25 |
12 | Peter | Heaven | Peter.heaven@google.com | 25 |
5 | Sean | Bean | Sean.bean@yahoo.com | 20 |
9 | Sean | Bean | Sean.bean@yahoo.com | 20 |
6 | Tom | Baker | tom.baker@aol.com | 30 |
The following SQL query returns the duplicate emails from the contact table:
email, COUNT(email)
FROM
contacts
GROUP BY
HAVING
COUNT (email) > 1;
COUNT(email) | |
kavin.peterson@verizon.net | 2 |
Peter.heaven@google.com | 2 |
Sean.bean@yahoo.com | 2 |
We have three rows with duplicate emails.
(A) Delete duplicate rows with the DELETE JOIN statement
INNERJOIN contacts t2
WHERE
t1.id < t2.id AND
t1.email = t2.email;
Output:
Three rows had been deleted. We execute the query, given below to finds the duplicate emails from the table.
email,
COUNT (email)
FROM
contacts
GROUP BY
HAVING
COUNT (email) > 1;
The query returns the empty set. To verify the data from the contacts table, execute the following SQL query:
id | first_name | last_name | age | |
7 | Ben | Barnes | ben.barnes@comcast.net | 21 |
13 | Brian | Blessed | brian.blessed@yahoo.com | 18 |
10 | Eliza | Bennett | eliza.bennett@yahoo.cm | 23 |
1 | Kavin | Peterson | kavin.peterson@verizon.net | 22 |
8 | Mischa | Barton | mischa.barton@att.net | 20 |
11 | Micha | Krane | michal.Krane@me.com | 17 |
4 | Michal | Jackson | Michal.jackson@aol.com | 18 |
2 | Nick | Jonas | nick.jonas@me.com | 16 |
3 | Peter | Heaven | Peter.heaven@google.com | 25 |
5 | Sean | Bean | Sean.bean@yahoo.com | 20 |
6 | Tom | Baker | tom.baker@aol.com | 30 |
The rows id's 9, 12, and 14 have been deleted. We use the below statement to delete the duplicate rows:
Execute the script for creating the contact.
INNERJ OIN contacts c2
WHERE
c1.id > c2.id AND
c1.email = c2.email;
id | first_name | last_name | age | |
1 | Ben | Barnes | ben.barnes@comcast.net | 21 |
2 | Kavin | Peterson | kavin.peterson@verizon.net | 22 |
3 | Brian | Blessed | brian.blessed@yahoo.com | 18 |
4 | Nick | Jonas | nick.jonas@me.com | 16 |
5 | Michal | Krane | michal.Krane@me.com | 17 |
6 | Eliza | Bennett | eliza.bennett@yahoo.cm | 23 |
7 | Michal | Jackson | Michal.jackson@aol.com | 18 |
8 | Sean | Bean | Sean.bean@yahoo.com | 20 |
9 | Mischa | Barton | mischa.barton@att.net | 20 |
10 | Peter | Heaven | Peter.heaven@google.com | 25 |
11 | Tom | Baker | tom.baker@aol.com | 30 |
(B) Delete duplicate rows using an intermediate table
To delete a duplicate row by using the intermediate table, follow the steps given below:
Step 1. Create a new table structure, same as the real table:
Step 2. Insert the distinct rows from the original schedule of the database:
SELECT * FROM source
GROUP BY col;
Step 3. Drop the original table and rename the immediate table to the original one.
ALTER TABLE source_copy RENAME TO source;
For example, the following statements delete the rows with duplicate emails from the contacts table:
CREATE TABLE contacts_temp
LIKE contacts;
-- step 2
INSERT INTO contacts_temp
SELECT * FROM contacts
GROUP BY email;
-- step 3
DROP TABLE contacts;
ALTER TABLE contacts_temp
RENAME TO contacts;
(C) Delete duplicate rows using the ROW_NUMBER() Function
The following statement uses the ROW_NUMBER () to assign a sequential integer to every row. If the email is duplicate, the row will higher than one.
OVER (PARTITION BY email
ORDER BY email
) AS row_num
FROM contacts;
The following SQL query returns id list of the duplicate rows:
FROM (SELECT id,
ROW_NUMBER() OVER (
PARTITION BY email ORDER BY email) AS row_num
FROM
contacts
) t
WHERE
row_num> 1;
Output:
id |
9 |
12 |
14 |
Delete Duplicate Records in Oracle
When we found the duplicate records in the table, we had to delete the unwanted copies to keep our data clean and unique. If a table has duplicate rows, we can delete it by using the DELETE statement.
In the case, we have a column, which is not the part of group used to evaluate the duplicate records in the table.
Consider the table given below:
VEGETABLE_ID | VEGETABLE_NAME | COLOR |
01 | Potato | Brown |
02 | Potato | Brown |
03 | Onion | Red |
04 | Onion | Red |
05 | Onion | Red |
06 | Pumpkin | Green |
07 | Pumpkin | Yellow |
CREATE TABLE vegetables (
VEGETABLE_ID NUMBER generated BY DEFAULT AS ID ENTITY,
VEGETABLE_NAME VARCHAR2(100),
color VARCHAR2(20),
PRIMARY KEY (VEGETABLE_ID)
);
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Potato','Brown');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Potato','Brown');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Onion','Red');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Onion','Red');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Onion','Red');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Pumpkin','Green');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Pumpkin','Yellow');
SELECT * FROM vegetables;
Suppose, we want to keep the row with the highest VEGETABLE_ID and delete all other copies.
MAX (VEGETABLE_ID)
FROM
vegetables
GROUP BY
VEGETABLE_NAME,
color
ORDER BY
MAX(VEGETABLE_ID);
MAX(VEGETABLE_ID) |
2 |
5 |
6 |
7 |
We use the DELETE statement to delete the rows whose values in the VEGETABLE_ID COLUMN are not the highest.
vegetables
WHERE
VEGETABLE_IDNOTIN
(
SELECT
MAX(VEGETABLE_ID)
FROM
vegetables
GROUP BY
VEGETABLE_NAME,
color
);
Three rows have been deleted.
VEGETABLE_ID | VEGETABLE_NAME | COLOR |
02 | Potato | Brown |
05 | Onion | Red |
06 | Pumpkin | Green |
07 | Yellow |
If we want to keep the row with the lowest id, use the MIN() function instead of the MAX() function.
vegetables
WHERE
VEGETABLE_IDNOTIN
(
SELECT
MIN(VEGETABLE_ID)
FROM
vegetables
GROUP BY
VEGETABLE_NAME,
color
);
The above method works if we have a column that is not part of the group for evaluating duplicate. If all values in the columns have copies, then we cannot use the VEGETABLE_ID column.
Let's drop and create the vegetable table with a new structure.
CREATE TABLE vegetables (
VEGETABLE_ID NUMBER,
VEGETABLE_NAME VARCHAR2(100),
Color VARCHAR2(20)
);
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(1, 'Potato','Brown');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color)VALUES(2,'Onion','Red');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color)VALUES(2,'Onion','Red');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(2,'Onion','Red');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(3,'Pumpkin','Green');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES('4,Pumpkin','Yellow');
SELECT * FROM vegetables;
VEGETABLE_ID | VEGETABLE_NAME | COLOR |
01 | Potato | Brown |
01 | Potato | Brown |
02 | Onion | Red |
02 | Onion | Red |
02 | Onion | Red |
03 | Pumpkin | Green |
04 | Pumpkin | Yellow |
In the vegetable table, the values in all columns VEGETABLE_ID, VEGETABLE_NAME, and color have been copied.
We can use the rowid, a locator that specifies where Oracle stores the row. Because the rowid is unique so that we can use it to remove the duplicates rows.
FROM
Vegetables
WHERE
rowed NOT IN
(
SELECT
MIN(rowid)
FROM
vegetables
GROUP BY
VEGETABLE_ID,
VEGETABLE_NAME,
color
);
The query verifies the deletion operation:
VEGETABLE_ID | VEGETABLE_NAME | COLOR |
01 | Potato | Brown |
02 | Onion | Red |
03 | Pumpkin | Green |
04 | Pumpkin | Yellow |