Deleting duplicate records in SQL Server can be done using several different approaches depending on the specific scenario and table structure. Here are a few methods:
Choosing the Right Method
- ROW_NUMBER(): Preferred for larger datasets and complex deduplication logic.
- GROUP BY: Simple but can be less efficient for very large datasets.
- JOIN: Useful when working with smaller datasets or where performance isn’t critical.
- Temporary Table: Best for smaller tables or where you need to reinsert the data.
- EXISTS: Suitable when needing a more readable query but can be less performant.
These methods allow flexibility depending on the specific requirements of the deduplication task.
1. Using ROW_NUMBER()
with Common Table Expressions (CTE)
The most common and efficient way is to use the ROW_NUMBER()
function to identify duplicates and then delete them.
WITH CTE AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY column1, column2, column3 ORDER BY (SELECT NULL)) AS rn
FROM
your_table
)
DELETE FROM CTE WHERE rn > 1;
- Explanation: This method assigns a unique row number to each record within a group of duplicates based on the specified columns (
column1
,column2
,column3
). All rows except the first in each group are then deleted.
2. Using GROUP BY
with HAVING
and a Subquery
This method uses GROUP BY
to find duplicates and then deletes them using a DELETE
statement with a subquery.
DELETE FROM your_table
WHERE ID NOT IN (
SELECT MIN(ID)
FROM your_table
GROUP BY column1, column2, column3
);
- Explanation: This method keeps the row with the minimum
ID
(or any unique column) for each group of duplicates and deletes the others.
3. Using INNER JOIN
You can delete duplicates by joining the table on itself.
DELETE T1
FROM your_table T1
INNER JOIN your_table T2
ON
T1.column1 = T2.column1 AND
T1.column2 = T2.column2 AND
T1.column3 = T2.column3 AND
T1.ID > T2.ID;
- Explanation: This deletes rows from
T1
that have duplicates inT2
based on the specified columns but with a higherID
.
4. Using Temporary Table
You can insert distinct records into a temporary table, truncate the original table, and reinsert the distinct records.
SELECT DISTINCT * INTO #temp_table FROM your_table;
TRUNCATE TABLE your_table;
INSERT INTO your_table SELECT * FROM #temp_table;
DROP TABLE #temp_table;
- Explanation: This method creates a temporary table containing only distinct records, then clears the original table and repopulates it with the unique records.
5. Using DELETE
with EXISTS
This method uses the EXISTS
clause to find and delete duplicates.
DELETE FROM your_table
WHERE EXISTS (
SELECT 1
FROM your_table T2
WHERE
your_table.column1 = T2.column1 AND
your_table.column2 = T2.column2 AND
your_table.column3 = T2.column3 AND
your_table.ID > T2.ID
);
- Explanation: This method deletes records that have duplicates with a lower
ID
value.
No comments:
Post a Comment