Sunday, August 18, 2024

Different ways to delete duplicate records in SQL Server

 Deleting duplicate records in SQL Server can be done using several different approaches depending on the specific scenario and table structure. Here are a few methods:

Choosing the Right Method

  • ROW_NUMBER(): Preferred for larger datasets and complex deduplication logic.
  • GROUP BY: Simple but can be less efficient for very large datasets.
  • JOIN: Useful when working with smaller datasets or where performance isn’t critical.
  • Temporary Table: Best for smaller tables or where you need to reinsert the data.
  • EXISTS: Suitable when needing a more readable query but can be less performant.

These methods allow flexibility depending on the specific requirements of the deduplication task.

1. Using ROW_NUMBER() with Common Table Expressions (CTE)

The most common and efficient way is to use the ROW_NUMBER() function to identify duplicates and then delete them.

WITH CTE AS (
    SELECT 
        *, 
        ROW_NUMBER() OVER (PARTITION BY column1, column2, column3 ORDER BY (SELECT NULL)) AS rn
    FROM 
        your_table
)
DELETE FROM CTE WHERE rn > 1;
  • Explanation: This method assigns a unique row number to each record within a group of duplicates based on the specified columns (column1, column2, column3). All rows except the first in each group are then deleted.

2. Using GROUP BY with HAVING and a Subquery

This method uses GROUP BY to find duplicates and then deletes them using a DELETE statement with a subquery.

DELETE FROM your_table
WHERE ID NOT IN (
    SELECT MIN(ID)
    FROM your_table
    GROUP BY column1, column2, column3
);
  • Explanation: This method keeps the row with the minimum ID (or any unique column) for each group of duplicates and deletes the others.

3. Using INNER JOIN

You can delete duplicates by joining the table on itself.

DELETE T1
FROM your_table T1
INNER JOIN your_table T2
ON 
    T1.column1 = T2.column1 AND 
    T1.column2 = T2.column2 AND 
    T1.column3 = T2.column3 AND 
    T1.ID > T2.ID;
  • Explanation: This deletes rows from T1 that have duplicates in T2 based on the specified columns but with a higher ID.

4. Using Temporary Table

You can insert distinct records into a temporary table, truncate the original table, and reinsert the distinct records.

SELECT DISTINCT * INTO #temp_table FROM your_table;

TRUNCATE TABLE your_table;

INSERT INTO your_table SELECT * FROM #temp_table;

DROP TABLE #temp_table;
  • Explanation: This method creates a temporary table containing only distinct records, then clears the original table and repopulates it with the unique records.

5. Using DELETE with EXISTS

This method uses the EXISTS clause to find and delete duplicates.

DELETE FROM your_table
WHERE EXISTS (
    SELECT 1 
    FROM your_table T2
    WHERE 
        your_table.column1 = T2.column1 AND 
        your_table.column2 = T2.column2 AND 
        your_table.column3 = T2.column3 AND 
        your_table.ID > T2.ID
);
  • Explanation: This method deletes records that have duplicates with a lower ID value.

No comments:

Post a Comment