Introduction
Duplicate records in database always create trouble while accessing the data from the warehouse. The duplication causes inconsistency and inaccuracy of data stored in the SQL Database Server.
Often, user encounters such problems while performing Extraction, Transformation, loading and executing the online transaction on the data.
In order to make the database table consistent and accurate, one needs to remove duplicate records from SQL server table.
In this article, we will discuss the different methods to resolve the problem of duplication from a database, which results in the accuracy of data.
Remove The Duplicity From SQL Server Table
User can deploy different strategies for removing duplicate records in SQL Server table. By using given below solutions, user can get rid of inconsistency that exists in database.
By Creating Temporary Tables:
In this strategy user needs to create a temporary table and pull the all distinct records from target table to temporary table. Also, truncate the target table and finally insert the values from the temporary table to target table.
For executing this procedure there should be ample space to store distinct values in temporary table if storing the huge size of data. In addition, user requires permission for object truncation.
SELECT DISTINCT * INTO #Temp_table FROM Targt_table
TRUNCATE TABLE Targt_table
INSERT INTO Targt_table SELECT * FROM Temp_table
DROP TABLE Temp_table
Deleting Duplicate Items Using Correlated Subquery:
User can go for correlated subquery to get rid of duplication records issue in SQL server table. In this strategy the output of outer query is evaluated. The result of outer query is used by inner query and the outcome of inner sub query is again used by outer query.
In this user need to execute SELECT command to select the records from the table and needs to apply WHERE subquery that will gives result to SELECT outer query. After this one needs to remove duplicates using DELETE command.
CREATE TABLE Student
(
[Roll no] INT
[FirstName] Varchar(50),
[LastName] Varchar(50),
)
GO
INSERT INTO Employee([ID],[FirstName],[LastName])
VALUES ('12','Riya', 'Core')
INSERT INTO Employee([ID],[FirstName],[LastName])
VALUES ('12','Riya', 'Core')
INSERT INTO Employee([ID],[FirstName],[LastName])
VALUES ('14','John', 'More')
INSERT INTO Employee([ID],[FirstName],[LastName])
VALUES ('18','Jiya', 'sene')
INSERT INTO Employee([ID],[FirstName],[LastName])
VALUES ('12','Riya', 'Core')
GO
SELECT* FROM Student
GO
SELECT* FROM Student S1 (For selecting the Distinct Record)
WHERE S1.Roll no =(SELECT MAX(Roll no) FROM Student S2
WHERE S2.FirstName=S1.FirstName AND S1.LastName= S2LastName
GO
DELETE Student (Deleting the Duplicates)
WHERE Roll no< ( SELECT MAX(Roll no) FROM Student S2
WHERE S2.FirstName=Student.FirstName AND S2.LastName=Student.LastName
GO
SELECT * FROM Student
GO
Fuzzy Group of Transformation
When using SQL Server Integration Service to upload the data in target table, user can use Fuzzy Group Transformation. This will perform cleaning task by identifying the duplicates and only approve only one row of data to make database consistent.
To use this transformation, it needs a connection to SQL server. Firstly, It creates temporary table which has all component of original table then it applies Fuzzy Algorithm for scanning and to find out similar elements.
After scanning is completed, it will direct unique rows to the destination table.
Using Merge Statement
User can perform UPDATE/INSERT/DELETE in single statement by using MERGE SQL Command. By using statement, one can effectively handle the data present in warehouse. It can check whether the raw exist in the table and then accordingly execute the INSERT, UPDATE and DELETE command in a single statement.
With the MERGE statement user can merge data from source table to target table with respective to the condition that user has specified.
User can adopt any above-mentioned method from different strategies for removing duplicate records from SQL server table in an effective way.
EmoticonEmoticon