SQL Server 2012 :: Finding Duplicates And Show Original / Duplicate Record
Nov 3, 2014
There are many duplicate records on my data table because users constantly register under two accounts. I have a query that identify the records that have a duplicate, but it only shows one of the two records, and I need to show the two records so that I can reconcile the differences.The query is taken from a post on stack overflow. It gives me 196, but I need to see the 392 records.
How to identify the duplicates and show the tow records without having to hard code any values, so I can use the query in a report, and anytime there are new duplicates, the report shows them.
I am trying to complete an insert from query but the problem is I have duplicates, so I'm getting an error message. So to correct it I am creating a Find Duplicates statement in the Query analyzer but Its not working can someone tell me whats wrong with this statement (by the way I'm in SQL 2000 Server)
SELECT EmployeeGamingLicense [TM#]AS [TM# Field], Count([TM#])AS NumberOfDups FROM TERMINATION GROUP BY [TM#] HAVING Count([TM#])>1; GO
I was just wondering if any one knows how to find duplicate keys using more than one field. I used the below key to find those people who exists in list1 but don't exists in list2. I realized that the results had some duplicates which was expected but how do I then find all those duplicate people. I know how to do it if there was a primary key present I would have done a count (distinct cardnumber) > 1 and i would have done the select statement like this distinct cardnumber, but how do I do it with more that one key??
I have a company table and I would like to write a query that will return tome any duplicate companies. However, it is a little more complicated thenjust matching on exact company names. I would like it to give me duplicateswhere x number of letters at the beginning of the company name match AND xnumber of letters of the address match AND x number of letters of the citymatch. I will be doing this in batches based on the first letter of thecompany name. So for example I will first process all companies that startwith the letter "A".So for all "A" companies I want to find companies where the first 5 lettersin the company name match and the first 5 characters of the address fieldmatch and the first 5 characters of the city match. THANKS!!!
I'm trying to find duplicates. Any help will be greatly appreciated! Here's my query:
I'm trying to find only the records where the "Sales_Header_Your_Reference" field is used more then once with a different "Sales Header Document ID" So it looks like this Sales Header Document ID Sales_Header_Your_Reference 1955718 0002377729 2082721 0002478945 2082728 0002478976 2093598 0002487318 2093599 0002487318 2093601 0002487332 I hope im clear enough, here is my query. I get then listed right, but the count is all wrong, I only want to show the ones , like the one i have highlighted.
Select COUNT(Sales_Header_Your_Reference),Tbl_Sales_Header.Sales_Header_Document_Id,Tbl_Sales_Header.Sales_Header_Your_Reference FROM dbo.Tbl_Sales_Header INNER JOIN dbo.Tbl_Sales_Invoice_Header ON dbo.Tbl_Sales_Header.Customer_Bill_Customer_No = dbo.Tbl_Sales_Invoice_Header.Sales_Invoice_Header_Bill_Customer_No AND dbo.Tbl_Sales_Header.Sales_Header_Order_DateTime = dbo.Tbl_Sales_Invoice_Header.Sales_Invoice_Header_Order_Datetime WHERE (dbo.Tbl_Sales_Header.Sales_Header_Order_DateTime > CONVERT(DATETIME, '2007-09-15 00:00:00', 102)) AND (dbo.Tbl_Sales_Header.Customer_Bill_Customer_No = 'Butler') OR (dbo.Tbl_Sales_Header.Customer_Bill_Customer_No = 'MWI') OR (dbo.Tbl_Sales_Header.Customer_Bill_Customer_No = 'NLS') OR (dbo.Tbl_Sales_Header.Customer_Bill_Customer_No = 'HSI') OR (dbo.Tbl_Sales_Header.Customer_Bill_Customer_No = 'NLSMVD') OR (dbo.Tbl_Sales_Header.Customer_Bill_Customer_No = 'RNEXT') GROUP by Tbl_Sales_Header.Sales_Header_Document_Id,Tbl_Sales_Header.Sales_Header_Your_Reference HAVING COUNT(Sales_Header_Your_Reference) > 1 order by Tbl_Sales_Header.Sales_Header_Document_Id, Tbl_Sales_Header.Sales_Header_Your_Reference
I am trying to move distinct data from one table to another based on two columns regardless of special characters such as : ( ) ; ' " / - _ = >< etc. The two columns that are the defining factors that I need to match as duplicates are Col1 & Col3, the rest I want to get the maximum data from each column for that row (I hope this made sense).
Here is what I have tried, but it does not seem to work:
select ID, Col1, max(Col2)as Col2, Col3, max(Col4)as Col4, max(Col5)as Col5, max(Col6)as Col6, max(Col7)as Col7, max(Col8)as Col8, max(Col9)as Col9 where patindex('%[^-:]%',Col1) = 0 AND patindex('%[^-:]%',Col3) = 0 into Newtable from dbo.CLEANING_bk group by Col1,Col3
Basically if there are 10 rows of duplicate data based on Col1 & Col3 (regardless of special characters), I want to move the distinct info to a new table with the maximum amount of info from the other columns of the duplicate rows.
If row 3, 4,5, & 6 are all considered duplicate and Row3/Col2 has info in it that row 4 & 5/Col2 does not, I want to combine it with the single row I export to the new table retaining as much information from the rows of the duplicates as possible.
Can anyone help and let me know why my script is not wanting to play nice?
mytable fld1 int primkey fld2 varchar(20), fld3 varchar(20)From the definition of the above table, how do i do i modify my below query to only select the rows with duplicates in fld2. I know a groupby with a having count will display the duplicates for a given field, but i want my query to see all the fields and rows that are duplicates. select fld1, fld2, fld3 from mytable
I'm trying to create a report in Sql Server 2005 Reporting Services where it will display duplicates rows in the reports. I'm unable to find the property that enables this. I would need something like HideDuplicates = false, but that is not the way it is intended to be used. If HideDuplicates is left blank, it seems to still hide duplicate values if the preceding row matches.
The report is a crosstab type, so we are using a Matrix for the report item. I'll illustrate with a simple example:
This is what we currently get:
Catagory Class Detail
Type 1 A Item 1
B Item 2
Type 2 A Item 4
B Item 5
This is what we want:
Catagory Class Detail
Type 1 A Item 1
Type 1 B Item 2
Type 1 B Item 3
Type 2 A Item 4
Type 2 B Item 5
I'm sure its something simple, but we just cant seem to find the solution,
I have this 40,000,000 rows table... I am trying to clean this 'Contacts' table since I know there are a lot of duplicates.
At first, I wanted to get a count of how many there are.
I need to compare records where these fields are matched:
MATCHED: (email, firstname) but not MATCH: (lastname, phone, mobile). MATCHED: (email, firstname, mobile) But not MATCH: (lastname, phone) MATCHED: (email, firstname, lastname) But not MATCH: (phone, mobile)
I have a client who needs to copy an existing sale. The problem isthe Sale is made up of three tables: Sale, SaleEquipment, SaleParts.Each sale can have multiple pieces of equipment with correspondingparts, or parts without equipment. My problem in copying is when I goto copy the parts, how do I get the NEW sale equipment ids updatedcorrectly on their corresponding parts?I can provide more information if necessary.Thank you!!Maria
CREATE TABLE #Names ( ID INT IDENTITY(1,1), NAME VARCHAR(100) ) INSERT INTO #Names VALUES ('S-SQLXX') INSERT INTO #Names VALUES ('S-SQLXX.NA.SN.ORG') INSERT INTO #Names VALUES ('S-SQLYY') INSERT INTO #Names VALUES ('S-SQLYY.NA.SN.ORG') INSERT INTO #Names VALUES ('S-SQLCL-HR') INSERT INTO #Names VALUES ('S-SQLCL-MIS') SELECT * FROM #Names
--I want to filter out S-SQLXX.NA.SN.ORG because S-SQLXX.NA.SN.ORG is a duplicate of S-SQLXX eliminating .NA.SN.ORG from it.
--I want to filter out S-SQLYY.NA.SN.ORG because S-SQLYY.NA.SN.ORG is a duplicate of S-SQLYY eliminating .NA.SN.ORG from it.
--However I want to keep S-SQLCL-HR and S-SQLCL-MIS in my list of names as they do not have .NA.SN.ORG as a part of their name
--I want ONLY these returned IN the SELECT
SELECT * FROM #Names WHERE ID IN (1,3,5,6) DROP TABLE #Names
I have a table employee_test having the sample data. The rows with EmployeeID=6 are duplicate rows. I want to delete the duplicates retaining one row for the employeeid=6. Note :- I don't want to use a temporary table. I want to do this using a single query or at the most in a SP query batch. Please advise.
I am having problems with the "Duplicate key was ignored" warning message. The problem is that the message seems to happen randomly and cannot be reproduced. If i take the same set of data and run the stored procedure that causes the problem i don't get the warning message a second or subsequent time. Also all the SELECT statements have criteria set to remove duplicates before they are inserted into the tables.
I have a data feed that pulls data from a DB2 database to a SQL Server 2008 staging table as a flattened set of records. A stored procedure in SQL Server is run to load the data into the destination tables. The data feed is run hourly for new and updated records in DB2 Monday-Friday 09:00-17:00 and then there is a midnight run of all the records going back for the last 12 months.
The data feed was originally sent from DB2 as a CSV file and pulled into SQL Server using SSIS but is now an Informatica workflow that pulls the data directly from DB2.
It is the Informatica workflow that is returning the "duplicate key was ignored" warning message and this stops the workflow. The workflow is restarted and the data is always loaded the second time without the warning message. The warning does not happen every time the workflow is run - it can run for a number of days with no warnings and then one will come through I can see in Profiler that it is SQL Server that returns the Duplicate key was ignored warning message so it is not an issue with Informatica.
I cannot reproduce the problem to get to the root cause of the issue. I would expect that if i run the same set of data through the stored procedure i would get the warning message every time, but this is not the case. Even when i step through the stored procedure i do not get the message. As the midnight data feed returns the records from the last 12 months, so by definition would include duplicates, the warning message only appears randomly and is not consistent.
I searched for all the posts which covered my question - but none were close enough to answer what i'm trying to do. Basically, the scenario is thus;
Table1 contains values for UserID, Account code, and Date.
My query (below) is trying to find all the accounts assigned to a particular user ID, but also those duplicate account codes which belong to a second user ID. The date column would be appended to the result set.
The query I'm using is as follows;
select acccountcode, userid, date from dbo.table1 where exists (select accountcode from dbo.table1 where accountcode = table1.accountcode group by accountcode having count(*) > 1) and userid = 'x-x-x' order by accountcode
What I think this produces is a list of all files where a duplicate exists, but of course it leaves out the 2nd UserID...which is crucial.
Hopefully this makes sense. Any insight my fellow DBA's can share would be greatly appreciated!
I need to be able to identify breaks in a sequence so I can evaluate the data more correctly. In the sample I have given I need to be able to identify the break in sequence at 69397576, ideally I would set that as a D. My query also needs to recognize that the 3 sequences following 69397576 are sequential and would belong to that set. so the out come would look like this.
SELECT * INTO TEMP FROM (SELECT 'AAAAA' AS CATEGORY, 'A1000' AS CODE, '01-01-2014' AS STARTDATE, '01-31-2014' AS ENDDATE UNION SELECT 'AAAAA' AS CATEGORY, 'A1000' AS CODE, '02-01-2014' AS STARTDATE, '02-28-2014' AS ENDDATE UNION SELECT 'AAAAA' AS CATEGORY, 'A1000' AS CODE, '03-01-2014' AS STARTDATE, '03-31-2014' AS ENDDATE UNION SELECT 'AAAAA' AS CATEGORY, 'A2000' AS CODE, '04-01-2014' AS STARTDATE, '04-30-2014' AS ENDDATE UNION SELECT 'AAAAA' AS CATEGORY, 'A1000' AS CODE, '05-01-2014' AS STARTDATE, '05-31-2014' AS ENDDATE) X
I need to extract the date that the value in CODE column changes to another code for each value of CATEGORY and if there is no change, to record the original CODE value and its startdate for each CATEGORY.
INSERT #Visits (OpportunityID, ActivityID, FirstVisit, ScheduledEnd) SELECT 1, 1001, '2014-08-17', '2014-08-17 12:00:00.000' UNION ALL SELECT 1, 1002, '2014-08-17', '2014-08-17 17:04:13.000' UNION ALL SELECT 2, 1003, '2014-08-18', '2014-08-18 20:39:56.000' UNION ALL
Basically I'd like to mark the first Activity for each OpportunityID as a First Visit if its ScheduledEnd falls on the same day as the FirstVisit, and otherwise mark it as a Repeat Visit.
I have this so far, but it doesn't pick up on that the ScheduledEnd needs to be on the same day as the FirstVisit date to count as a first visit:
SELECT*, CASE MIN(ScheduledEnd) OVER (PARTITION BY FirstVisit) WHEN ScheduledEnd THEN 1 ELSE 0 END AS isFirstVisit, CASE MIN(ScheduledEnd) OVER (PARTITION BY FirstVisit) WHEN ScheduledEnd THEN 0 ELSE 1 END AS isRepeatVisit FROM#Visits
I am putting my problem in an example as I Feel it would be clear.
Assume my table PEOPLE is having 4 columns with 6 rows, the SlNo being primary key. SlNo Name LastName birthdate 1 A B x -- 2 C B x |-- 1 pair (A, B, x) 3 D E y --|------------ 4 A E y | | 5 A B x __| |-- 2'nd pair (D, E, y) 6 D E y --------------- In this scenario, I need to find SlNo values having similar values in other columns. The o/p for above must be: 1 5 0 3 6 0 (0 needs to include in output for distinction in the sets)
(a)IS THIS POSSIBLE TO DO IN ONE SELECT STATEMET? and HOW? (b)If I create another temp table tempPEOPLE and select distinct row information of the 2'nd, 3'rd and 4'th columns from the PEOPLE table and then selecting SlNo's where the information match, I am able to get o/p 1 5 3 6 without 0...and I cannot makeout the distinct sets in this. HOW DO I FIND THE DISTINCTION IN SETS?
I have a problem with a 3rd party piece of software. Doesn't matter which, really. The problem lies in a table called payments, with a column called txnumber...the newest version of this software fails a check during installation with the message "duplicate txnumber in payment table." Not sure how this could have happened, since there is no way to manually assign the txnumber, but the point is not important. What I'd like to do is figure out a sql script that will return only the duplicate number(s) so that I can either remove or change them manually. Unfortunately, I'm not terribly familiar with sql.
The duplicates that this thread relates to are the kind with duplicate "keyword" entries AND dissimilar field entries; i.e. :
keyword negative exact broad Phrase Polo 0 122 4 Polo 0 122 5
I've come up with an SQL query that seems to return all of these duplicates (save one of each type- the 'real', unique entry). However I think I made the query very inefficient. My SQL is very bad; this query will be running over tens of thousands of rows, so if it can be at all optimized I would greatly appreciate your help!
What I have so far is:
string query1 = "SELECT * FROM TableName" + " WHERE EXISTS (SELECT NULL FROM TableName" + " b" + " WHERE b.[keyword]= " + "TableName"+ ".[keyword]" + " AND b.[negative]<> " + "TableName"+ ".[negative]" + " ORb.[keyword]= " + "TableName"+ ".[keyword]" + " ANDb.[exact]<> " + "TableName"+ ".[exact]" + " ORb.[keyword] = " + "TableName"+ ".[keyword]" + " ANDb.[broad]<> " + "TableName"+ ".[broad]" + " ORb.[keyword]= " +"TableName"+ ".[keyword]" + " ANDb.[phrase]<> "+"TableName"+ ".[phrase]" + " GROUP BY b.[keyword], b.[broad], b.[exact]" + " HAVING Count(b.[keyword]) BETWEEN 2 AND 50000)" ;
the algoritm seems to check every column of every row in order to determine a duplicate. Seems straightforward to me, but alas slow...
Is there a better/faster way I can do this? Thanks for you help!
i tried the following query and able to get the list of foreign keys with column names as well as referred tables and referenced column
select parent_column_id as 'child Column',object_name(constraint_object_id)as 'FK Name',object_name(parent_object_id) as 'parent table',name,object_name(referenced_object_id)as 'referenced table',referenced_column_id from sys.foreign_key_columns inner join sys.columns on (parent_column_id = column_id and parent_object_id=object_id) Order by object_name(parent_object_id) asc
but i am not able to get the fks created more than once on same column refering to same pk
Yet another simple query that is eluding me. I need to find records in a table that have the same first name and last name. Because the table has a primaty key, these people were entered twice or they share the same first and last name.
How could you query this:
ID fname lname 10001 Bill Jones 10002 Joe Smith 10003 Sue Jenkins 10004 John Sanders 10005 Joe Smith 10006 Harrold Simpson 10007 Sue Jenkins 10008 Sam Worden
and get a result set of this:
ID fname lname 10002 Joe Smith 10005 Joe Smith 10003 Sue Jenkins 10007 Sue Jenkins