Delete Rows With Duplicate Column Data But Unique Row Data
May 25, 2000
Hello,
This probably has been addressed before but I was unable to get the search to work properly on this site.
I am needing a script/way of deleting all rows from a DB with the exception of one record left for each row that has duplicate column data. Example :
Row 1
Field1 = 12345 Field2 =xxxxx Field 3=yyyyy Field4=zzzzz etc.
Row 2
Field1 = 12345 Field2 =zzzzzz Field 3=xxxxxx Field4=yyyyyy etc.
Row3
Field1 = 12345 Field2 =20202 Field 3=11111 Field4=zzzzz etc.
Row 4
Field1 = 54321 Field2 =xxxxx Field 3=yyyyy Field4=zzzzz etc.
Etc. Etc.
I want to be able to find the duplicates for Field1 and then delete all but 1 of those rows.( I don't care which one I keep just so only one is left.) The data in the other fields may or may not be unique.
I know how to find the duplicates it's just the deleting part I am having problems with. Any help would be much appreciated. Thanks,
I have a results table that was created from many different sources in SSIS. I have done calculations and created derived columns in it. I am trying to figure out if there is a way to remove duplicate rows from this table without first writing it to a temp sql table and then parsing through it to remove them.
each row has a like key in a column - I would like to remove like rows keeping specific columns in the resulting row based on the data in this key field.
I just converted an old non-relational database into something that MS SQL likes. The old primary keys were broken up into two columns, one being useful. The column I need to use has some rows with the same values in them.
I am looking for some way in a SQL script to look for the duplicate rows and add "_X" to the data where X is a value incremented by 1 for each duplicate row found.
For example, 3 duplicate rows with "5443aa" would return "5443aa", "5443aa_1","5443aa_2".
I'm designing an app for stock keeping. In my DB, I have a field called "ItemSerialNo" and I made it unique(but not the table's primary key). On my front end, I have a text box for Item Serial No and a combo box loaded with the item brands and also a save button. I know that if I try to save a serial no already existing in my DB, the app won't allow me because of the unique property of the field named "ItemSerialNo". But I want to be able to save a serial no already existing in my DB but with a different brand name.
For example, I want to be able to save information like:
1. ITEM SERIAL NO = 12345 BRAND NAME = AA 2. ITEM SERIAL NO = 12345 BRAND NAME = BB.
I have a large table that consists of the columns zip, state, city, county. The primary key "zip" has duplicates but the rows are unique. How do I filter out only the duplicate zips. Randy Garland
How would I get the unique email addresses and its associated row of data from a SQL Server table that has no unique fields defined? If there is a duplicate email address then only show the first one and not the other rows with the same email address. Example table and data UserID LastName Email997249 MCCO-49 S.MCCO-49@SampleISD.org997462 BATE-62 A.BATE-62@SampleISD.org997605 DENS-05 B.DENS-05@SampleISD.org 997622 KAIS-22 A.KAIS-22@SampleISD.org997623 KAIS-22 A.KAIS-22@SampleISD.org997624 KAIS-22 A.KAIS-22@SampleISD.org997625 KAIS-22 A.ZKAIS-22@SampleISD.org997626 KAIS-22 AX.ZKAIS-22@SampleISD.org997627 KAIS-22 AX.KAIS-22@SampleISD.org Result UserID LastName Email997249 MCCO-49 S.MCCO-49@SampleISD.org997462 BATE-62 A.BATE-62@SampleISD.org997605 DENS-05 B.DENS-05@SampleISD.org 997622 KAIS-22 A.KAIS-22@SampleISD.org997625 KAIS-22 A.ZKAIS-22@SampleISD.org997626 KAIS-22 AX.KAIS-22@SampleISD.org Thanks
My application is to capture employee locations.Whenever an employee arrives at a location (whether it is arriving forwork, or at one of the company's other sites) they scan the barcode ontheir employee badge. This writes a record to the tblTSCollected table(DDL and dummy data below).The application needs to be able to display to staff in a control roomthe CURRENT location of each employee.[color=blue]>From the data I've provided, this would be:[/color]EMPLOYEE ID LOCATION CODE963 VB002964 VB003966 VB003968 VB004977 VB001982 VB001Note that, for example, Employee 963 had formerly been at VB001 but wasmore recently logged in at VB002, so therefore the application is notconcerned with the earlier record.What would also be particularly useful would be the NUMBER of staff ateach location - viz.LOCATION CODE NUM STAFFVB001 2VB002 1VB003 2VB004 1Can anyone help?Many thanks in advanceEdwardNOTES ON DDL:THE BARCODE IS CAPTURED BECAUSE THE COMPANY MAY RE-USE BARCODE NUMBERS(WHICH IS DERIVED FROM THE EMPLOYEE PIN), SO THEREFORE THE BARCODECANNOT BE RELIED UPON TO BE UNIQUE.THE COLUMN fldRuleAppliedID IS NULL BECAUSE THAT PARTICULAR ROW HAS NOTBEEN PROCESSED. THERE ARE BUSINESS RULES CONCERNING EMPLOYEE HOURSWHICH OPERATE ON THIS DATA. ONCE A ROW HAS BEEN PROCESSED FORUPLOADING TO THE PAYROLL APPLICATION, THE fldRuleAppliedID COLUMN WILLCONTAIN A VALUE. IN THE PRODUCTION SYSTEM, THEREFORE, ANY SQL ASREQUESTED ABOVE WILL CONTAIN IN ITS WHERE CLAUSE (fldRuleAppliedID IsNULL)if exists (select * from dbo.sysobjects where id =object_id(N'[dbo].[tblTSCollected]') and OBJECTPROPERTY(id,N'IsUserTable') = 1)drop table [dbo].[tblTSCollected]GOCREATE TABLE [dbo].[tblTSCollected] ([fldCollectedID] [int] IDENTITY (1, 1) NOT NULL ,[fldEmployeeID] [int] NULL ,[fldLocationCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_ASNULL ,[fldTimeStamp] [datetime] NULL ,[fldRuleAppliedID] [int] NULL ,[fldBarCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL) ON [PRIMARY]GOINSERT INTO dbo.tblTSCollected(fldEmployeeID,fldLocationCode,fldTimeStamp,fldBarCode)VALUES (963, 'VB001', '2005-10-18 11:59:27.383', 45480)INSERT INTO dbo.tblTSCollected(fldEmployeeID,fldLocationCode,fldTimeStamp,fldBarCode)VALUES (963, 'VB002', '2005-10-18 12:06:17.833', 45480)INSERT INTO dbo.tblTSCollected(fldEmployeeID,fldLocationCode,fldTimeStamp,fldBarCode)VALUES (964, 'VB001', '2005-10-18 12:56:20.690', 45481)INSERT INTO dbo.tblTSCollected(fldEmployeeID,fldLocationCode,fldTimeStamp,fldBarCode)VALUES (964, 'VB002', '2005-10-18 15:30:35.117', 45481)INSERT INTO dbo.tblTSCollected(fldEmployeeID,fldLocationCode,fldTimeStamp,fldBarCode)VALUES (964, 'VB003', '2005-10-18 16:05:05.880', 45481)INSERT INTO dbo.tblTSCollected(fldEmployeeID,fldLocationCode,fldTimeStamp,fldBarCode)VALUES (966, 'VB001', '2005-10-18 11:52:28.307', 97678)INSERT INTO dbo.tblTSCollected(fldEmployeeID,fldLocationCode,fldTimeStamp,fldBarCode)VALUES (966, 'VB002', '2005-10-18 13:59:34.807', 97678)INSERT INTO dbo.tblTSCollected(fldEmployeeID,fldLocationCode,fldTimeStamp,fldBarCode)VALUES (966, 'VB001', '2005-10-18 14:04:55.820', 97678)INSERT INTO dbo.tblTSCollected(fldEmployeeID,fldLocationCode,fldTimeStamp,fldBarCode)VALUES (966, 'VB003', '2005-10-18 16:10:01.943', 97678)INSERT INTO dbo.tblTSCollected(fldEmployeeID,fldLocationCode,fldTimeStamp,fldBarCode)VALUES (968, 'VB001', '2005-10-18 11:59:34.307', 98374)INSERT INTO dbo.tblTSCollected(fldEmployeeID,fldLocationCode,fldTimeStamp,fldBarCode)VALUES (968, 'VB002', '2005-10-18 12:04:56.037', 98374)INSERT INTO dbo.tblTSCollected(fldEmployeeID,fldLocationCode,fldTimeStamp,fldBarCode)VALUES (968, 'VB004', '2005-10-18 12:10:02.723', 98374)INSERT INTO dbo.tblTSCollected(fldEmployeeID,fldLocationCode,fldTimeStamp,fldBarCode)VALUES (977, 'VB001', '2005-10-18 12:05:06.630', 96879)INSERT INTO dbo.tblTSCollected(fldEmployeeID,fldLocationCode,fldTimeStamp,fldBarCode)VALUES (982, 'VB001', '2005-10-18 12:06:13.787', 96697)
We write to a log file each time a job runs. We give each job a unique batchid. I want to compare the run times of each step/record between two batch ids: '20150101888' and '20150101777'. Column Mins in the number of minutes each step ran. I am having trouble comparing the rows that have generic process and stepname – Trans Switch in this example. A new process within a batchid starts with a 'XX', 'Load'.
So I want to compare CA's Trans to CA's Tran Switch and ER's Trans Switch to ER's, etc. There can be multiple Trans Switch per process. There should be the same number between each batch, but no guarantees that something might change. Also, Trans Switch is not the record right after the new process (CA, ER) in production.
I have just made a very simplified example.
/** Want to compare 20150101888 to 20150101777 and end up with this result set. Notice that the duplicate process/step within a process has the process (CA and ER in this example) and a sequential number added to it: 'CA Trans 1'. Need this to pull out the largest time differences.
Time difference, process, step, mins1, mins2, batchid1, batchid2 -6, CA, Load, 17, 23, 20150101888, 20150101777 0, CA Trans 1, Switch, 8, 8, 20150101888, 20150101777 -6, CA Trans 2, Switch, 9, 15, 20150101888, 20150101777 -4, ER, Load, 7, 11, 20150101888, 20150101777 -4, ER Trans 1, Switch, 7, 11, 20150101888, 20150101777
I have a table in which a non-primary key column has a unique index on it. If I am inserting a record into this table with a duplicate column value for the indexed column, then what will be the error number of the error in above scenario? OR How could I find this out?
I built a package to transform data from flatfile into temp table. Then execute a stored procedure to tranform from the temp table into the real table. Since the real table have primary keys so it goes failed when there're duplicate rows from temp table. I let the temp table has no primary key. If I had to check the temp table for duplicate rows it would be using cursor through the data and the package will get slow. Is there task in SSIS to identify the duplicate data and eliminate it without using cursor ? Thanks in advance,
CAN ANYBODY REPLY FOLLOWING QUESTIONS. I WANT TO DELETE DUPLICATE ROWS IN MY TABLE WITHOUT USING TRANSACTION TABLE. AND ONE MORE QUESTION HOW TO GET YESTERDAY DATE BY USING ISQL WINDOW.
declare @a1 table (id int not null identity(1,1), phone decimal(18,0), adress nvarchar(100)) insert @a1 (phone,adress)
[Code] ....
id phone address ----------- --------------------------------------- ---------------------------------------------------------------------------------------------------- 1 111 new york 2 111 new york 3 111 new york 4 222 maxico 5 222 mexico
select phone,count(phone) as say from @a1 group by phone having count(phone)>1
phone say --------------------------------------- ----------- 111 3 222 2
How can I remove duplicate phone. For example
after delete select*from @a1 select*from @a1
id phone address ----------- --------------------------------------- ---------------------------------------------------------------------------------------------------- 1 111 new york 4 222 maxico
I wrote this to show for example in my real table have 50000 rows and 1751 duplicate rows.
I am using SQL Server 2005 Express Edition in conjunction with asp.net 2.0
Is there any way to guarantee when INSERTing data, that all values in a given column (which is of a char data type) are unique? Or do I have to do this programatically at the asp layer?
I have a database table which needs to make the Index "ParentREF, UniqueName" unique - but this fails because duplicate keys are found. Thus I now need to cleanup these duplicate rows - but I cannot just delete the duplicates, because they might have rows in detail tables. This means that all duplicate rows needs an update on the "UniqueName" value - but not the first (valid) one!
I can find those rows by
SELECT OID, UniqueName, ParentREF, CreatedUTC, ModifiedUTC FROM dbo.CmsContent AS table0 WHERE EXISTS ( SELECT OID, UniqueName, ParentREF FROM dbo.CmsContent AS table1 WHERE table0.ParentREF = table1.ParentREF AND table0.UniqueName = table1.UniqueName AND table0.OID != table1.OID ) ORDER BY ParentREF, UniqueName, ModifiedUTC desc
...but I struggle to make the required SQL (SP?) to update the "invalid" rows. Note: the "valid" row is the one with the newest ModifiedUTC value - this row must kept unchanged!
ATM the preferred (cause easiest) way is to rename the invalid rows with UniqueName = OID because if I use any other name I risk to create another double entry.
I have a table employee_test having the sample data. The rows with EmployeeID=6 are duplicate rows. I want to delete the duplicates retaining one row for the employeeid=6. Note :- I don't want to use a temporary table. I want to do this using a single query or at the most in a SP query batch. Please advise.
I want to change duplicate data in my sql server column.
the data= 'Barack Obama' , what I want 'Barack Obama 23453'
I have the following query:
UPDATE klant SET naamvoornaam=naamvoornaam + CAST(klantnummer AS VARCHAR) WHERE naamvoornaam NOT IN ( SELECT(naamvoornaam) FROM klant GROUP BY naamvoornaam HAVING ( COUNT(naamvoornaam) = 1 ))
Message from messages: String or binary data would be truncated.
I think the max length exceeds the length of the destination column. I've tried:
Max(Len(naamvoornaam + CAST(klantnummer AS VARCHAR)))
message from messages: An aggregate may not appear in the set list of an UPDATE statement.
I have a master table and i need to import the rows into the parent and child table.
Master table name is Flatfile_Inventory Parent Table name is INVENTORY Child Tables name are INVENTORY_AMOUNT,INVENTORY_DETAILS,INVENTORY_VEHICLE, Error details will be goes to LOG_INVENTORY_ERROR
I have 4 duplicate rows in the Flatfile_Inventory which i have already inserted in the Parent and child table.
Again when i run the query using stored procedure,its tells that all the 4 rows are duplicate and will move to the Log_Inventory_Error.
I need is if i have the duplicate rows in the flatfile_Inventory when i start inserting into the parent and child table the already inserted row have the unique ID i must identify it and delete that row in the both parent and chlid table.And latest row must get inserted into the Parent and child table from Flatfile_Inventory.
I am sorry if this is the wrong forum for this question but I can not find better fit. I am using Visual Studio 2005 to write an asp.net web site which includes an SQL data base. In the databaseI have a table called 'ACCOUNTS' which has primary key column called 'ID' and a nvarchar column called 'Name'. I want the values in the 'Name' column to be unique in the table so that all accounts have unique names. I don't want to make the 'Name' column the primary key because I want people to be able to change the name of their account after it has been created. It seems to me there must be some way to add such a constraint to the table but I can find none. I have searched the documentation and seen loads of references to unique columns but I can't find any way to set a column to unique thru the IDE (or any other way). Can anyone help me? Thanks in advance for any help.
I am having issues trying to write a query that would provide me the unique GUID numbers associated with a distinct PID if the unique GUID's > 1. To summarize, I need a query that just shows which PID's have more than one unique GUID. A PID could have multiple GUID's that are the same, I'm looking for the PID's that have multiple GUID's that are different/unique.