;WITH ctePreAgg AS (
    SELECT TOP 500
        act_reference "ActivityRef",
        ROW_NUMBER() OVER (PARTITION BY act_reference ORDER BY act_reference) AS rowno,
        t3.s_initials "Initials"
    FROM mytablestuff
    ORDER BY act_reference
[code]...
But what I would love to do next is take each of the above rows and return the initials either in one column, comma-separated, with all the nulls and duplicate values removed...
OR the above, but using a variable number of columns based on the maximum number of different initials for each row. This is not strictly required, but it might be neater for further work on the view.
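A minimal sketch of the one-column option, assuming s_initials lives on mytablestuff (the query above pulls it from a t3 alias that isn't shown): STUFF plus FOR XML PATH builds the comma-separated, de-duplicated list per ActivityRef (STRING_AGG over a DISTINCT subquery would do the same on SQL Server 2017+).

SELECT a.act_reference AS ActivityRef,
       STUFF((SELECT DISTINCT ',' + b.s_initials
              FROM mytablestuff AS b
              WHERE b.act_reference = a.act_reference
                AND b.s_initials IS NOT NULL          -- drop the nulls
              FOR XML PATH('')), 1, 1, '') AS Initials
FROM mytablestuff AS a
GROUP BY a.act_reference;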
Hi all, I have the table dbo.OperatingHour. It has many duplicates and I want to remove them permanently. The statement below runs, but when I open the table there are no changes.
INSERT INTO OperatingHour (Weekdays, Wednesdays, Fridays, Saturdays, [Sundays/Public Holidays])
SELECT DISTINCT Weekdays, Wednesdays, Fridays, Saturdays, [Sundays/Public Holidays]
FROM OperatingHour
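As written, that INSERT can only add rows (one more copy of each distinct combination); it never deletes the existing duplicates. A minimal sketch of one common fix, using the column names from the post (check constraints, triggers and identities before running it for real): copy the distinct rows out, empty the table, then put them back.

SELECT DISTINCT Weekdays, Wednesdays, Fridays, Saturdays, [Sundays/Public Holidays]
INTO #DistinctOperatingHour          -- temporary holding table
FROM OperatingHour;

TRUNCATE TABLE OperatingHour;        -- remove everything, duplicates included

INSERT INTO OperatingHour (Weekdays, Wednesdays, Fridays, Saturdays, [Sundays/Public Holidays])
SELECT Weekdays, Wednesdays, Fridays, Saturdays, [Sundays/Public Holidays]
FROM #DistinctOperatingHour;

DROP TABLE #DistinctOperatingHour;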
I have a table with columns ID, DupeID1, and DupeID2. The ID column is unique. Each combination of DupeID1 and DupeID2 should only appear once, and I don't want the reverse combination (DupeID2, DupeID1) in the table either. How can I delete these reverse duplicates?
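A minimal sketch, assuming the table is called dbo.Dupes (hypothetical name) with the three columns described: for every pair that also exists reversed, the row with the higher ID is deleted and its mirror kept.

DELETE d
FROM dbo.Dupes AS d
WHERE EXISTS (SELECT 1
              FROM dbo.Dupes AS r
              WHERE r.DupeID1 = d.DupeID2     -- the same pair, reversed
                AND r.DupeID2 = d.DupeID1
                AND r.ID < d.ID);             -- keep the lower ID of each mirrored pair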
How do I only select rows with duplicate dates for each person (id)? (The actual table has approximately 13000 rows with approximately 3000 unique ids)
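A hedged sketch, assuming a table called dbo.Appointments (hypothetical) with columns id and apptdate: a row is returned only when the same person has more than one row on the same date.

SELECT a.*
FROM dbo.Appointments AS a
JOIN (SELECT id, apptdate
      FROM dbo.Appointments
      GROUP BY id, apptdate
      HAVING COUNT(*) > 1) AS dup       -- (id, date) pairs that occur more than once
  ON dup.id = a.id
 AND dup.apptdate = a.apptdate;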
I have a table with 22 million Business records. I can see that there are duplicates when I group by BusinessName and Address and Phone. I'd like to place only the duplicates into a table, with a ranking, oldest business key gets a ranking of 1.
As a bonus I'd like each group to have a distinct group name (although not necessary, just want to know how to do this)
Later after I run more verifications to make sure these are not referenced elsewhere I'll delete everything with a matchRank > 1 out of the main Business table.
DROP TABLE [dbo].[TestBusiness];
GO
CREATE TABLE [dbo].[TestBusiness](
    [Business_pk] INT IDENTITY(1,1) NOT NULL,
    [BusinessName] VARCHAR(200) NOT NULL,
    [Address] VARCHAR(MAX) NOT NULL,
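A minimal sketch of the duplicate extraction, assuming a Phone column exists on the table (the CREATE TABLE above is cut off in the post) and a hypothetical name for the target table: ROW_NUMBER gives each group's oldest Business_pk a matchRank of 1, DENSE_RANK supplies a distinct group number that can double as a group name, and only groups with more than one row are copied out.

;WITH Ranked AS (
    SELECT Business_pk, BusinessName, [Address], Phone,
           ROW_NUMBER() OVER (PARTITION BY BusinessName, [Address], Phone
                              ORDER BY Business_pk) AS matchRank,            -- oldest key = 1
           DENSE_RANK() OVER (ORDER BY BusinessName, [Address], Phone) AS groupId,
           COUNT(*) OVER (PARTITION BY BusinessName, [Address], Phone) AS groupSize
    FROM dbo.TestBusiness
)
SELECT Business_pk, BusinessName, [Address], Phone, matchRank,
       'DupGroup_' + CAST(groupId AS VARCHAR(20)) AS GroupName
INTO dbo.TestBusinessDuplicates      -- hypothetical name for the duplicates table
FROM Ranked
WHERE groupSize > 1;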
I have a table with one column, and I want to remove the duplicate records from it. I mean, if I have a record 'rakesh' in the table twice, then one of the records should be removed... my table looks like that.
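A minimal sketch, assuming the table is called dbo.Names (hypothetical) and its single column is name: ROW_NUMBER numbers the copies of each value, and deleting through the CTE removes everything past the first copy from the underlying table.

;WITH Numbered AS (
    SELECT name,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY (SELECT NULL)) AS rn   -- arbitrary order within each value
    FROM dbo.Names
)
DELETE FROM Numbered
WHERE rn > 1;      -- keep one copy of each value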
I need an SSIS package that compares the two tables and removes the rows in the first table whose keys do not exist in the second table. For example....
I have a table of returns keyed on returnID. In another table I have returnErrors, also keyed on returnID. I want a package that uses my returns table as a source, compares that dataset to the returnErrors dataset, and removes or splits the data so that my remaining dataset only has returns that have returnErrors. I can do this in T-SQL, but I am looking for an SSIS solution that uses the conditional split transformation or some other combination of transformations.
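In the data flow this is usually handled with a Lookup transformation against returnErrors, sending the non-matching rows away from the destination (a Conditional Split on a looked-up column works as well). For reference, the T-SQL the poster says they can already write might look something like this, with table names assumed:

DELETE r
FROM dbo.Returns AS r
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.ReturnErrors AS e
                  WHERE e.returnID = r.returnID);   -- drop returns that have no matching error row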
Welcome, how can I alter the following table in order to reduce neighbouring duplicates (symbol, position, quantity, price)?

Nr   Symbol    Position  Quantity  Price               Date
1.   wz9999b   1         1.0       2500.0              2007-05-09 08:09:42.653
2.   wz9999b   2         12.0      2500.0              2007-05-09 08:09:42.653
3.   wz9999b   1         100.0     2590.0              2007-05-10 15:47:04.140
4.   PZ0008VX  1         2280.884  2090.5500000000002  2007-05-16 12:43:12.403
5.   PZ0008VX  1         2280.884  2102.0500000000002  2007-05-16 12:45:27.420
6.   wz9999b   1         0.001     2500.0              2007-05-18 09:47:16.033
7.   wz9999b   1         0.001     2500.0              2007-05-18 09:47:53.270
8.   wz9999b   1         1.0       1.0                 2007-05-22 12:35:07.893
9.   PZ0008VX  1         2280.884  2102.0500000000002  2007-05-24 09:38:26.160
10.  PZ0008VX  1         2280.884  2102.0500000000002  2007-05-24 09:38:38.800
11.  wz9999b   1         0.001     2500.0              2007-05-24 12:35:07.207
12.  wz9999b   1         0.002     2500.0              2007-05-24 12:35:14.987
13.  wz9999b   1         0.001     2500.0              2007-05-24 12:38:07.207

In the result set I would like to get the rows number 6 and 10. Any suggestions??
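A hedged sketch, assuming the table is called dbo.Trades (hypothetical): since Nr gives the row order, a self-join to the previous Nr surfaces every row whose symbol, position, quantity and price exactly repeat the row directly above it (join on curr.Nr + 1 instead to surface the earlier row of each pair).

SELECT curr.Nr, curr.Symbol, curr.Position, curr.Quantity, curr.Price, curr.[Date]
FROM dbo.Trades AS curr
JOIN dbo.Trades AS prev
  ON prev.Nr = curr.Nr - 1            -- the row directly above
WHERE prev.Symbol   = curr.Symbol
  AND prev.Position = curr.Position
  AND prev.Quantity = curr.Quantity
  AND prev.Price    = curr.Price;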
I have a situation where we get XML files sent daily that need uploading into SQL Server tables, but the source system producing these files sometimes generates duplicate records in the file. The tricky part is that the record isn't entirely duplicated. What I mean is that if I look for duplicates by grouping the key columns with HAVING COUNT(*) > 1, I can find which ones are duplicates, but when I inspect the data on these duplicates, the other details in the remaining columns may differ. So our rule is: pick the first record, toss the rest of the duplicates.
Because we don't sort on any columns during the import, the first record kept of the duplicates is arbitrary. Again, we can't tell at this point which of the duplicated records is more correct. Someday down the road, we will do this research.
Now, I need to know the most efficient way to accomplish this in SSIS. If it makes it easier, I could just discard all the duplicates, since the number of them is so small.
If the source were a relational table, I could use a SQL statement to filter the records to remove the duplicates, but since the source is an XML file, I don't know how to filter these out in the pipeline, since the file has to be aggregated to search for dups.
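In the pipeline itself, the Sort transformation's option to remove rows with duplicate sort values (sorting on the key columns) is the usual trick, though which duplicate survives is still arbitrary. Another workaround is to land the parsed XML rows in a staging table and let T-SQL keep one arbitrary row per key; a minimal sketch, with staging/destination table names and key columns assumed:

;WITH Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY KeyCol1, KeyCol2
                              ORDER BY (SELECT NULL)) AS rn    -- arbitrary "first" record per key
    FROM dbo.StagingXmlImport
)
INSERT INTO dbo.DestinationTable (KeyCol1, KeyCol2 /* , remaining columns */)
SELECT KeyCol1, KeyCol2 /* , remaining columns */
FROM Ranked
WHERE rn = 1;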
DELETE FROM tblContacts
WHERE tblContacts.ID IN (
        SELECT F.ID
        FROM tblContacts AS F
        WHERE EXISTS (SELECT email, COUNT(ID)
                      FROM tblContacts
                      WHERE tblContacts.email = F.email
                      GROUP BY tblContacts.email
                      HAVING COUNT(tblContacts.ID) > 1)
    )
    AND tblContacts.ID NOT IN (
        SELECT MIN(ID)
        FROM tblContacts AS F
        WHERE EXISTS (SELECT email, COUNT(ID)
                      FROM tblContacts
                      WHERE tblContacts.email = F.email
                      GROUP BY tblContacts.email
                      HAVING COUNT(tblContacts.ID) > 1)
        GROUP BY email
    )
I readily admit that I've shamelessly copied 'n pasted this from a tutorial and then taken a stab at tweaking it for my own ends. But I really don't understand what it's doing.
Really, all I want to know is that it will remove records with duplicate email fields. But I could also do with confirming - looking at the "SELECT Min(ID)" bit - does that mean that if it finds a duplicate, it'll delete the latest-added one? And if so, that changing it to remove the earliest-added one is simply a case of changing MIN to MAX?
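Yes: the NOT IN (SELECT MIN(ID) ...) part keeps the lowest ID for each email and deletes every later-added duplicate (assuming IDs increase over time), and swapping MIN for MAX keeps the latest copy instead. For comparison, a more readable way to express the same thing (keep the earliest per email; change ASC to DESC to keep the latest):

;WITH Ranked AS (
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY ID ASC) AS rn   -- 1 = earliest-added per email
    FROM tblContacts
)
DELETE FROM Ranked
WHERE rn > 1;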
I am working with a bunch of records that have duplicates on Persid and intPercentID. Where there are duplicates, I want to remove them when I stick the records in the temp table. I tried joining on the temp table and doing NOT EXISTS, but it still inserts; now I am trying a MERGE, but same thing. How can I keep duplicates from being inserted into the temp table? I made a cursor as well, but it's slow as heck, even though it does work. Trying to find better ways.
Create table #TempStr (STRId int not null Identity(1,1) primary key, Persid int, percentId int, dtCreated datetime, CreatedBy int)
INSERT #TempStr (Persid, percentId, dtCreated, CreatedBy)
select intPersonnelID, intPercentID, dtSubmitted, intSubmittedBy
from tblSTR
where intPercentID in (61,62)
group by intPercentID, intPersonnelID, dtSubmitted, intSubmittedBy
UNION ALL
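A hedged sketch of one way to load only one row per (intPersonnelID, intPercentID): number the rows per pair and insert only the first of each. Keeping the earliest dtSubmitted is an assumption, and the UNION ALL branch cut off in the post would need the same treatment.

;WITH Ranked AS (
    SELECT intPersonnelID, intPercentID, dtSubmitted, intSubmittedBy,
           ROW_NUMBER() OVER (PARTITION BY intPersonnelID, intPercentID
                              ORDER BY dtSubmitted) AS rn     -- 1 = earliest submission per pair
    FROM tblSTR
    WHERE intPercentID IN (61, 62)
)
INSERT #TempStr (Persid, percentId, dtCreated, CreatedBy)
SELECT intPersonnelID, intPercentID, dtSubmitted, intSubmittedBy
FROM Ranked
WHERE rn = 1;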
Product No  Grade   Quantity
A           Good
A           Normal
A           Bad
B           Good
B           Bad
C           Good
C           Normal
C           Bad
In Table 2, each Product No is split out by Grade. I want to look up the Quantity from Table 1 into Table 2. For each Product No, one row should get the value and the other rows should get 0. The result for the Quantity column should be like this:
Table 2:
Product No  Grade   Quantity
A           Good    1
A           Normal  0
A           Bad     0
B           Good    2
B           Bad     0
C           Good    3
C           Normal  0
C           Bad     0
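A hedged sketch, assuming Table1 holds one Quantity per Product No and using the table/column names loosely from the post: ROW_NUMBER marks one row per product (the 'Good' row first, to match the sample output) to receive the quantity, and every other row gets 0.

SELECT t2.[Product No],
       t2.Grade,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY t2.[Product No]
                                    ORDER BY CASE WHEN t2.Grade = 'Good' THEN 0 ELSE 1 END) = 1
            THEN t1.Quantity
            ELSE 0
       END AS Quantity
FROM Table2 AS t2
JOIN Table1 AS t1
  ON t1.[Product No] = t2.[Product No];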
I have a results table that was created from many different sources in SSIS. I have done calculations and created derived columns in it. I am trying to figure out if there is a way to remove duplicate rows from this table without first writing it to a temp sql table and then parsing through it to remove them.
Each row has a "like" key in a column; I would like to remove the like rows, keeping specific columns in the resulting row based on the data in this key field.
Within the LinkingID, there are duplicates in ID1 and ID2, just in opposite columns. I have been trying to figure out a way to remove these set-based. It doesn't matter which duplicate is removed; essentially these are just endpoints and I don't care which side they are on. The solution must recognize the duplicates and not just remove every 2nd row.
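A minimal sketch, assuming the table is called dbo.Links (hypothetical) with columns LinkingID, ID1 and ID2: whenever the mirrored pair exists within the same LinkingID, only the row whose ID1 is greater than its ID2 is deleted, so exactly one orientation of each pair survives.

DELETE l
FROM dbo.Links AS l
WHERE l.ID1 > l.ID2                       -- only ever delete one orientation
  AND EXISTS (SELECT 1
              FROM dbo.Links AS m
              WHERE m.LinkingID = l.LinkingID
                AND m.ID1 = l.ID2         -- the same pair, flipped
                AND m.ID2 = l.ID1);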
I have a bunch of contacts, and I've scored how well their names match other contacts in the same business. I can programmatically figure out how to parse the results, but I would like to know how to do this via SQL. My problem is that for Business_fk 968976 I have 7 contacts; in the end I should have 4 contacts based on name match. For the business key listed, Gerardo Lopez is in the ContactScore table twice, for Contact keys 7355719 and 57028145. I then have two rows like so:
They reference each other, and two rows is the easy case; a more difficult case would have key 1 listed 10 times with a ContactMatch_fk of 2 - 11, and then Contact_fk 2 listed 10 times with a ContactMatch_fk of 1, 3 - 11. I know 57028145 maps to 7355719 from the first row in the ContactScore table, so when a Contact_fk of 7355719 comes up I should be able to skip it and not process that match. Hopefully that makes sense. Anyway, here is the test data:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ContactScore]') AND type IN (N'U'))
    DROP TABLE [dbo].[ContactScore];
GO
CREATE TABLE [dbo].[ContactScore]
(
    [ContactScore_pk] INT NOT NULL,
    [Contact_fk] INT NOT NULL,
How can I perform this task with SSIS or Transact-SQL? I have these rows with the following data, and I want to take just the valid one, but I have a lot of combinations like the following names; they can be animals, things or personal names.
GABRIEL OBANDO   -- CORRECT
GABRIEL OVANDO
Gavriel OVANDO
gAbriel OBANDO
GABRIE OBANDO
Gabri OBONDA

MANAGUA          -- CORRECT
NANAGUA
NAMAGUA
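This kind of near-match de-duplication is what the SSIS Fuzzy Grouping transformation is built for. In plain T-SQL the closest built-in tools are SOUNDEX and DIFFERENCE, which only approximate it; a rough sketch, with a hypothetical dbo.RawNames(Name) table:

SELECT Name,
       SOUNDEX(Name) AS SoundexCode,                 -- sound-alike code; similar spellings tend to collide
       ROW_NUMBER() OVER (PARTITION BY SOUNDEX(Name)
                          ORDER BY Name) AS rn       -- rn = 1 could stand in for the "valid" spelling per group
FROM dbo.RawNames;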
I'm working through the MS example of "removeDuplicates". I can't seem to figure out how to add a custom property for an input column.
I added the helper method:

private static void AddIsKeyCustomPropertyToInput(IDTSInput90 input, object value)
{
    // Note: this adds the property to the input object itself (input.CustomPropertyCollection),
    // not to the individual columns in input.InputColumnCollection, which is likely why it
    // never shows up under the input column properties.
    IDTSCustomProperty90 isKey = input.CustomPropertyCollection.New();
    isKey.Name = "IsKey";
    isKey.Value = value;
}

I call it from:

public override void ProvideComponentProperties()
{
    //...
    AddIsKeyCustomPropertyToInput(input, false);
    //...
}

public override void ReinitializeMetaData()
{
    IDTSInput90 input = ComponentMetaData.InputCollection[0];
    if (input.CustomPropertyCollection.Count == 0)
    {
        AddIsKeyCustomPropertyToInput(input, false);
    }
    // ...
}
However, when I deployed it and added the component to an SSIS package, I can't see the custom property "IsKey" in the input column properties window. What am I missing? Please help.
We all were new at one point.... any help is appreciated.
Objective:
Combine two 49,000-row tables and remove records where there is only a one-column difference (keeping the row where that column has a value and removing the one where it is blank).
Reason:
I have 2 people going through a list, coding a specific column with a single letter value. They both have different progress on each sheet. Hence I am trying to UNION them and have a result of their combined efforts without duplicates.
My progress/where I'm stuck:
Here is my first query/union:
SELECT * FROM [Eds table] UNION SELECT * FROM [Vickis table];
As shown above, I have unioned these 2 tables and my results removed the obvious whole-record duplicates, but since one column differs between these rows, a union without further criteria considers them unique...
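A hedged sketch of one way to collapse those near-duplicates, assuming both tables share a row-identifying key column (KeyCol below, hypothetical) and the single-letter coding lives in CodeCol (also hypothetical): union everything, then keep one row per key, preferring the row where the code is filled in.

;WITH Combined AS (
    SELECT * FROM [Eds table]
    UNION
    SELECT * FROM [Vickis table]
),
Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY KeyCol
                              ORDER BY CASE WHEN CodeCol IS NULL OR CodeCol = '' THEN 1
                                            ELSE 0 END) AS rn   -- coded rows sort first
    FROM Combined
)
SELECT *
FROM Ranked
WHERE rn = 1;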
An example of the duplicates that I must remove is as follows:
I have an Excel file that I import into a DB table using a data flow in SSIS, but it has duplicates and I don't want to load the dupe records.
So I planned it like below:
Method 1: use two OLE DB Destinations, one for the good records (without duplicates) and one for the not-good records (only duplicates).
or
Method 2: add a column (GOOD_RECORD) to the DB table and update it to '1' for the top 1 record of each set (the good record) and to '0' for the remaining records (the dups); later I can filter through the GOOD_RECORD flag,
i.e., select * from DB_TABLE where GOOD_RECORD = '1'.
I think Method 2 is advisable for performance and flexibility, but how can I do that update using SSIS (data flow)?
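One hedged option is to run the Method 2 update in T-SQL from an Execute SQL Task after the data flow finishes: rank the rows within each duplicate set and flag the first one. The key columns that define a duplicate are assumptions here.

;WITH Ranked AS (
    SELECT GOOD_RECORD,
           ROW_NUMBER() OVER (PARTITION BY KeyCol1, KeyCol2
                              ORDER BY (SELECT NULL)) AS rn   -- 1 = the "top 1" record of each set
    FROM DB_TABLE
)
UPDATE Ranked
SET GOOD_RECORD = CASE WHEN rn = 1 THEN '1' ELSE '0' END;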
I have some duplicate values for my query results, about 200 duplicates out of 30000 rows. Of these 200 duplicates I want to keep the ones that have a higher value for... 'UpdatedBatchID'.
SELECT IR.Id AS 'ID'
     , CAST(IR.Priority AS VARCHAR) AS 'Priority'
     , IRSupportGroupDN.DisplayName AS 'Support Group'
     , DATEADD(MI, DATEDIFF(MI, GETUTCDATE(), GETDATE()), IR.CreatedDate) AS 'Created Date'
     , DATEADD(MI, DATEDIFF(MI, GETUTCDATE(), GETDATE()), IR.ResolvedDate) AS 'Resolved Date'
     , SLOConfig.DisplayName AS 'SLO'
     , DATEADD(MI, DATEDIFF(MI, GETUTCDATE(), GETDATE()), SLOFact.TargetEndDate) AS 'SLO Target'
     , SLOStatusDN.DisplayName AS 'SLO Status'
     , SLOMetric.DisplayName AS 'SLO Metric'
     , SLOFact.UpdatedBatchId AS 'UpdatedBatchID'
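The select list above is shown without its FROM/JOIN clauses, so this is a hedged sketch of the pattern only: wrap the full query as a derived table, number the rows per incident Id with the highest UpdatedBatchID first, and keep row 1 of each group. dbo.IncidentResults below is a hypothetical stand-in for the real query.

;WITH Ranked AS (
    SELECT r.*,
           ROW_NUMBER() OVER (PARTITION BY r.ID
                              ORDER BY r.UpdatedBatchID DESC) AS rn   -- highest batch id first
    FROM (
        SELECT Id AS ID, UpdatedBatchId AS UpdatedBatchID             -- stand-in for the full select list above
        FROM dbo.IncidentResults                                      -- stand-in for the original FROM/JOINs
    ) AS r
)
SELECT *
FROM Ranked
WHERE rn = 1;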
What happens when you add the Ignore Case flag into the mix?
I'm having a hell of a time. I'm dealing with an SCD situation using the TableDifference component, and I have both existing dimensions and new data coming in, each going through an identical case-insensitive Sort with remove-duplicates. But I'm getting identical records detected as both new and deleted, I think because of ordering issues. I'm still trying to whittle the test case down, but data from all around the records I'm investigating seems to get sorted in between them, so I'm having trouble getting a small test case built.
I think the mixed case data is the root of the problem, and I think the design is bad, but before I go back to the technical lead, I need to understand enough to show that you cannot take two pipelines sorted and de-duped case-insensitively and then do a case-sensitive table difference operation.
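A small T-SQL illustration of the underlying problem: an ordering produced under a case-insensitive collation is not a valid ordering under a case-sensitive (binary) one, so a downstream component that compares case-sensitively while trusting that sort can see the "same" row as both a new record and a deleted one.

DECLARE @t TABLE (val VARCHAR(10));
INSERT INTO @t VALUES ('abc'), ('ABD'), ('abe');

-- Case-insensitive order: abc, ABD, abe
SELECT val FROM @t ORDER BY val COLLATE Latin1_General_CI_AS;

-- Binary (case-sensitive) order: ABD, abc, abe -- 'ABD' moves, so rows arrive out of the order a case-sensitive merge expects
SELECT val FROM @t ORDER BY val COLLATE Latin1_General_BIN;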
I have a table employee_test containing the sample data. The rows with EmployeeID = 6 are duplicate rows. I want to delete the duplicates, retaining one row for EmployeeID = 6. Note: I don't want to use a temporary table. I want to do this with a single query, or at most in a stored procedure query batch. Please advise.
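When the duplicate rows are completely identical, one short batch (no temp table) is to count the extras for that EmployeeID and delete that many with DELETE TOP; which copies go is irrelevant because they are indistinguishable. A minimal sketch using the names from the post:

DECLARE @extras INT;

SELECT @extras = COUNT(*) - 1           -- number of surplus copies to remove
FROM dbo.employee_test
WHERE EmployeeID = 6;

IF @extras > 0
    DELETE TOP (@extras)
    FROM dbo.employee_test
    WHERE EmployeeID = 6;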
I have a report that is built using the report builder and the report model that was created. With the fields that are displayed in the report, it could be possible for the database to have multiple rows with the same values for all the fields (that are displayed in the report). The PK will be different, but this is not displayed in the report as the PK won't make sense in the report. So what happens is that the report displays just 1 record even though the database has multiple records because all the fields (that are displayed in the report) are identical. Is there a way to make the report display all the rows irrespective of duplicates?
I have an existing stored table with duplicate rows that I want to delete. Using a CTE gives me:
WITH CTE AS
(
    SELECT rn = ROW_NUMBER() OVER (PARTITION BY employeeid, dateofincident, typeid, description
                                   ORDER BY Id ASC),
           *
    FROM dbo.TableName
)
DELETE FROM cte
WHERE rn > 1
This is what I want to do, basically. But this is only deleting in my CTE; is there any way I can update my existing table "TableName" with this, without using temp tables?
My input table consists of ONLY TWO columns. And this is my table (INPUT TABLE):

Time   Action
17:42  SELL
17:43  BUY
17:44  SELL
17:45  SELL
17:46  SELL
17:47  BUY
17:48  BUY
17:49  SELL
17:50  SELL
Whenever the ACTION column's data repeats, remove those repetitive rows.
(I mean to say the rows in bold are the ones I want to remove.)
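A hedged sketch using LAG (SQL Server 2012+), assuming the table is called dbo.Actions (hypothetical): only rows whose Action differs from the immediately preceding row (by Time) are kept, so each run of repeated BUYs or SELLs collapses to its first row.

;WITH Flagged AS (
    SELECT [Time], [Action],
           LAG([Action]) OVER (ORDER BY [Time]) AS prevAction   -- the previous row's action
    FROM dbo.Actions
)
SELECT [Time], [Action]
FROM Flagged
WHERE prevAction IS NULL            -- the very first row
   OR [Action] <> prevAction;       -- action changed, so keep it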
Hi everyone. How can I get the unique row from a table which contains multiple rows that have exactly the same values?
Example:
create table test (c1 smallint, c2 smallint, c3 smallint)
insert into test values (1,2,3)
insert into test values (1,2,3)
I want to remove whichever of the rows, but I want to retain a single row.
TIA
Diego
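When every column is identical there is nothing to tell the copies apart, so one classic trick is to add a temporary identity column, delete the extras, then drop it again; a minimal sketch against the test table above (NULL-able columns would need ISNULL in the comparisons):

ALTER TABLE test ADD _rowid INT IDENTITY(1,1);   -- temporary tie-breaker
GO

DELETE t
FROM test AS t
WHERE t._rowid > (SELECT MIN(t2._rowid)          -- keep the lowest _rowid of each identical set
                  FROM test AS t2
                  WHERE t2.c1 = t.c1
                    AND t2.c2 = t.c2
                    AND t2.c3 = t.c3);

ALTER TABLE test DROP COLUMN _rowid;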
I've got the following table data:

116525.99
116520.14
129965.03
129960.12
129967.00

And I need to write a query to return only rows 2 and 4, since the remaining rows have duplicate IDs. I've tried GROUP BY, but am having no luck. Thanks!
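Reading each line as two columns (an id that repeats plus a value), one interpretation is that a single row per duplicated id is wanted; a hedged sketch on a hypothetical dbo.mytable(id, val), where taking the smallest val per id would happen to pick rows 2 and 4 of the sample:

SELECT id,
       MIN(val) AS val              -- one row per id; MIN is an assumption about which row to keep
FROM dbo.mytable
GROUP BY id
HAVING COUNT(*) > 1;                -- only ids that actually repeat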
Ok, so I've been racking my brains on this one for a while now and figured it was time to ask for some help...
I have a table named tblAppInfo that looks something like this:
user_name  app_name       app_ver
User1      MS Word        2000
User2      MS Word        2000
User3      MS Excel       2000
User4      MS Excel       2000
User5      MS Excel       2000
User6      MS Excel       2000
User7      MS Outlook     2000
User8      MS PowerPoint  2000
User9      Adobe Acrobat  5.0
User10     Adobe Acrobat  6.0
User11     Adobe Acrobat  7.0
What I want to do is remove any rows that repeat the app_name and app_ver more than twice. Basically in the example above, it would remove the lines that have MS Excel 2000 (as it's listed 4 times). MS Word 2000 would stay as it's only listed twice. As would Outlook and PowerPoint. All 3 Adobe's would also stay as their versions are different.
I've tried the following but it still repeats applications that occur more than twice:
SELECT user_name, app_name, app_ver
FROM tblAppInfo
GROUP BY app_name, app_ver, user_name
HAVING COUNT(DISTINCT app_name + app_ver + tblAppInfo.host_name) < 3
ORDER BY app_name, app_ver
I've also tried using nested select statements and everything else in my arsenal to no avail. I hope someone here will be able to help me out as I'm afraid I'll go bald if I try anymore!
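A hedged sketch of one way to get the intended result: a windowed COUNT(*) per (app_name, app_ver) tags each row with how many times its application appears, and the outer filter keeps only applications listed at most twice.

;WITH Counted AS (
    SELECT user_name, app_name, app_ver,
           COUNT(*) OVER (PARTITION BY app_name, app_ver) AS cnt   -- rows per application/version
    FROM tblAppInfo
)
SELECT user_name, app_name, app_ver
FROM Counted
WHERE cnt <= 2                       -- drop apps repeated more than twice (e.g. MS Excel 2000)
ORDER BY app_name, app_ver;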