Search Query - Analysis On Duplicate Records Based Off Of Several Match Keys
Jun 7, 2014
I'm trying to do some analysis on duplicate records based off of several match keys. I have a data set of approximately 30,000 people and the goal is to determine how many duplicate matches are in the system.
How would I write an SQL statement that looks for the following pieces of information? (I'm not using one person as an example; I need to analyze the entire data set.)
First name (exact match)
Last name (exact match)
Address line 1 (exact match)
Postal code/zip (exact match)
First Initial (exact match)
Last name (exact match)
DOB (exact match)
Postal code/zip (exact match)
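One way to approach the two match keys above is a GROUP BY ... HAVING COUNT(*) > 1 per key. The sketch below is only an illustration: the table name `people` and all column names are assumptions (the post gives no schema), and it runs against SQLite through Python's built-in sqlite3 module so it can be tried standalone. On SQL Server, `SUBSTR(first_name, 1, 1)` would typically be written `LEFT(first_name, 1)`.

```python
import sqlite3

# Hypothetical schema: the original post does not name the table or columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
                     address1 TEXT, zip TEXT, dob TEXT);
INSERT INTO people (first_name, last_name, address1, zip, dob) VALUES
  ('Ann', 'Lee', '1 Main St', '10001', '1980-01-01'),
  ('Ann', 'Lee', '1 Main St', '10001', '1980-01-01'),
  ('Al',  'Lee', '9 Oak Ave', '10001', '1980-01-01'),
  ('Bob', 'Ray', '2 Elm St',  '20002', '1975-05-05');
""")

# Match key 1: first name + last name + address line 1 + postal code.
key1 = conn.execute("""
    SELECT first_name, last_name, address1, zip, COUNT(*) AS n
    FROM people
    GROUP BY first_name, last_name, address1, zip
    HAVING COUNT(*) > 1
""").fetchall()

# Match key 2: first initial + last name + DOB + postal code.
key2 = conn.execute("""
    SELECT SUBSTR(first_name, 1, 1) AS initial, last_name, dob, zip, COUNT(*) AS n
    FROM people
    GROUP BY SUBSTR(first_name, 1, 1), last_name, dob, zip
    HAVING COUNT(*) > 1
""").fetchall()

print(key1)
print(key2)
```

Each result row is one group of suspected duplicates with its size, so the number of rows is the number of duplicate groups for that key.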
Just like the Unique/Distinct command, is there some way I could list just the duplicate records from a table? The field is numeric. Thanks a lot for your help.
I'm writing a query for the following: I need to collapse continuous date ranges. If the termdate for an ID is one day less than the effdate of the next record (for the same ID), I need to collapse the records. See below example. How should I write the query to give me the desired output, i.e., get min(effdate) and max(termdate) if termdate is one day less than the effdate of the next record?
I have a requirement where I want to delete records based on a date column. I have a table which contains columns like machinename and lasthardwarescandate.
I want to delete records based on max(Lasthardwarescandate), i.e. the latest one, where the machine name is duplicated, meaning it repeats. So how would I remove the duplicate machine names based on the Lasthardwarescandate column? (There are multiple entries for Lasthardwarescandate, so I want to fetch the latest date.)
Note: Duplication should be removed based on the "Last Hardware Scan" date.
Only the latest date should be considered from multiple records for the same system.
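A possible approach to the question above is to delete every row for which a newer scan of the same machine exists. This is a minimal sketch, not a definitive answer: the table name `machines` is an assumption (the post names only the columns), and it uses SQLite via Python's sqlite3 module so it runs standalone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 'machines' is a hypothetical table name; the column names come from the post.
CREATE TABLE machines (machinename TEXT, lasthardwarescandate TEXT);
INSERT INTO machines VALUES
  ('PC01', '2014-01-10'),
  ('PC01', '2014-03-15'),
  ('PC02', '2014-02-01'),
  ('PC01', '2013-12-01');
""")

# Delete each row that has a newer scan for the same machine name.
conn.execute("""
    DELETE FROM machines
    WHERE EXISTS (
        SELECT 1 FROM machines AS newer
        WHERE newer.machinename = machines.machinename
          AND newer.lasthardwarescandate > machines.lasthardwarescandate
    )
""")

remaining = conn.execute(
    "SELECT machinename, lasthardwarescandate FROM machines ORDER BY machinename"
).fetchall()
print(remaining)
```

Note that if two rows for a machine share the same latest date, both survive this delete; a tie-breaker column would be needed to pick one.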
I have duplicate records in a table. I need to count duplicate records based on account number, and the count will be stored in a variable. I need to check whether the count is > 0 or not in a stored procedure. I have used the query below. It is not working.
SELECT @_Stat_Count = count(*), L1.AcctNo, L1.ReceivedFileID
FROM Legacy L1, Legacy L2, ReceivedFiles
WHERE L1.ReceivedFileID = ReceivedFiles.ReceivedFileID
  AND L1.AcctNo = L2.AcctNo
GROUP BY L1.AcctNo, L1.ReceivedFileID
HAVING Count(*) > 0
IF (@_Stat_Count > 0)
BEGIN
    SELECT @Status = status_cd FROM status-table WHERE status_id = 10
END
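One reason a query like the one above may fail is that it assigns a variable and returns columns in the same SELECT, and the self-join inflates the counts. A simpler shape, sketched here under assumptions, counts the account numbers that occur more than once using a derived table. The sample data is made up, the join to ReceivedFiles is left out for brevity, and the sketch runs on SQLite through Python's sqlite3 module; in T-SQL the outer query could assign to the variable with `SELECT @_Stat_Count = COUNT(*) FROM (...) AS d`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Legacy (AcctNo TEXT, ReceivedFileID INTEGER);
-- Made-up rows: A1 appears twice, A3 three times, A2 once.
INSERT INTO Legacy VALUES
  ('A1', 1), ('A1', 1), ('A2', 1), ('A3', 2), ('A3', 2), ('A3', 2);
""")

# Count how many account numbers appear more than once.
(dup_count,) = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT AcctNo FROM Legacy GROUP BY AcctNo HAVING COUNT(*) > 1) AS d
""").fetchone()

has_duplicates = dup_count > 0
print(dup_count, has_duplicates)
```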
Hey, I am having some confusion about how to formulate this particular query. I have 2 tables. Table A has 4 columns, say a1, a2, a3, a4, with the columns a1, a2, a4 forming the primary key. Table B likewise has 4 columns, b1, b2, b3, b4, and like before, b1, b2 and b4 form the primary key. All columns are of the same datatype in both tables. Now I want to get rows from table A which are not present in table B. What's the best way of doing this? Thanks
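A common pattern for "rows in A that are not in B" is NOT EXISTS on the key columns. The sketch below assumes that "present" means a match on the primary key (a1, a2, a4) = (b1, b2, b4); the sample rows are invented, and it runs on SQLite via Python's sqlite3 module. The same SELECT should work on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (a1 INT, a2 INT, a3 INT, a4 INT, PRIMARY KEY (a1, a2, a4));
CREATE TABLE B (b1 INT, b2 INT, b3 INT, b4 INT, PRIMARY KEY (b1, b2, b4));
INSERT INTO A VALUES (1, 1, 10, 1), (1, 2, 20, 1), (2, 1, 30, 1);
INSERT INTO B VALUES (1, 1, 10, 1), (2, 1, 99, 1);
""")

# Keep only the rows of A whose key has no counterpart in B.
missing = conn.execute("""
    SELECT a1, a2, a3, a4
    FROM A
    WHERE NOT EXISTS (
        SELECT 1 FROM B
        WHERE B.b1 = A.a1 AND B.b2 = A.a2 AND B.b4 = A.a4
    )
    ORDER BY a1, a2, a4
""").fetchall()
print(missing)
```

If a row should count as present only when the non-key column matches too, add `AND B.b3 = A.a3` to the subquery.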
Can anyone help me write a query to show customers who have duplicate accounts with the same email address, first name, and last name? The structure of the Customer table is:
customerid(PK) accountno fname lname
Records will look like this:
customerid  accountno  fname  lastname
1           2          lori   taylor
2           2          lori   taylor
3           1          randy  dave
How can I make a query to show only my duplicate records? For some reason that I do not know, I have duplicate entries in my clustered index: 21 duplicate records in a table. How can I query to find those 21 duplicate records?
Hello, I have the following query:
declare @StartDate char(8)
declare @EndDate char(8)
set @StartDate = '20070601'
set @EndDate = '20070630'
SELECT Initials, [Position], DATEDIFF(mi,[TimeOn],[TimeOff]) AS ProTime
FROM LogTable WHERE
[TimeOn] BETWEEN @StartDate AND @EndDate AND
[TimeOff] BETWEEN @StartDate AND @EndDate
ORDER BY [Position],[Initials] ASC
The query returns the following data:
Position  Initials  ProTime
--------  --------  -------
ACAD      JJ        127
ACAD      JJ        62
ACAD      KK        230
ACAD      KK        83
ACAD      KK        127
ACAD      TD        122
ACAD      TJ        127
What I'm having trouble with is that I need to return a result that has the totals for each set of initials for each position. For example, the final output that I'm looking to get is the following:
Position  Initials  ProTime
ACAD      JJ        189
ACAD      KK        440
ACAD      TD        122
ACAD      TJ        127
Any assistance greatly appreciated.
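Totals per position and initials are usually obtained with SUM and GROUP BY. The sketch below is an illustration only: it stores the already-computed minutes in a ProTime column (an assumption made so it can run on SQLite through Python's sqlite3 module), whereas on SQL Server the aggregate would presumably be `SUM(DATEDIFF(mi, [TimeOn], [TimeOff]))` with the same GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- ProTime stands in for DATEDIFF(mi, TimeOn, TimeOff); values are from the post.
CREATE TABLE LogTable (Initials TEXT, Position TEXT, ProTime INTEGER);
INSERT INTO LogTable VALUES
  ('JJ', 'ACAD', 127), ('JJ', 'ACAD', 62),
  ('KK', 'ACAD', 230), ('KK', 'ACAD', 83), ('KK', 'ACAD', 127),
  ('TD', 'ACAD', 122), ('TJ', 'ACAD', 127);
""")

# One row per (Position, Initials) with the summed minutes.
totals = conn.execute("""
    SELECT Position, Initials, SUM(ProTime) AS ProTime
    FROM LogTable
    GROUP BY Position, Initials
    ORDER BY Position, Initials
""").fetchall()
print(totals)
```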
How do I update a record that has duplicates? For example, I have 3612 orders, and some of these orders have multiple orderids. I want to update the record for each of these orders that was added most recently.
The query should return fewer than 3000 records, but it's returning over 4M. It needs to show all duplicate records. All the info is in the same table, VENDFIL, so I used a self join, but it seems to be looping.
SELECT A.FEDTID, B.VENDOR, C.NPI_NUMBER FROM VENDFIL A, VENDFIL B, VENDFIL C GROUP BY A.FEDTID, B.VENDOR
I have a SQL query I need to design to select name and email addresses for policies that are due and not renewed in a given time period. The problem is, the database keeps the information for every renewal in the history of the policyholder.
The information is in 2 tables, policy and customer, which share the custid data. The polno changes with every renewal. Renewals in 2004 would be D, 2005 S, and 2006 L. polexpdates for a given customer could be 2007-03-21, 2006-03-21, 2005-03-21, and 2004-09-21, with polno of 1234 (original policy), 1234D (renewal in 2004), 1234S (renewal in 2005), and 1235L (renewed in 2006).
The policy is identified in trantype as either 'rwl' for renewal, or 'nbs' for new business.
The policies would have poleffdates of 2004-03-21 (original 6 month policy), 2004-09-21 (first 6 month renewal), 2005-03-21 (2nd renewal, 1 year), 2006-03-21 (3rd renewal, 1 yr).
I want ONLY THE LATEST information, and keep getting early information.
My current query structure is:
select c.lastname, c.email, p.polno, p.polexpdate
from policy p, customer c
where p.polid = c.polid
and p.polexpdate between '2006-03-01' and '2006-03-31'
and p.polno like '1234%s'
and p.trantype like 'rwl'
and c.email is not null
union
select c.lastname, c.email, p.polno, p.polexpdate
from policy p, customer c
where p.polid = c.polid
and p.polexpdate between '2006-03-01' and '2006-03-31'
and p.polno like '1234%'
and p.trantype like 'nbs'
and c.email is not null
How do I make this query give me ONLY the polno 123%, or 123%S information, and not give me the information on policies that ALSO have 123%L policies, and/or renewal dates after 2006-03-31? Adding a 'and not polexpdate > 2006-03-31' does not work.
I am working with SQL SERVER 2003. Was using SQL Server 7, but found it was too restrictive, and I had a valid 2003 licence, so I upgraded, and still could not do it (after updating the syntax - things like using single quotes instead of double, etc).
I keep getting those policies that were due in the stated range and HAVE been renewed as well as those which have not. I need to get only those which have NOT been renewed, and I cannot modify the database in any way.
I'm having some issues trying to create a query in Excel 2013. I can get the data I want from SQL, but I get individual transactions and I want to sum them by PLU number. Here is my query. I tried using GROUP BY, but every time I add the field, I get an error that some other field is invalid because it's not contained in an aggregate or GROUP BY clause. BTW, I didn't name these fields.
SELECT RPT_ITM_D.F254 as [Date], RPT_ITM_D.F01 As [PLU], RPT_ITM_D.F64 As [Qty Sold], RPT_ITM_D.F65 As [SOLD], OBJ_TAB.F17 As [RCode], OBJ_TAB.F29 As [Description], PRICE_TAB.F30 As [EL Price], PRICE_TAB.F31 As [Qty] FROM STORESQL.dbo.OBJ_TAB OBJ_TAB, STORESQL.dbo.PRICE_TAB PRICE_TAB, STORESQL.dbo.RPT_ITM_D RPT_ITM_D WHERE OBJ_TAB.F01 = RPT_ITM_D.F01 AND OBJ_TAB.F01 = PRICE_TAB.F01 AND PRICE_TAB.F01 = RPT_ITM_D.F01 AND ((RPT_ITM_D.F254>=? And RPT_ITM_D.F254<=? )AND (OBJ_TAB.F17=25))
My SQL is very basic. How do I create a query that will accept a parameter, an integer, and based on the integer, locate all the matches in a db?
SELECT COUNT(*) AS Expr1, tblArticle.ArticleID
FROM tblArticle INNER JOIN tblArticleCategory ON tblArticle.ArticleCatID = tblArticleCategory.ACategoryID
GROUP BY tblArticle.ArticleID
This isn't setting up the query to request a parameter. What am I doing wrong here? I'm trying to get the total number of articles for a particular category ID.
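The query above groups by ArticleID, so it returns one count per article rather than one total per category, and nothing in it refers to a parameter. A minimal sketch of a parameterized count follows; the helper name `count_articles` and the sample rows are made up, and it uses SQLite's `?` placeholder through Python's sqlite3 module. In a SQL Server stored procedure the placeholder would presumably be a declared parameter such as `@CategoryID`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblArticleCategory (ACategoryID INTEGER PRIMARY KEY);
CREATE TABLE tblArticle (ArticleID INTEGER PRIMARY KEY, ArticleCatID INTEGER);
INSERT INTO tblArticleCategory VALUES (1), (2);
INSERT INTO tblArticle VALUES (10, 1), (11, 1), (12, 2);
""")

def count_articles(conn, category_id):
    """Return the number of articles in one category (hypothetical helper)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM tblArticle WHERE ArticleCatID = ?",
        (category_id,),
    ).fetchone()
    return row[0]

print(count_articles(conn, 1))
```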
I have a query to insert records into a table based on a request, but the query can only read one record at a time. How do I change the query so that it is able to read multiple records? In the query below I was able to input only 1 request, which is 149906.
Query
declare @num_of_times int declare @Count INT DECLARE @newrequestid varchar(50) DECLARE @Frequency VARCHAR(25), @RequestId INT, @x INT, @Max INT, @RptDesc INT SET @RequestId = 149906 SET @x = 1
Hi All, I'm using BCP to import ASCII data text into a table that already has many records. BCP failed because of 'Duplicate primary key'. Now, is there any way using BCP to know precisely which record's primary key caused that 'violation of inserting duplicate key'? I already used the option -O to output errors to an 'error.log', but it doesn't help much, because that error log contains the same error message mentioned above without telling me exactly which record it was, so that I can pull that 'duplicate record' out of my import data file. TIA and you have a great day. David Nguyen.
Hi! I am new and would be very grateful for some advice. How do I eliminate duplicate reference keys? Or should I set the cache mode to PARTIAL?
[Data Conversion [4910]] Error: Data conversion failed while converting column "salesperson_id" (88) to column "Copy of salesperson_id" (4929). The conversion returned status value 6 and status text "Conversion failed because the data value overflowed the specified type.". [Lookup [3882]] Warning: The Lookup transformation encountered duplicate reference key values when caching reference data. The Lookup transformation found duplicate key values when caching metadata in PreExecute. This error occurs in Full Cache mode only. Either remove the duplicate key values, or change the cache mode to PARTIAL or NO_CACHE.
Hello, There is a program which performs some scripted actions via ODBC on tables in some database on mssql 2000. Sometimes that program tries to insert a record with a key that is already present in the database. The error comes up and the program stops. Is there any way to globally configure the database or the whole mssql server to ignore such attempts and let the script continue without any error when the script tries to insert duplicate-key records? Thank you for any suggestions. Pawel Banys
SQL Server allows you to create any number of foreign key constraints on the same table for the same column.
Will this affect the design or performance in any way?
Naming the constraint would be a good way to avoid this. But in case someone has already created them, how do I remove the existing duplicate keys?
For example, I have 2 tables, Author and Book. I could execute the query below n times and create as many foreign keys as I want.
ALTER TABLE Books ADD FOREIGN KEY (AuthorID) REFERENCES Authors (AuthorID)
I need the start and end time of consecutive records of the same vehicle with 0 speed ordered by date_time. If there is more than one consecutive record with zero speed it needs to be grouped together.
I've been able to get this select query to work, but I'm not sure how to modify it to turn it into a DELETE query:
USE QSCTestENG select p.[testid], COUNT(c.[testid]) FROM [dbo].[tblTestHeader] p left outer join [dbo].[tblTestMeasurements] c ON p.[testid]=c.[testid] where p.[model] = 'XPPowerCLC125US12' group by p.[testid] having COUNT(c.[testid]) <>48;
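One way to turn a grouped SELECT like this into a DELETE is to keep it as a subquery that returns only the testid values and delete the rows whose testid is IN that list. The sketch below assumes the goal is to delete the header rows from tblTestHeader (the post does not say which table to delete from); the sample data is invented, and it runs on SQLite via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblTestHeader (testid INTEGER PRIMARY KEY, model TEXT);
CREATE TABLE tblTestMeasurements (testid INTEGER, value REAL);
INSERT INTO tblTestHeader VALUES
  (1, 'XPPowerCLC125US12'), (2, 'XPPowerCLC125US12'),
  (3, 'XPPowerCLC125US12'), (4, 'OtherModel');
""")
# Test 1 has the full 48 measurements, test 2 has only 5, test 3 has none.
rows = [(1, float(i)) for i in range(48)] + [(2, float(i)) for i in range(5)]
conn.executemany("INSERT INTO tblTestMeasurements VALUES (?, ?)", rows)

# Delete headers of the given model whose measurement count is not 48.
conn.execute("""
    DELETE FROM tblTestHeader
    WHERE testid IN (
        SELECT p.testid
        FROM tblTestHeader AS p
        LEFT OUTER JOIN tblTestMeasurements AS c ON p.testid = c.testid
        WHERE p.model = 'XPPowerCLC125US12'
        GROUP BY p.testid
        HAVING COUNT(c.testid) <> 48
    )
""")

kept = [r[0] for r in conn.execute("SELECT testid FROM tblTestHeader ORDER BY testid")]
print(kept)
```

This leaves the measurement rows of the deleted tests in place; if those should go too, they would need their own DELETE, run before the headers are removed.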
How do I search for a string within the given values? I want to search for the string "Session" in the given column of results, where the values are separated by commas. I want only 2 results from the given values. If I use the LIKE keyword it returns 4, but I need only the exact match of the string.
The result should be:
Session, Study
Patterns, session, asp.net
But the result is coming as:
Session study, usercontrol
Session, study
Technical Session, Asp.net
Patterns, session, asp.net
Can anyone tell me the solution?
books catalog, education, best books Birthday, Party Gopi Session study, usercontrol Session, study Holiday Technical Session, Asp.net Patterns, session, asp.net day, party events for Lords, daily thing events manager events things meeting, administrator marriage project ,event, demo madurai ,event demo, event calendar rangoli, event Demo Project event project
I have a large table that consists of the columns zip, state, city, county. The primary key "zip" has duplicates, but the rows are unique. How do I filter out only the duplicate zips, so that in effect I have only one row per unique key? Randy Garland
if you just want a list of all rows with duplicate zipcodes then ...
SELECT * FROM TableName WHERE zip IN ( SELECT zip FROM TableName GROUP BY zip HAVING COUNT(*)>1 )
Duncan
Duncan, I tried this but it does not return one row per key. Randy Garland
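To get exactly one row per zip, some rule has to decide which of the competing rows wins. One option, sketched here under assumptions, is ROW_NUMBER() partitioned by zip, keeping the first row in an arbitrary but deterministic order (by city, then county, then state). The sample rows are invented. The sketch runs on SQLite (3.25 or later, for window functions) via Python's sqlite3 module; SQL Server 2005 and later support the same ROW_NUMBER() syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableName (zip TEXT, state TEXT, city TEXT, county TEXT);
-- Made-up rows: zip 11111 appears twice with different city spellings.
INSERT INTO TableName VALUES
  ('11111', 'AA', 'Alphaville', 'North'),
  ('11111', 'AA', 'Alpha',      'North'),
  ('22222', 'BB', 'Betatown',   'South');
""")

# Number the rows within each zip and keep only the first one.
one_per_zip = conn.execute("""
    SELECT zip, state, city, county
    FROM (
        SELECT zip, state, city, county,
               ROW_NUMBER() OVER (PARTITION BY zip
                                  ORDER BY city, county, state) AS rn
        FROM TableName
    ) AS ranked
    WHERE rn = 1
    ORDER BY zip
""").fetchall()
print(one_per_zip)
```

The ORDER BY inside OVER is where the real preference belongs, for example a "preferred city" flag if the table has one.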
I'm trying to merge two Access databases into one SQL server database. I have 3 tables that are all related with primary and foreign keys.
When I try to import my second set of 3 tables I get errors about the keys already existing in the database. Is there any way to force SQL server to assign new keys while preserving my existing relationships? Thanks!
I tried the following query and was able to get the list of foreign keys with column names, as well as the referenced tables and referenced columns:
select parent_column_id as 'child Column',object_name(constraint_object_id)as 'FK Name',object_name(parent_object_id) as 'parent table',name,object_name(referenced_object_id)as 'referenced table',referenced_column_id from sys.foreign_key_columns inner join sys.columns on (parent_column_id = column_id and parent_object_id=object_id) Order by object_name(parent_object_id) asc
But I am not able to get the FKs created more than once on the same column referring to the same PK.
Yet another simple query that is eluding me. I need to find records in a table that have the same first name and last name. Because the table has a primary key, these people were entered twice or they share the same first and last name.
How could you query this:
ID     fname    lname
10001  Bill     Jones
10002  Joe      Smith
10003  Sue      Jenkins
10004  John     Sanders
10005  Joe      Smith
10006  Harrold  Simpson
10007  Sue      Jenkins
10008  Sam      Worden
and get a result set of this:
ID     fname  lname
10002  Joe    Smith
10005  Joe    Smith
10003  Sue    Jenkins
10007  Sue    Jenkins
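One way to produce that result set is to find the (fname, lname) pairs that occur more than once and join them back to the table to recover the IDs. A minimal sketch, assuming the table is called `people` (the post does not name it), run on SQLite via Python's sqlite3 module; the SELECT itself is plain SQL and should carry over to SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 'people' is a hypothetical table name; the rows are the ones from the post.
CREATE TABLE people (ID INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
INSERT INTO people VALUES
  (10001, 'Bill', 'Jones'),   (10002, 'Joe', 'Smith'),
  (10003, 'Sue', 'Jenkins'),  (10004, 'John', 'Sanders'),
  (10005, 'Joe', 'Smith'),    (10006, 'Harrold', 'Simpson'),
  (10007, 'Sue', 'Jenkins'),  (10008, 'Sam', 'Worden');
""")

# Join each row to the set of names that appear more than once.
dupes = conn.execute("""
    SELECT p.ID, p.fname, p.lname
    FROM people AS p
    JOIN (
        SELECT fname, lname
        FROM people
        GROUP BY fname, lname
        HAVING COUNT(*) > 1
    ) AS d ON d.fname = p.fname AND d.lname = p.lname
    ORDER BY p.fname, p.lname, p.ID
""").fetchall()
print(dupes)
```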
I have one table that stores log messages generated by a web service. I have a second table where I want to store just the distinct messages from the first table. This second table has two columns one for the message and the second for the checksum of the message. The checksum column is the primary key for the table.
My query for populating the second table looks like: INSERT INTO TransactionMessages ( message, messageHash ) SELECT DISTINCT message, CHECKSUM( message ) FROM Log WHERE logDate BETWEEN '2008-03-26 00:00:00' AND '2008-03-26 23:59:59' AND NOT EXISTS ( SELECT * FROM TransactionMessages WHERE messageHash = CHECKSUM( Log.message ) )
I run this query once per day to insert the new messages from the day before. It fails when a day has two messages that have the same checksum. In this case I would like to ignore the second message and let the query proceed. I tried creating an instead of insert trigger that only inserted unique primary keys. The trigger looks like:
IF( NOT EXISTS( SELECT TM.messageHash FROM TransactionMessages TM, inserted I WHERE TM.messageHash = I.messageHash ) ) BEGIN INSERT INTO TransactionMessages ( messageHash, message ) SELECT messageHash, message FROM inserted END
That didn't work. I think the issue is that all the rows get committed to the table at the end of the whole query. That means the trigger cannot match the duplicate primary key because the initial row has not been inserted yet.
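One possible way around this is to make the SELECT itself return at most one row per checksum, for example by grouping on the checksum and taking MIN(message), so that colliding messages never reach the INSERT together. The sketch below is only an illustration: SQLite has no CHECKSUM, so a deliberately weak Python function `toy_checksum` (an assumption, not SQL Server's algorithm) is registered under that name to force a collision, and the log rows are invented.

```python
import sqlite3

def toy_checksum(text):
    # Weak stand-in for SQL Server's CHECKSUM so that collisions are easy to produce.
    return sum(ord(ch) for ch in text) % 1000

conn = sqlite3.connect(":memory:")
conn.create_function("CHECKSUM", 1, toy_checksum)
conn.executescript("""
CREATE TABLE Log (logDate TEXT, message TEXT);
CREATE TABLE TransactionMessages (messageHash INTEGER PRIMARY KEY, message TEXT);
INSERT INTO Log VALUES
  ('2008-03-26 09:00:00', 'disk full'),
  ('2008-03-26 10:00:00', 'full disk'),  -- same toy checksum as 'disk full'
  ('2008-03-26 11:00:00', 'disk full'),
  ('2008-03-26 12:00:00', 'timeout');
""")
# 'timeout' was already stored by an earlier run.
conn.execute("INSERT INTO TransactionMessages VALUES (?, ?)",
             (toy_checksum("timeout"), "timeout"))

# One row per checksum: GROUP BY the hash and keep the smallest message text.
conn.execute("""
    INSERT INTO TransactionMessages (message, messageHash)
    SELECT MIN(message), CHECKSUM(message)
    FROM Log
    WHERE logDate BETWEEN '2008-03-26 00:00:00' AND '2008-03-26 23:59:59'
      AND NOT EXISTS (
          SELECT 1 FROM TransactionMessages
          WHERE messageHash = CHECKSUM(Log.message)
      )
    GROUP BY CHECKSUM(message)
""")

stored = conn.execute(
    "SELECT messageHash, message FROM TransactionMessages ORDER BY messageHash"
).fetchall()
print(stored)
```

Which of the colliding messages is kept is arbitrary here (the alphabetically smallest); the other one is silently dropped, which matches the "ignore the second message" requirement but loses information.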
We have a SQL Server 6.5 table with a composite primary key that contains a duplicate entry for the key. I wonder how it got entered there? Now when we try to import this table into SQL2K, it fails with a duplicate row error. Any help?