Group / Union Statement - Pull Unique Records From A Large Table
Sep 22, 2014
I am trying to use SQL to pull unique records from a large table. The table consists of people with in and out dates. Some people have duplicate entries with the same IN and OUT dates, others have duplicate IN dates but sometimes are missing an OUT date, and some don’t have an IN date but have an OUT date.
What I need to do is pull a report of all Unique Names with Unique IN and OUT dates (and not pull duplicate IN and OUT dates based on the Name).
I have tried 2 statements:
#1:
SELECT DISTINCT tblTable1.Name, tblTable1.INDate
FROM tblTable1
WHERE (((tblTable1.Priority)="high") AND ((tblTable1.ReportDate)>#12/27/2013#))
GROUP BY tblTable1.Name, tblTable1.INDate
ORDER BY tblTable1.Name;
#2:
SELECT DISTINCT tblTable1.Name, tblTable1.INDate
FROM tblTable1
WHERE (((tblTable1.Priority)="high") AND ((tblTable1.ReportDate)>#12/27/2013#))
UNION SELECT DISTINCT tblTable1.Name, tblTable1.INDate
FROM tblTable1
WHERE (((tblTable1.Priority)="high") AND ((tblTable1.ReportDate)>#12/27/2013#));
Both of these work great… until I the OUT date. Once it starts to pull the outdate, it also pulls all those who have a duplicate IN date but the OUT date is missing.
Example:
NameINOUT
John Smith1/1/20141/2/2014
John Smith1/1/2014(blank)
I am very new to SQL and I am pretty sure I am missing something very simple… Is there a statement that can filter to ensure no duplicates appear on the query?
Item Color Quantity -------------------- -------------------- -------------------------- Table Blue 10 Table Red 20 Table Yellow 30 Chair Blue 40 Chair Red 50
I'm wondering if there is a group state like this: Select Item, ?Function(Color), Sum(Quantity) From Inventory Group by Item which returns this:
I'm trying to pull all records from one table and just a single record from another. I have this join, (see below). It works ok, but the problem is if a blog record doesn't have a corresponding image record it doesn't return. The end result should be the blog record and a single corresponding image record. But always a blog record.
I'm wondering if there is a single statement I can write to pull my data. Let's say my Order table has one field for userId and one field for supervisorId (among other fields) both of which are foreign keys into the Users table where their name, address, etc. are stored. What I'd like to do is to pull all the rows from Order and have a join that pulls the user name and supervisor name from the User table all in one go. Right now I pull all the Orders with just user name joined, and then go back over the objects to add the supervisor name as a separate query.
The reason I'd like to do this is to simplify the objects I'm passing to the GridView by doing a single fetch instead of multiples. I'm using SQL Server, .NET 2.0 and VS.NET 2005.
I have a query that brings back the data below. I need to divide the BudgetTotal by the Count. Then I need to go to the records that make up those €œgroups€? and enter a Budget value = BudgetTotal/Count.
How could I write this in a stored procedure or a SQL statement if possible?
Thanks.
Kevin
SELECT TOP 100 PERCENT dbo.ReportTable.ProjectNo, dbo.ReportTable.Category, dbo.ReportTable.Type, COUNT(dbo.ReportTable.ProjectNo) AS count, dbo.ReportTable.Budget, dbo.OracleDownloadBudget.Budget AS Expr1 FROM dbo.ReportTable INNER JOIN dbo.OracleDownloadBudget ON dbo.ReportTable.Category = dbo.OracleDownloadBudget.Category AND dbo.ReportTable.ProjectNo = dbo.OracleDownloadBudget.Project AND dbo.ReportTable.Type = dbo.OracleDownloadBudget.Type GROUP BY dbo.ReportTable.ProjectNo, dbo.ReportTable.ProjectName, dbo.ReportTable.Category, dbo.ReportTable.Type, dbo.ReportTable.Budget, dbo.OracleDownloadBudget.Budget HAVING (dbo.ReportTable.Budget < 1) ORDER BY dbo.ReportTable.ProjectNo
I have a stored procedure that appends data from a temp table to a destination table. The procedure is called from an aspx web page. The destination table has an index on certain fields so as to not allow duplicates. The issue I'm having is if the imported data contains some records that are unique and some that would be duplicate, the procedure stops and no records are appended. How can I have this procedure complete it's run, passing over the duplicates and appending the unique records? Since the data is in a temp table (which gets deleted after each append) should I run some sort of 'find duplicates' query, and delete the duplicates from the temp table first, then append to the destination table? Thanks in advance.SMc
I have a task to count distinct records in a big table with roughly 30M records, performance is an important factor. Query is to be written to calculate weekly stats, weekly record number could be as high as 1M.
The actual result is like: ID Policy 350235744Credit Cards 350235744PCI 350235744PCI Audit
So the final number for this particular Policy is 3
I can write the query like:
select count(distinct Incident_id) policy_name from Reporting_DailyDlpDetail Where (year(INSERT_DETECT_TS)=2015) and (month(INSERT_DETECT_TS) =6) and (day(INSERT_DETECT_TS) between 2 and 9) This returns 526254 and costs 11 seconds to complete
or a query like: Select distinct Incident_id, policy_name from Reporting_DailyDlpDetail Where (year(INSERT_DETECT_TS)=2015) and (month(INSERT_DETECT_TS) =6) and (day(INSERT_DETECT_TS) between 2 and 9) This returns 749687 and costs roughly 1 minute to complete.
Result is different from the two queries, I believe the later gives correct number. How can I count the distinct based on a combo?Considering the size of data, what is the best and most efficient way to run the stats calculation against over 30 different scenarios (different policies and alert types) and not timeout?
I have a a Group By query which is working fine aggregating records by city. Â Now I have a requirement to focus on one city and then group the other cities to 'Other'. Â Here is the query which works:
Select [City]= CASE WHEN [City] = 'St. Louis' THEN 'St. Louis' ELSE 'Other Missouri City' END, SUM([Cars]) AS 'Total Cars'Â From [Output-MarketAnalysis] Where [City] IN ('St. Louis','Kansas City','Columbia', 'Jefferson City','Joplin') AND [Status] = 'Active' Group by [City]
Here is the result:
St. Louis 1000 Kansas City 800 Columbia 700 Jefferson City 650 Joplin 300
When I add this Case When statement to roll up the city information it changes the name of the city to 'Other Missouri City' however it does not aggregate all Cities with the value 'Other Missouri City':
Select [City]= CASE WHEN [City] = 'St. Louis' THEN 'St. Louis' ELSE 'Other Missouri City' END, SUM([Cars]) AS 'Total Cars'Â From [Output-MarketAnalysis] Where [City] IN ('St. Louis','Kansas City','Columbia', 'Jefferson City','Joplin') AND [Status] = 'Active' Group by [City]
Here is the result:
St. Louis 1000 Other Missouri City 800 Other Missouri City 700 Other Missouri City 650 Other Missouri City 300
I'm new to MS SQL and VB. I have a table with one field JOB_NAME containing 20 records. Out of that field I want to retrieve 6 of the 20 records into a pulldown menu. They are all unique text names like so:
Anna Smith John Doe
etc. I did not see IDs listed for any of the names in the table when I looked.
There is no common denominator to the names that can be filtered in the SELECT statement, and the 6 that I want will need to be pulled out individually.
Is there a way to do this with a SELECT statement? I have not found much information about how to extract unique records out of a single field. Here's the statement I'm using which pulls all of them:
strSQL = "SELECT DISTINCT JOB_NAME AS Names FROM [WORKER_NAMES] WHERE JOB_NAME<>' ' ORDER BY JOB_NAME ASC"
This gives me the total list but I only want to bring back 6 of the 20 for the pulldown.
Is there a way to modify this statement to pull only the records that I want?
A friend reminded me of a problem we tried to solve a few years ago and were unsuccessful. Below is a copy of the email he sent me. We would very much appreciate any ideas from the community. Thanks!Lets start with a simple schema where you have 4 tables:
I have a query similar to the following. The intent of this query is to retrieve the top 6 records meeting the specified criteria (LOGTYPENAME = 'Process Status Start' OR LOGTYPENAME = 'Process Status End' ) based on most recent dates. Please keep in mind that I expect to return up to 6 records for each unique LogProcessName. This could be thousands of different LogProcessNames with up to 6 records for each.
1) The table I am executing against currently is very large in size and thus takes a long time to execute against. It would seem there must be a more efficient query to get the results I am looking for? 2) CTE doesn't work on SQL 2000. I need a query that does. 3) I cannot modify the database itself in the process.
;WITH cte AS ( SELECT [LogProcessName], [LogBody], [LogDate], [LogGUID], row_number() OVER(PARTITION BY [LogProcessName] ORDER BY [LogDate] DESC) AS RN FROM [LOGTABLE] WHERE [LogTypeGUID] IN ( SELECT LogTypeGUID FROM LOGTYPE WHERE LogTypeName = 'Process Status Start' OR LogTypeName = 'Process Status End' ) ) SELECT * FROM cte WHERE RN = 1 OR RN = 2 OR RN = 3 OR RN = 4 OR RN = 5 OR RN = 6 ORDER BY [LogProcessName] DESC, [LogDate] DESC
Does anybody else have any idea that would yield the results that I am looking for and take into account items 1-3 above?
Our system runs a SQL Server 2012 DB, it has a table (table_a) which has over 10M records. Our system have to receive data file from previous system daily which contains approximate 3M updated or new records for table_a. My job is to update table_a with the new data.
The initial solution is:
1 Create a table (table_b) which structur is as the same as table_a
2 Use BCP to import updated records into table_b
3 Remove outdated data in table_a: delete from table_a inner join table_b on table_a.key_fileds = table_b.key_fields
4 Append updated or new data into table_a: insert into table_a select * from table_b
As the test result, this solution is very inefficient. Step 3 costs several hours, e.g. How can I improve it?
Hello everyone,Small and (I think) very simple quesiton;-) which makes me creazy.Let's say I have two tables listed below:T1====IDX====134T2===============IDD fk_IDX===============A1A2A4B1B3B4C4D1D2D3D4I would like to select from table T2 all distinct records IDD whichhave all of fk_IDX containded in T1.The select statement should return in this case ONLY:B and Dbecasue:B has 1,3,4andD has 1,2,3,4 so it has this combination 1,3,4 contained in the T1also.I've tried to do that with group by, with having, in and it neverworks (I always became all records which one of them is in this T1table).Maybe some one from you did try something like that, and can give afast answer.I will be very greatfullGreatingsMateusz
I have 2 tables in a 1: n relation. How can i get a select statement that the field in the n-relation with outputs, separated by a semicolon; Example: One person have many Job Titles
Table1 (tblPerson) Table2 (tblTitles) 1, "John", "Miller", "Employee; Admin; Consultant" 2, "Joan", "Stevens", "Employee, Software Engineer, Consultant" and so on .... 1 in select statement:
SELECT SUM(PTR_QUANTITY) OVER (PARTITION BY PTR_SYMBOL ORDER BY PTR_DATE, PTR_SEQUENCE) AS 'ACUMULADO' FROM MPR_portfolio_transactions ORDER BY PTR_SYMBOL, PTR_DATE, PTR_SEQUENCE
This select statement generates one line per existing record. And what I would like to do next is to UPDATE the field 'PTR_ACUM' with the result of the 'ACUMULADO'
We had a divestiture within our company. Now what used to be contained in one database in now split into two databases. One showing all history and one being all current data as of 6/1/2008. Is there an easy way to Union or Join these? Right now I'm currently doing a simple UNION ALL, but can't group the two select statements:
SELECT Year,Location,QtySold FROM historydb
UNION ALL
SELECT Year,Location,QtySold FROM currentdb
Can't do a subset and group both of these selects. How would some of you pro's do this? Right now I can put this in a simple view and then create a SP off of this view that would do this grouping, but it seems like I should be able to do it all in one query. Thanks.
Hello,So my table contains say 100,000 records, and I need to group thecategories in fld1 by the highest count of subcategories. Say fld1contains categories A, B, C, D, E.All of these categories contain subcategories AA, AB, AC, AD,...AJ, BA,BB...BJ, CA, CB, CC...CJ, etc in fld2.I am counting how many subcategories are listed for each category. LikeA may contain 5 of AA, 7 of AB, 3 of AC, 11 of AD...1 for the rest and20 of AJ. B may contain 2 of BA, 11 of BB, 7 of BC, and 1 for the rest.I want to pick up the top 3 subcategory counts for each category. Wouldlook like this:Cat SubCat CountA AJ 20A AD 11A AB 7B BB 11B BC 7B BA 2So event though each category contains 10 subcategories, I only want tolist the top 3 categories with the highest counts as above. If I justdo a group by and sort I can get this:Cat SubCat CountA ... ...AAAAAA...B ... ...BBBBB...But I just want the top 3 of each category. The only way I can think ofto do this is to query each category individually and Select Top 3, andthen Union these guys into one query. The problem is that I have tohardcode each category in the Union query. There may be new categoristhat I miss. Is there a way to achieve what I want without using Union?Thanks,Rich*** Sent via Developersdex http://www.developersdex.com ***Don't just participate in USENET...get rewarded for it!
Table1 -------------------------------------------------------------------- A C1 C2 C3 C4 -------------------------------------------------------------------- x 0 0 3 2 x 0 1 0 2 x 0 0 2 1 y 1 5 2 0
Table2 -------------------------------------------------------------------- A C1 C2 C3 C4 -------------------------------------------------------------------- x 0 0 1 4 y 1 0 3 1 y 1 2 0 0 y 0 0 5 1
select * from( select A,C1,C2,C3,C4 from Table1 group by A union select A,C1,C2,C3,C4 from Table2 group by A )as t
Result: -------------------------------------------------------------------- A C1 C2 C3 C4 -------------------------------------------------------------------- x 0 1 5 5 y 1 5 2 0 x 0 0 1 4 y 2 2 8 2
But i need the result like i.e grouped by column 'A' -------------------------------------------------------------------- A C1 C2 C3 C4 -------------------------------------------------------------------- x 0 1 6 9 y 3 7 10 2
select * from( select A,C1,C2,C3,C4 from Table1 group by A union select A,C1,C2,C3,C4 from Table2 group by A )as t group by A
The above query gives the following error [Error Code: 8120, SQL State: S1000] Column 't.C1' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Here is my sql code. I'm using a "union all" to merge Incidents and Service Requests into one table. This works fine when I don't use the "group by". When using "group by" to get the total number of tickets per "Area" it is giving me duplicates. So it is returning a distinct list of "Area" from both select statements.Â
SELECT IRAreaDN.DisplayNameas 'Area' ,Count(IR.Id) as 'Total Requests' --,IRSupportGroupDN.DisplayNameas 'Support Group' --, CAST(DATEADD(MI,DATEDIFF(mi,GETUTCDATE(),GETDATE()),IR.CreatedDate) AS DATE)as 'Created Date' --, CreatedByUser.UserName as 'Created By User ID'
I need a script to pull the above records ONLY if the date repeats more than one within the same SEQ number. The proper script would output as follows:
Table 1: ID  Name  Description 1  ABc    xyz 2  ABC    XYZ
Table 2: RoleID Â Role 1 Â Â Â Â Admin 2 Â Â Â Â QA
Table 3: ID Â RoleID Â Time 1 Â Â 1 Â Â Â Â 09:14 2 Â Â 1 Â Â Â Â 09:15 1 Â Â 2 Â Â Â Â 09:16
Now I want all the records which belongs to RoleID 1 but if same ID is belongs to RoleID 2 than i don't want that ID.From above tables ID 1 belongs to RoleID 1 and 2 so i don't want ID 1.
I am looking to pull all records for current & previous calendar year in one query. I know how to pull the current calendar year, but how would I pull current & previous?
SELECT row_number() over (ORDER by a.empid) as rec_num, empname FROM employee_a UNION SELECT row_number() over (ORDER by a.empid) as rec_num, empname FROM employee_b
the problem is the rec_num repeat for each statement like this :
rec_num empname
1 john 2 maggy 1 lee 2 mary 3 louis
How do i make the rec_num continue for the next statement after union.
This is not really a question, maybe a start of discussion if this could be a feature in coming versions...
Let's say I have a table like this: id text1 text2 int varchar varchar
1 abc ghi 2 abc ghi 3 def ghi
and I want to query using group by, I would write SELECT text1, text2 FROM table GROUP BY text1, text2
This is not too bad, but imagine I got several columns of text I need to query, then I need to put them in both places, SELECT and GROUP BY. If I do use functions like SUBSTRING or concatenation on the columns, I need to repeat these in the GROUP BY as well.
It would be nice if I could say "GROUP BY id", and since id is unique on the table it is obvious that all columns taken from this table are already grouped and can be queried without having to have them in group as well. For functions this is of course only possible if all parts only refer to columns from this table, or absolute entries, like LEFT(varchar ,3) or int+1 or such.
With this example, the benefit is not visible: if I added the id to the GROUP BY, I get every single row of course. GROUP on unique is useless... It does make sense if you query another table and join in this one as a lookup, and the id gets "multiple" this way.
As I said, just an idea to make the GROUP BY section more transparent and avoid repeating sections!
I have an application that has an existing query which returns org units (parent and child) from organization table with orderby on createddate + orgid combination.
Also I added another log table Organization_log with exact columns as Organization table and additional 'IS_DELETED' bool column.
WITH Org_TREE AS ( SELECT *, null as 'IS_DELETED', convert (varchar(4000), convert(varchar(30),CREATED_DT,126) + Org_Id) theorderby FROM Organization WHERE PARENT_Org_ID IS NULL and case_ID='43333'
[code]...
I need to modify the query:
1. To display the records both from the Organization table and Organization_Log table. 2. The orderby should be sorted on 'Organization Name' asc and it should follow the child order in alpha sort as well.