Transact SQL :: Performing Bulk Delete With Joining A Staging Table
Jun 23, 2015
I have a large table with 100 million records that has around 1 million duplicate records that need to be deleted.
I am running a script that creates a staging table called DuplicateTable that collects all the duplicates, and then I want to write an efficient delete statement.
Is it possible to write something like:
delete from OrigTable O join DuplicateTable D
on O.Key = D.key
Or do I have to run a loop on the DuplicateTable and run a delete statement record by record ?
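For reference, T-SQL does support a joined DELETE; the target just has to be named (or aliased) right after the DELETE keyword, so no row-by-row loop is needed. A minimal sketch, assuming Key uniquely identifies the rows collected in DuplicateTable:

-- Delete from the original table every row that has a match in the duplicates staging table
DELETE O
FROM OrigTable AS O
INNER JOIN DuplicateTable AS D
    ON O.[Key] = D.[Key];

On a 100 million row table this is often run in batches, e.g. DELETE TOP (50000) O FROM ... inside a loop, so each transaction and the log stay manageable.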
I'm looking at SSIS and SqlBulkCopy as a possible method to quickly insert and process large amounts of data. My current method uses sp_xml_preparedocument and OPENXML to parse an XML document of the data I want to process and insert into the database; however, I'm noticing that SQL Server's performance parsing the XML document isn't that good.
I found the C# SqlBulkCopy method (new in .NET 2.0) and I was thinking it would be faster to use that to load my data into a staging table and then use SSIS to extract the data from the staging table, process and transform it as necessary and finally load it into the final destination tables. I was able to create an Integration Services Project that selects the data from the staging table, does a bit of processing on one of the fields (by calling a stored procedure), and finally loads the processed data into the final table.
The problem with this is that I need to clean out the rows that were extracted from the staging table, and I'm not sure how I can accomplish this (I can't just issue a "delete from staging_table" because there may be new records in the staging table that were not processed). Perhaps I can either delete each record as it is processed, or somehow get the last processed identity id from the staging table and delete all records less than or equal to that id.
Thanks in advance for any help you can provide; maybe there is an easy way to accomplish what I'm trying to do.
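A hedged sketch of the "last processed identity" idea, assuming the staging table has an identity column (called StagingID here, a made-up name): snapshot the maximum value before processing, process only rows up to it, then delete only those rows so newly arrived records survive:

-- Hypothetical staging table with an identity column named StagingID
DECLARE @LastID int;

SELECT @LastID = MAX(StagingID) FROM dbo.staging_table;   -- snapshot taken before processing starts

-- ... SSIS / processing runs against rows WHERE StagingID <= @LastID ...

DELETE FROM dbo.staging_table
WHERE StagingID <= @LastID;                               -- leaves rows that arrived during processing intact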
The requirement is: I should allow single-row deletes from a table but not bulk deletes. An audit table should get updated if there is any single delete or single update. So I wrote the following triggers for single and bulk delete:
ALTER TRIGGER [dbo].[TRG_Delete_Bulk_tbl_attendance] ON [dbo].[tbl_attendance] AFTER DELETE AS
[code]...
When I try to run the website, the database error I am getting is: Transaction count after EXECUTE indicates that a COMMIT or ROLLBACK TRANSACTION statement is missing. Previous count = 0, current count = 1.
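The trigger body itself isn't shown above, so the following is only a hedged sketch of one way to meet that requirement; the audit table name and its columns (tbl_attendance_audit, AttendanceID, ActionType, ActionDate) are assumptions. That particular error message (previous count = 0, current count = 1) usually means a BEGIN TRANSACTION somewhere in the trigger or the calling procedure is left without a matching COMMIT or ROLLBACK on some code path; the sketch below relies on the statement's own transaction instead of opening a new one:

ALTER TRIGGER [dbo].[TRG_Delete_Bulk_tbl_attendance]
ON [dbo].[tbl_attendance]
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    IF (SELECT COUNT(*) FROM deleted) > 1
    BEGIN
        -- Bulk delete: refuse it. ROLLBACK inside a trigger undoes the whole statement
        -- and aborts the rest of the batch.
        ROLLBACK TRANSACTION;
        RAISERROR ('Bulk deletes are not allowed on tbl_attendance.', 16, 1);
        RETURN;
    END

    -- Single-row delete: write to the audit table (table and column names are assumed)
    INSERT INTO dbo.tbl_attendance_audit (AttendanceID, ActionType, ActionDate)
    SELECT d.AttendanceID, 'DELETE', GETDATE()
    FROM deleted AS d;
END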
Along with some other rows with the same format. I need to join to this table using a RiskElementCode that I get from the Source system. The trick is that it can be at any level, but I don't know which level it is at. So what I have to do is somehow get the correct row from the lookup table based on the code from the source to get the correct level.
So, for example, if I receive the RiskElementCode 'SSR', that is in column RiskElementCategoryCode_3, so I need the row that has 'NA' for everything after RiskElementCategoryCode_3 where RiskElementCategoryCode_3 = 'SSR'. If I get 'DFR', I need the row where RiskElementCategoryCode_4 = 'DFR'; since there are no levels deeper than 4, I don't need to check anything else. If I get 'PRR', then I need the row where RiskElementCategoryCode = 'PRR' and code_2, code_3 and code_4 = 'NA'.
So besides getting the correct row based on the code, I need to get the correct row based on the level where the next levels are 'NA'. I should only get one row each time.
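A hedged sketch of that lookup, assuming the table is called dbo.RiskElementLookup and the level columns are RiskElementCategoryCode, _2, _3 and _4 (only the base name, _3 and _4 are confirmed by the post; _2 and the table name are guesses):

DECLARE @RiskElementCode varchar(20) = 'SSR';   -- value coming from the source system

SELECT L.*
FROM dbo.RiskElementLookup AS L                 -- table name assumed
WHERE (L.RiskElementCategoryCode_4 = @RiskElementCode)
   OR (L.RiskElementCategoryCode_3 = @RiskElementCode
       AND L.RiskElementCategoryCode_4 = 'NA')
   OR (L.RiskElementCategoryCode_2 = @RiskElementCode
       AND L.RiskElementCategoryCode_3 = 'NA'
       AND L.RiskElementCategoryCode_4 = 'NA')
   OR (L.RiskElementCategoryCode = @RiskElementCode
       AND L.RiskElementCategoryCode_2 = 'NA'
       AND L.RiskElementCategoryCode_3 = 'NA'
       AND L.RiskElementCategoryCode_4 = 'NA');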
I run into a problem when asked to write a query showing employee vacation days.
table1: column1 holds dates, e.g. 2015-01-01, 2015-01-02, 2015-01-03, ..., 2015-12-31
table2 (employeeID, vacation_date): Tom 2015-01-03; Tom 2015-01-04; David 2015-01-04; John 2015-01-08; Mary 2015-01-12
My query output need to be:
2015-01-01
2015-01-02
2015-01-03 Tom
2015-01-04 Tom
2015-01-04 David
2015-01-05
2015-01-06
2015-01-07
2015-01-08 John
2015-01-09
2015-01-10
2015-01-11
2015-01-12 Mary
... etc... all the way to 2015-12-31
When I use a left outer join, I only get one employee per date.
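For reference, a plain LEFT OUTER JOIN from the calendar table to the vacation table should already return one row per employee per date (so 2015-01-04 appears twice, once for Tom and once for David), which matches the sample output above. A hedged sketch using the column names from the post:

SELECT t1.column1 AS the_date,
       t2.employeeID
FROM table1 AS t1
LEFT OUTER JOIN table2 AS t2
    ON t2.vacation_date = t1.column1       -- dates with no vacation come back with NULL employeeID
ORDER BY t1.column1, t2.employeeID;

If instead one row per date with all the names combined is wanted, the names can be concatenated with FOR XML PATH (STRING_AGG only arrives in SQL Server 2017).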
I have a table (I can't change its schema since it is part of an off-the-shelf app) that has columns for individuals from which I need to extract several pieces of information, essentially Phone, Email Address, etc. See U1 - U6.
What is a better way to return this information rather than multiple joins?
I know people use the ROW_NUMBER() function to do pagination, but my two queries below are a bit complex, so how do I use pagination there? I used ROW_NUMBER() OVER(ORDER BY IsNull(A.OEReference, B.OEReference) ASC) as Line in one, but I am not sure whether that is right or wrong.
IF IsNull(@GroupID,'') = '' SELECT IsNull(PartGroupName, 'UnMapped') AS PartGroupName, CASE IsNull(PartGroupName, '') WHEN '' THEN '' ELSE IsNull(IsNull(K.GroupID, IsNull(C.PartGroupID,'')),'') END AS PartGroupID,
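The full queries aren't shown, so this is only a hedged generic pattern: wrap the existing SELECT, joins and WHERE clause in a CTE that adds the ROW_NUMBER() column already being computed, then filter on the row range in the outer query. @PageNumber, @PageSize and the table names are placeholders:

DECLARE @PageNumber int = 1, @PageSize int = 50;   -- hypothetical paging parameters

;WITH Paged AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY ISNULL(A.OEReference, B.OEReference) ASC) AS Line,
           A.*                          -- plus whatever columns the real query returns
    FROM dbo.TableA AS A                -- placeholder for the real FROM / JOIN / WHERE clauses
    LEFT JOIN dbo.TableB AS B ON B.ID = A.ID
)
SELECT *
FROM Paged
WHERE Line BETWEEN (@PageNumber - 1) * @PageSize + 1 AND @PageNumber * @PageSize
ORDER BY Line;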
I am looking for alternate logic for the code below, where I am inserting into a table while left joining to that same table:
insert TABLE1([ID],[Key],[Return])
select distinct a.[ID], cat1, cat2
from (select ID, [Key] Cat1, [Return] cat2 from @temp as temp) a
left join TABLE1 o on a.ID = o.ID and a.Cat1 = o.[Key] and a.cat2 = o.[Return]
where o.[Key] is null
order by ID
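One hedged alternative to the self left-join is NOT EXISTS, which expresses the same anti-join and is often easier to read; same tables and columns as above:

INSERT INTO TABLE1 ([ID], [Key], [Return])
SELECT DISTINCT t.ID, t.[Key], t.[Return]
FROM @temp AS t
WHERE NOT EXISTS
(
    SELECT 1
    FROM TABLE1 AS o
    WHERE o.ID       = t.ID
      AND o.[Key]    = t.[Key]
      AND o.[Return] = t.[Return]
);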
I have more than 500 CSV files with a similar structure [same column names and same data format]. I would like to load these files into a database table on a SQL Server 2014 database.
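A hedged sketch without SSIS, assuming the 500 file names are available in a helper table and the files share one layout; the folder path, file names and target table below are made up. Each file is loaded with BULK INSERT through dynamic SQL:

-- Assumed: a list of the file names, however it gets populated
DECLARE @Files TABLE (FileName nvarchar(260));
INSERT INTO @Files VALUES (N'file001.csv'), (N'file002.csv');   -- ...and so on

DECLARE @FileName nvarchar(260), @sql nvarchar(max);

DECLARE file_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT FileName FROM @Files;
OPEN file_cursor;
FETCH NEXT FROM file_cursor INTO @FileName;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'BULK INSERT dbo.TargetTable
                 FROM ''C:\Import\' + @FileName + N'''
                 WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', FIRSTROW = 2);';
    EXEC (@sql);
    FETCH NEXT FROM file_cursor INTO @FileName;
END

CLOSE file_cursor;
DEALLOCATE file_cursor;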
I need to delete records from a table (Table1) which has a foreign key column in a related table (Table2).
Table1 columns are: table1Id; Name. Table2 columns include Table2.table1Id which is the foreign key to Table1.
What is the syntax to delete records from Table1 using Table1.Name='some name' and remove any records in Table2 that have Table2.table1Id equal to Table1.table1Id?
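A hedged sketch using exactly those column names: because of the foreign key, the Table2 rows have to be removed first (unless the constraint is defined with ON DELETE CASCADE), then the Table1 rows:

-- Remove the child rows that reference the parents about to be deleted
DELETE T2
FROM Table2 AS T2
INNER JOIN Table1 AS T1
    ON T1.table1Id = T2.table1Id
WHERE T1.Name = 'some name';

-- Now the parents can be deleted
DELETE FROM Table1
WHERE Name = 'some name';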
I have around 100 Excel files in a folder. I want to import all the files dynamically and load all the data into a single table in SQL Server 2008. Without using SSIS, I want to query them using OPENROWSET.
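A hedged sketch for a single workbook, assuming the Microsoft ACE OLE DB provider is installed and ad hoc distributed queries are enabled; the path and sheet name are placeholders. Looping over the 100 files would wrap this in dynamic SQL, one statement per file name:

SELECT *
INTO dbo.XLStaging                      -- or INSERT INTO an existing target table
FROM OPENROWSET(
        'Microsoft.ACE.OLEDB.12.0',
        'Excel 12.0;Database=C:\Import\File001.xlsx;HDR=YES',
        'SELECT * FROM [Sheet1$]');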
I have a table containing 8 million records. I need to replace 2 million of these records with a scaled-down query that goes something like:
SELECT 1, ShareholderID, Assets1 FROM MyTable (yields approx. 200,000 records)
SELECT 2, ShareholderID, Assets2 FROM MyTable (yields approx. 200,000 records)
...
SELECT 10, ShareholderID, Assets1 + Assets2 + Assets3 + ... + Assets9 FROM MyTable (yields approx. 200,000 records)
Updates and cursors just seem to be too slow.
So far I have done the following, but was wondering if anyone could think of a better way: SELECT the 6 million records that don't need to be deleted into a #TempTable; use the statements above to select into the same #TempTable; DROP and recreate the original table; SELECT the 6 + 2 million records INTO the original table.
This seems rather convoluted. Is there a better approach? Would it be worthwhile to dump the data to a file and use bcp / BULK INSERT?
We can easily load a file into db tables. However, my main concern here is the number of columns in the file. A text file, TEXT_1400.txt, has 1400 columns. I am unable to load data into my db table using BCP or BULK INSERT commands, as a maximum of 1024 columns are allowed per table in SQL Server 2008.
We can still go ahead and create a 'wide table' (a special table that holds up to 30,000 columns; the maximum size of a wide table row is 8,019 bytes). But when operating on the wide table, BCP/BULK INSERT commands still fail. After a few hours of scratching my head over BCP and BULK INSERT, I observed that while inserting, BCP/BULK INSERT commands are unable to identify SPARSE columns and skip these columns, which disturbs column mapping and results in data conversion and truncation errors.
Is there any proper way to load this kind of file into the db table?
I have a table named Master whose column is referenced in other tables like Child1, Child2, ... ChildN. I've deleted part of the data (say, column Id values 1,2,3,4,5) from all the child tables that pointed to the Master table's Id column.
Now I want to delete the same Id values from the Master table, as those rows are not referenced in any of the child tables. When I try to delete the Id values 1 through 5 from the Master table, it scans all child tables for the references and takes a lot of time for the deletion.
Is there any way to tell the system (in the query) to delete the Master table values without scanning the child tables?
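The scan happens because every child table's foreign key has to be checked when Master rows are deleted. Two hedged options, with assumed index, constraint and column names: index the FK column in each child table so that check becomes a seek, or temporarily disable the constraints (at the cost of the rescan, or of the constraint being left untrusted, when re-enabling):

-- Option A (usually the better fix): index the FK column in each child table
CREATE INDEX IX_Child1_Id ON dbo.Child1 (Id);       -- repeat for Child2 ... ChildN

-- Option B: skip the reference check for this delete (constraint name is assumed);
-- re-enabling WITH CHECK rescans the child table, plain CHECK leaves it untrusted
ALTER TABLE dbo.Child1 NOCHECK CONSTRAINT FK_Child1_Master;
DELETE FROM dbo.Master WHERE Id BETWEEN 1 AND 5;
ALTER TABLE dbo.Child1 WITH CHECK CHECK CONSTRAINT FK_Child1_Master;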
I just wanted to delete a record from a filetable by "delete from dbo.DocumentStore where stream_id = 'A322276D-AE65-E511-8266-005056C00008'".
Then I received error 1934:
Msg 1934, Level 16, State 1, Line 578: DELETE failed because the following SET options have incorrect settings: 'ANSI_PADDING'. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations.
Apparently I made a mistake with the SET option for 'ANSI_PADDING' with indexed views and/or computed columns and/or filtered indexes ...
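Error 1934 usually means the session's SET options don't match what the FileTable's underlying indexes and computed columns require. A hedged sketch of the usual fix: set the required options for the session before running the delete (ANSI_PADDING is the one the message names; the full list below is what indexed views and computed-column indexes require):

SET ANSI_PADDING ON;
SET ANSI_NULLS ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;

DELETE FROM dbo.DocumentStore
WHERE stream_id = 'A322276D-AE65-E511-8266-005056C00008';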
Hello, maybe someone has done this before? I have a table where I store SOURCE_TABLE_NAME and DESTINATION_TABLE_NAME; there are about 120+ tables. I need to make an SSIS package which selects SOURCE_TABLE_NAME from the source OLE DB and loads it into DESTINATION_TABLE_NAME in the destination OLE DB.
I made such an SSIS package: I set the OLE DB source data access mode to 'table name or view name variable', set the OLE DB destination data access mode to 'table name or view name variable', and gave the variables default values (names of existing tables). But when I loop and the table names change, it reports an error that it cannot map the columns, because the new tables have different columns.
What is the best way to transfer data from the staging table into the main table.
Example: Staging table name: TableA_stage (# of rows - millions); Main table name: TableA_main (# of rows - billions)
Note: Staging table may have some data same as the main table.
Currently I am doing:
- Load data into the staging table (TableA_stage)
- Remove any duplication of rows from the staging table (TableA_stage)
- Disable all indexes on the main table (TableA_main)
- Insert into the main table (TableA_main) from the staging table (TableA_stage)
- Remove any duplication of rows from the main table using a CTE (TableA_main)
- Rebuild indexes on the main table (TableA_main)
The problem with the above method is that it takes a lot of time and the log file grows very big.
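One hedged variation that avoids de-duplicating the billion-row main table afterwards and keeps each transaction (and therefore the log) small: insert only the staging rows that don't already exist in the main table, in batches. KeyCol, Col1 and Col2 are placeholders for the real key and column list:

DECLARE @BatchSize int = 500000;

WHILE 1 = 1
BEGIN
    INSERT INTO dbo.TableA_main (KeyCol, Col1, Col2)        -- column names assumed
    SELECT TOP (@BatchSize) s.KeyCol, s.Col1, s.Col2
    FROM dbo.TableA_stage AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.TableA_main AS m
                      WHERE m.KeyCol = s.KeyCol);

    IF @@ROWCOUNT < @BatchSize BREAK;   -- nothing (or only a partial batch) left to copy
END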
I am trying to insert bulk data into a main table from a staging table in SQL Server 2012. If any error comes up, the entire activity is rolled back. I don't want that to happen. I want to know the records wherever the problem persists, and the rest have to be inserted.
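A hedged sketch of one way to keep the good rows and capture the bad ones, assuming the failures are conversion errors; the table and column names (MainTable, StagingTable, StagingTable_Errors, Amount) are made up, and TRY_CONVERT is available from SQL Server 2012 onwards:

-- Rows whose Amount column won't convert go to an error table instead of failing the whole load
INSERT INTO dbo.MainTable (Id, Amount)
SELECT s.Id, TRY_CONVERT(decimal(18,2), s.Amount)
FROM dbo.StagingTable AS s
WHERE TRY_CONVERT(decimal(18,2), s.Amount) IS NOT NULL;

INSERT INTO dbo.StagingTable_Errors (Id, Amount, ErrorReason)
SELECT s.Id, s.Amount, 'Amount is not a valid decimal'
FROM dbo.StagingTable AS s
WHERE TRY_CONVERT(decimal(18,2), s.Amount) IS NULL;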
How do I delete records from multiple tables when the main table's entity is deleted, given that constraints are applied on all of them? There is a main table called Organization, or tblOrganization. This organization has branches, which are in the Branch table called tblBranch; these branches have multiple applications, say tblApplication, and these applications are used by multiple users, tblUsers. What I want is: when I delete the Organization, all branches, applications and users related to it must be deleted also. How can I apply that on a button click in ASP.NET Web Forms? Right now this is my delete function, which is very simple:
public void Delete(int? id) { var str = "DELETE FROM tblOrganization WHERE organizationId=" + id; }
And my tables look like this:
CREATE TABLE tblOrganization ( OrganizationId int, OrganizationName varchar(255)
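One hedged option, instead of issuing one DELETE per table from the application, is to declare the child foreign keys with ON DELETE CASCADE, so the single DELETE FROM tblOrganization already shown removes branches, applications and users as well. The constraint names and the BranchId/ApplicationId columns below are assumptions, and they require primary or unique keys on the referenced columns:

ALTER TABLE dbo.tblBranch
    ADD CONSTRAINT FK_tblBranch_tblOrganization
    FOREIGN KEY (OrganizationId) REFERENCES dbo.tblOrganization (OrganizationId)
    ON DELETE CASCADE;

ALTER TABLE dbo.tblApplication
    ADD CONSTRAINT FK_tblApplication_tblBranch
    FOREIGN KEY (BranchId) REFERENCES dbo.tblBranch (BranchId)
    ON DELETE CASCADE;

ALTER TABLE dbo.tblUsers
    ADD CONSTRAINT FK_tblUsers_tblApplication
    FOREIGN KEY (ApplicationId) REFERENCES dbo.tblApplication (ApplicationId)
    ON DELETE CASCADE;

If cascading constraints aren't wanted, the alternative is to delete bottom-up (tblUsers, tblApplication, tblBranch, then tblOrganization) inside one transaction from a stored procedure that the button click calls with a parameterized OrganizationId.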
Hi, is there any way to find updated/deleted records using any other data flow transformation tasks, without using a SQL task? I can find the new records using the Merge Join task.
Is there a better way to merge the master table using a staging table?
We need to insert/update a fact table from a staging table. Currently we are using an SP which updates the fact table for each region. This process is scheduled: every 5 minutes a job runs and updates the fact table. But the insert and update from staging to fact take too long. We are currently using a MERGE statement for the insert and update. In my SP we loop over however many regions we need to update, updating a single region at a time using a WHILE loop in the current SP.
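One hedged alternative to the per-region WHILE loop is a single set-based MERGE that handles every region in one pass; the key and measure column names below are placeholders for the real fact/staging columns:

MERGE dbo.FactTable AS tgt
USING (SELECT RegionID, BusinessKey, Measure1, Measure2
       FROM dbo.StagingTable) AS src
    ON  tgt.RegionID    = src.RegionID
    AND tgt.BusinessKey = src.BusinessKey
WHEN MATCHED AND (tgt.Measure1 <> src.Measure1 OR tgt.Measure2 <> src.Measure2) THEN
    UPDATE SET tgt.Measure1 = src.Measure1,
               tgt.Measure2 = src.Measure2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (RegionID, BusinessKey, Measure1, Measure2)
    VALUES (src.RegionID, src.BusinessKey, src.Measure1, src.Measure2);

The MATCHED ... AND condition keeps the statement from rewriting rows that haven't actually changed, which is often where most of the 5-minute cycle goes.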
Hi, I have a production table in SQL Server 2005 that has approx 500,000 records; every 6 hours this table needs to be truncated and refilled.
The basic SSIS package uses a script component and imports the data into a staging table which has the same structure as the production table. I have a final Execute SQL Task that truncates the production table and does an INSERT INTO production SELECT * FROM the stage table.
It takes around 30 seconds to run this last Execute SQL Task; the problem is there is a risk that our web service will query this table during the Execute SQL Task and return incorrect results.
Q1: In this last Execute SQL Task, if I used a BEGIN TRANSACTION and COMMIT TRANSACTION, will this be any quicker?
Q2: In this last Execute SQL Task, would it be better to use a rename-table technique in T-SQL? Any code examples?
Q3: Is there any way in T-SQL I can compare EVERY FIELD in two tables, i.e. Stage and Production, which have identical data structures, and figure out a way to update only the records that changed? Is SQL Server replication the best way to do this?
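For Q2, a hedged sketch of the swap technique: load a shadow copy (here called dbo.Production_new, a made-up name), then swap the names inside a short transaction so readers only ever see a complete table; the metadata swap takes milliseconds instead of the 30-second reload window. For Q3, EXCEPT compares every column of two identically structured tables:

-- Q2: swap the freshly loaded table in place of the old one
BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.Production',     'Production_old';
    EXEC sp_rename 'dbo.Production_new', 'Production';
COMMIT TRANSACTION;

DROP TABLE dbo.Production_old;

-- Q3: rows that exist in Stage but not (identically) in Production
SELECT * FROM dbo.Stage
EXCEPT
SELECT * FROM dbo.Production;

The swap takes schema locks while it runs, so queries from the web service briefly wait rather than reading a half-loaded table.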
I'm trying to import the contents of a CSV file into a staging table that I've created in SQL server 2005. To perform the import I have used the BCP utility with the use of a BCP format file.
The problem I'm having is that the data in some of the fields in the CSV file is longer than the length of the corresponding field in my staging table. So when I try to import, I get the following error:
[Microsoft][ODBC SQL Server Driver]String data, right truncation
And the record with the error does not import.
My question would be: is there a way of telling BCP to trim the data before importing it into the staging table? Could I somehow specify in the BCP format file to trim the data before importing, or is there a switch I can specify in the BCP command which tells BCP to trim data to the length of the destination column?
If it helps I'm using the command below to run BCP.
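As far as I know there is no bcp switch or format-file setting that trims or truncates data to fit the destination; a common workaround, sketched here with made-up table and column names, is to let bcp load into deliberately wide staging columns and then cut the values down with LEFT()/RTRIM() when moving them into the real table:

-- Staging columns deliberately wider than the final table so the bcp load cannot truncate
CREATE TABLE dbo.Staging_Raw
(
    CustomerName varchar(1000),
    Notes        varchar(max)
);

-- After bcp has loaded Staging_Raw, cut each value down to the destination length
INSERT INTO dbo.FinalTable (CustomerName, Notes)
SELECT LEFT(LTRIM(RTRIM(CustomerName)), 50),
       LEFT(Notes, 500)
FROM dbo.Staging_Raw;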
I have a flat file with columns from a geographical hierarchy such as:
Country Zone State County City Store Sub Store , etc.
The file also has data columns for months to the right of the above columns such as:
Jul Aug Sept ......... basically 25 of these columns for two years' data for one product and another set of 25 columns for another kind of product. A typical record in the file looks like:
Country Zone State County City Store Substore
USA Southeast FL Hillsborough Tampa walmart Fletcher
I need to upload this data into a staging table in SQL Server 2005 using SSIS. I created a table with the geographical hierarchy columns but am trying to figure out a way to load the monthly data. I can create 50 columns for the 50 months (25 months for each product), but that would be very crude.
Is there a better way of inserting data from this flat file into a destination table? I need all the data in the staging table in one upload.
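One hedged way to avoid 50 month columns in the destination is to land the file in a wide staging table first and then UNPIVOT it into one row per hierarchy member per month per product; the M01-M25 column names below are made up, and SSIS also has an Unpivot transformation that does the same thing inside the data flow:

-- The wide staging table has the hierarchy columns plus M01..M25 for product A
-- (the second set of 25 columns for product B would get a second, similar UNPIVOT)
SELECT u.Country, u.Zone, u.State, u.County, u.City, u.Store, u.SubStore,
       u.MonthColumn, u.MonthValue
FROM dbo.WideStaging AS w
UNPIVOT (MonthValue FOR MonthColumn IN (M01, M02, M03, M04, M05)) AS u;   -- ...continue through M25

Note that UNPIVOT silently drops rows where the month value is NULL, which may or may not be what is wanted here.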
I am extracting source data, which is in a txt file, to an OLE DB destination. But I want to save each day's data in a different staging table, e.g. tblProduct20081206, tblProduct20081207. How can this be done? I have seen lots of postings and scripts where the destination is a text file. I want to use the same table structure for staging but create a different table for each day by adding a date extension.
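A hedged sketch of the per-day table creation from T-SQL (for example in an Execute SQL Task that runs before the data flow); tblProduct comes from the post, while the staging table name dbo.tblProduct_stage is an assumption:

DECLARE @TableName sysname      = N'tblProduct' + CONVERT(char(8), GETDATE(), 112);  -- e.g. tblProduct20081206
DECLARE @sql       nvarchar(max);

-- Create and fill today's table from the day's staged rows in one shot
SET @sql = N'SELECT * INTO dbo.' + QUOTENAME(@TableName) + N' FROM dbo.tblProduct_stage;';
EXEC (@sql);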
I have been working on this without any success. Anybody out there to help?
1. There are new files uploaded to the FTP site every day by other regions: 3 regions and 1 file each. That means 3 files at 3 different times. The file name will be that same day's date.csv. For example, today's files will be A_20071106.csv, B_20071106.csv and C_20071106.csv. Tomorrow's will be A_20071107.csv, B_20071107.csv and C_20071107.csv. How do I make this run every day and take only that day's file from FTP straight into the SQL staging table?
2. Every day, immediately after each file is uploaded, to indicate the FTP file loaded successfully, a 20071106.dummy (and so on for all three files) will show up in the same folder on the FTP site. I must make this package runnable only if the .dummy file arrives for that day's date, and then process the .csv file. Otherwise, check after 30 minutes whether the .dummy is there yet. Do this until the .dummy file comes, then execute the package at that time. Then do the same for the other two for that particular day. The times are 4pm for A, 6pm for B and 8.30pm for C.
3. Then, if there is a new row it should INSERT; for any changes (based on 5 columns) it should UPDATE; and if there was a row yesterday that is not there today, DELETE.
I have these two CTE queries. The first one is the existing one, which runs fine and returns what we need; the second is a new CTE query whose result I need to join into the existing CTE. Now, that would be two CTEs; can that be done at all? The second query works on the same table as the first one, so there are overlaps in names, and they are the same; the only columns I need to "join" in are "seconds" and "AlarmSessions".
;with AlarmTree as (
    select NodeID, ParentID, NodeLevel, NodeName,
        cast('' as varchar(max)) as N0,
        cast('' as varchar(max)) as N1,
        cast('' as varchar(max)) as N2,
        cast('' as varchar(max)) as N3,
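Yes, several CTEs can be declared in one WITH list and joined in the final SELECT. A minimal hedged sketch: since the real CTE bodies aren't shown in full above, AlarmStats, dbo.Nodes and dbo.AlarmSource below are placeholders, and only the shape of the join on the overlapping names matters:

;WITH AlarmTree AS
(
    SELECT NodeID, ParentID, NodeLevel, NodeName
    FROM dbo.Nodes                      -- placeholder for the real first CTE body
),
AlarmStats AS
(
    SELECT NodeName, Seconds, AlarmSessions
    FROM dbo.AlarmSource                -- placeholder for the real second CTE body
)
SELECT t.NodeID, t.NodeName, s.Seconds, s.AlarmSessions
FROM AlarmTree AS t
LEFT JOIN AlarmStats AS s
    ON s.NodeName = t.NodeName;         -- the post says the names overlap and are the same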