I'm currently looking into SSIS, to establish whether or not it can improve on an existing stored procedure.
We have a sp that performs standard ELT functions: it extracts new (or newly updated) data out of A, transforms it, and then loads it into B. It runs as part of a scheduled job and takes approx 60 seconds to complete. Fine. But we want it to go faster, and this is where (we hope) SSIS comes in...
I'm approaching this area of SQL Server 2005 for the first time, and have been looking towards the data flow task and its transformations to provide such an equivalent, faster solution. Before I continue down this road however, I would welcome some peer feedback/comment on whether SSIS - and its data flow transformations - are indeed the best tools to leverage when looking to accomplish such an ELT function, and quickly.
I guess the fundamental question here is: 'Why transforms over script?' I am reading Brian Knight's book, and I'd like to quote a few passages:
'...the nicest thing about transforms in SSIS is that it is all done in-memory and it no longer requires elaborate scripting as in SQL Server 2000 DTS...'
I guess this means that it doesn't need to be complied/interpreted, which I suppose all DML does?
'...one of the overriding themes of SSIS is that you shouldn't have to write any code to create your transformation...'
Is this because writing code is considered a more complex task than creating + configuring a transformation, or is it (at least in part) because a transformation is necessarily going to be quicker than its DML equivalent?
Hope some of you guys can respond with some interesting thoughts.
I created a slowly changing dimension object and used an OLE DB Source object with a SQL Command to feed it. After all the SCD objects are created, I get a warning about a truncation which might happen.
I adjust the source query with the right converts, so the query uses data types and lengths which match the target table. I update the OLE DB Source with the new query. Column names are unchanged.
Now how do I get this data flow to reflect the new column lengths? Do I have to throw away all objects and recreate the SCD? The refresh button in the SCD object doesn't do it. This is also a problem when adding columns to your dimension table. Because I modify the stuff that the SCD generates, it's VERY teadious to always have to throw it all away and start over again.
Hi, hope someone can help or point me down the right track.
I have to load 50+ tables (all have different file layouts) using a data reader (The source only allows an OBDC connection) into a SQL Server 2005 databases, rather than create 50+ ETL packages that will be identical in process terms is it possible to create a single package that will dynamically re-map the source destination joins. I know I can set the Source and Destination using expressions however how do I ensure that the mappings are updated and for the errors can I automate so that they re-direct? Is this possible using a script task? Or by some other means?
I need to call a stored procedure to insert data into a table in SQL Server from SSIS data flow task. I am currently trying to use OLe Db Destination, but I am not sure how to map inputs to OLE DB Destination to my stored procedure insert. Thanks
I want to insert data calling a stored procedure and call this from a Data Flow destination object. Is it possible?
I understand that Ole Db Command transformation object can call stored procedure, but that will not rollback in the event of error in the middle.
I understand that Ole Db Destination object will rollback in middle of import, but I don't see how to do the insert by calling stored procedure. "Sql Command" option in Ole Db Destination object does not seem to present solution to the problem.
Am I missing something here or is Ssis / Microsoft demanding that Insert stored procedure not be used when using Data Flow destination object to insert data into target table?
Hi. I am trying to extract the data returned from a store procedure to a flat file. However, it fail to execute this package in the OLE DB Source. I select the SQL Command in the Data Access Mode, then use:
USE [SecurityMaster] EXEC [dbo].[smf_ListEquity]
It runs ok in the Preview, but not in the Run. Then the system returns during executing the package:
Error: 0xC02092B4 at Load TickerList, OLE DB Source [510]: A rowset based on the SQL command was not returned by the OLE DB provider. Error: 0xC004701A at Load TickerList, DTS.Pipeline: component "OLE DB Source" (510) failed the pre-execute phase and returned error code 0xC02092B4.
Right, so I'm currently trudging through the SQL video tutorials and such, so it may be that I get to this sooner or later, but as I'm under a deadline, I thought I'd post this question beforehand so I can use that info with what I'm learning now. Here's my situation: I have a ASP.NET 2.0 site in which I currently use XML files to display the text on the page, and I transform that text using an XSL stylesheet. I want to move that data to a database, but I'm not sure what is the best way to do that. Basically what I'm most concerned with is storing the main text (paragraphs with embedded hyperlinks). Currently, I can get the XSL to pick out the links and transform them from simply XML data to live links when they display on the page, but would I be able to do the same if I were pulling these paragraphs out of a database? Or should I just store the XML data in the database, and still pull that out so I can transform it appropriately with the XSL sheet I already have? (For that matter, can I dynamically write XML content to a database? Or am I just better off keeping my XML files?) What's the best approach for something like this?Thanks for the help!
I have SSIS which will have OLF DB Source, and then have Derived Column component to managering all data from OLF DB Source. I used to have default columns such as Create Date, Update Date set as fixed date. Now we decided to put this default column values into a table to manage. I then have problem to choose which component I should use in order to have this columns selected from default table.
For example: if Create Date is null, I have to select default value from the default table; otherwise, use Create Date value and so on.
I am using SQL 2005 SSIS. I need to do a data conversion for a date field in a txt file. I used the import wizard to bring my txt file into SQL 2005 but didn't convert the date. The date is displayed in the flat file as 20070612. Can someone help me convert the date. I did add an OLE DB Source to the Data Flow screen and selected command what do I do next and what do I write?
Hello all.I am trying to write a query that "just" switches some data around soit is shown in a slightly different format. I am already able to dowhat I want in Oracle 8i, but I am having trouble making it work inSQL Server 2000. I am not a database newbie, but I can't seem tofigure this one out so I am turning to the newsgroup. I am thinkingthat some of the SQL Gurus out there have done this very thing athousand times before and the answer will be obvious to them.This message is pretty long but hopefully gives you enough informationto replicate the issue.There are 3 tables involved in my scenario. Potentially a lot more inthe real application, but I'm trying to keep this example as simple aspossible.In my database I have many "things". Let's call them "User Records"(table: users) for this example. My app allows the customer to createany number of custom "Extra Fields" (XF's) for a given User Record.The Extra Field definitions are stored in a table which we can callattribs. The actual XF values for a given user record are stored in athird table, let's call it users_attribs.users_attribs will look something like this (actual DDL below.)UserID | ExtraFieldID | Value--------------------------------------User_1 | XF_1 | hamUser_1 | XF_2 | eggsUser_2 | XF_1 | baconUser_2 | XF_2 | cheeseUser_3 | XF_2 | onionsThe end result is that I want a SQL query that returns something likethis:UserID | XF_1 | XF_2-------------------------------------User_1 | ham | eggsUser_2 | bacon | cheeseUser_3 | NULL | onionsPotentially there would be one column for each extra field definition.One interesting question is how to get a dynamic number of columns toshow up in results, (so new XF's show up automatically) but I'm notworried about that for now. Assume I will hard-code a specific set ofextra fields into my query.The key here is that all users must show up in the final result EVENIF they don't have some extra field value defined. Since User_3 inthe example above doesn't have an XF_1 record, we see a NULL in thatcolumn in the final result.With Oracle I am able to accomplish this via an Outer Join, and I knowSQL Server supports Outer Joins, but I can't seem to make it work. Inever version I have tried so far, if any user is missing any extrafield value, the entire row for the user goes "missing", and that ismy problem.It seems like one possible solution would be to just go ahead andpopulate the users_attribs table with a NULL value for thatcombination of user ID and extra field ID, basically adding a new rowlike this:UserID | ExtraFieldID | Value--------------------------------------User_3 | XF_1 | NULLI would like to avoid that if possible, for a number of reasons,particularly the question of *when* that NULL would be added. I don'twant my report to touch the database and add stuff at reporting timeif at all possible. In Oracle, I seemingly don't have to, and I wantto get to that point on SQL Server.So, here is some specific DDL to recreate this scenario:CREATE TABLE users (user_id varchar(60), username varchar(60));-- Extra Field (attribs) definitionsCREATE TABLE attribs (xf_id varchar(60), xf_name varchar(60));-- Extra Field values for UsersCREATE TABLE users_attribs (user_id varchar(60), xf_id varchar(60),val varchar(60));-- populate the sample tables-- sample User recsINSERT INTO users VALUES ('U_1', 'John Smith');INSERT INTO users VALUES ('U_2', 'Mary Rogers');-- sample extra field definitionsINSERT INTO attribs VALUES ('XF_1', 'Extra Field 1');INSERT INTO attribs VALUES ('XF_2', 'Extra Field 2');INSERT INTO attribs VALUES ('XF_3', 'Extra Field 3');-- sample values for User Extra Fields (XF's)-- U_1 ("John Smith") has complete values for each XFINSERT INTO users_attribs VALUES ('U_1', 'XF_1', 'XF_1 value forU_1');INSERT INTO users_attribs VALUES ('U_1', 'XF_2', 'XF_2 value forU_1');INSERT INTO users_attribs VALUES ('U_1', 'XF_3', 'XF_3 value forU_1');-- U_2 ("Mary Rogers") only has one value, missing the other two..INSERT INTO users_attribs VALUES ('U_2', 'XF_2', 'XF_2 value forU_2');Now, I can get what I want on Oracle, provided that I define an newview that joins the three tables together, then do a separate query onthat view that does an outer join. I could dispense with the view,but I don't want to hard-code the XF ID's into the query. I am finewith hardcoding the XF names, though. (Long story.)-- Create a User Extra Field view that joins Users-- extra field definitons (attribs)-- and values (users_attribs.)CREATE VIEW u_xf_view ASSELECT u.user_id, at.xf_name, uxf.valFROMusers u,attribs at,users_attribs uxfWHEREuxf.user_id = u.user_id ANDuxf.xf_id = at.xf_id-- Oracle-only outer join syntax works if you use the view:SELECTu.username as "User Name",uxf1.val as "Extra Field 1 Value",uxf2.val as "Extra Field 2 Value",uxf3.val as "Extra Field 3 Value"FROMusers t,u_xf_view uxf1,u_xf_view uxf2,u_xf_view uxf3WHEREuxf1.user_id(+) = t.user_id ANDuxf1.xf_name(+) = 'Extra Field 1' ANDuxf2.user_id(+) = t.user_id ANDuxf2.xf_name(+) = 'Extra Field 2' ANDuxf3.user_id(+) = t.user_id ANDuxf3.xf_name(+) = 'Extra Field 3';-- RESULTS (correct):User Name Extra Field 1 Value Extra Field 2 Value ExtraField 3 Value------------- ------------------------ ------------------------------------------------John Smith XF_1 value for U_1 XF_2 value for U_1 XF_3value for U_1Mary Rogers NULL XF_2 value for U_2 NULL2 Row(s)So far I have not been able to get the equivalent result in SQLServer. Like I said, I am really hoping to avoid populating thoseNULL values. Can anything think of a way to replicate Oracle'sbehavior here? I have tried a number of variations on the ANSI joinsyntax instead of Oracle's (+) operator, but everything I tried so farhas only yielded a row when ALL extra fields are populated (or evenworse behavior.)I greatly appreciate any assitance you may be able to give. I would behappy to provide any additional information if I forgot to mentionsomething important. I apologize in advance for any broken / wrappedlines. Thank you for taking the time to read this.I'm going to be out of town for the next week or so, so I won't checkfor a response until then, but as soon as I get back home I will checkback in the newsgroup.Thank you!!Preston Landerspibble (at) yahoo (dot) com
I need to loop the recordset returned from a ExecuteSQL task and transform each row using a Data Conversion task (or a Script Task).
I know how to loop the recordset returned by an ExecuteSQL task:
http://www.sqlis.com/59.aspx
I loop the returned recordset (which is mapped to a User variable of type System.Object) and assign the Variable Mappings in the ForEach Loop to different user variables which map to the Exec proc resultset (with names and data types).
I assume to now use these as the Available Input columns for the Data Conversion task, I drag a Data Flow task inside the For Each Loop container and double-click it, then add a Data Conversion task.
But the Input columns (which I entered in the Variable Mappings in the ForEach Loop containers) dont show up in the Available Input columns of the Data Conversion task.
How do I link the Variable Mappings in the ForEach Loop containers from the recordset returned by the Execute SQL Task to the Available Input columns of the Data Conversion task?
.......................
If this is not possible, and the advice is to use the OLEDB data flow as the input for the Data Conversion task (which is something I tried too), then the results from an OLEDB Command (using EXEC sp_myproc) are not mapped to the Available Input columns of the Data Conversion task either (as its not an explicit SQL Statement and the runtime results from a stored proc exection)
I would like to use the ExecuteSQL task to do this as the Package is clean and comprehensible. Which is the easiest best way to map the returned results from a Stored proc execution to the Available Input columns of any Data Flow transformation task for the transform operations I need to execute on each row of data?
[ Could not find any useful advice on this anywhere ]
I've read that SSIS tries to do all transformations in memory as a way of enhancing processing speed. What happens though if the amount of data processed exceeds the available RAM? Are raw files then used (similar to staging tables) or is an error generated?
I need to pass a parameter from control flow to data flow. The data flow will use this parameter to get data from a Oracle source.
I have an Execute SQL task in control flow to assign value to the Parameter, next step is a data flow which will need take a parameter in the SQL statement to query the Oracle source,
The SQL Looks like this:
select * from ccst_acctsys_account
where to_char(LAST_MODIFIED_DATE, 'YYYYMMDD') >?
THe problem is the OLE DB source Edit doesn€™t have anything for mapping parameter.
I have an Execute SQL Task that returns a Full Rowset from a SQL Server table and assigns it to a variable objRecs. I connect that to a foreach container with an ADO enumerator using objRecs variable and Rows in first table mode. I defined variables and mapped them to the columns.
I tested this by placing a Script task inside the foreach container and displaying the variables in a messagebox.
Now, for each row, I want to write a record to an MS Access table and then update a column back in the original SQL Server table where I retreived data in the Execute SQL task (i have the primary key). If I drop a Data Flow Task inside my foreach container, how do I pass the variables as input to an OLE DB Destination on the Data Flow?
Also, how would I update the original source table where source.id = objRects.id?
Thank you for your assistance. I have spent the day trying to figure this out (and thought it would be simple), but I am just not getting SSIS. Sorry if this has been covered.
Dear All! My package has a Data Flow Task. In Data Flow Task, I use a Script Component and a OLE BD Destination to transform data from txt file to database. Within Data Flow Task, I want to call File System Task to move file to a folder or any Task of "Control Flow" Tab. So, Does SSIS support this task? Please show me if any Thanks
I'm currently setting variables at the package level with an ExecuteSQL task. This works fine. However, I'm now starting to think about restartability midway through a package. It would be nice to have the variable(s) needed in a data flow set within the data flow so that I only have to restart that task.
Is there a way to do that using an SQL statement as the source of the value in a data flow?
OR, when using checkpoints will it save variable settings so that they are available when the package is restarted? This would make my issue a moot point.
Hi all! I recently started working with SSIS and one of the things that is puzzling me the most is what's the best way to go:
A small control flow, with large data flow tasks A control flow with more, but smaller, data flow tasksAny help will be greatly appreciated. Thanks, Ricardo
I am new to SSIS and have the following problem. I used the following script to clear data in columns of any CR/LF/Commas and char(0)'s. Can I just transfer this to SSIS and how exactly do I do that? Any help or advice would help.
Hi, I'm trying to implement an incremental data pull (Oracle to SQL) based on Andy's blog: http://sqlblog.com/blogs/andy_leonard/archive/2007/07/09/ssis-design-pattern-incremental-loads.aspx
My development machine is decent: 1.86 GHz, Intel core 2 CPU, 3 GB of RAM. However it seems the data flow task gets hung whenever I test the package against the ~6 million row source, as can be seen from these screenshots. I have no memory limitations on the lookup transformation. After the rows have been cached nothing happens. Memory for the dtsdebug process hovers around 1.8 GB and it uses 1-6 percent of CPU resources continuously. I am not using fast load to insert new records into my sql target table. (I am right clicking Sequence Container 3 and executing this container NOT the entire package in the screenshots)
The same package works fine against a similar test table with 150k rows. http://i248.photobucket.com/albums/gg168/boston_sql92/7.jpg http://i248.photobucket.com/albums/gg168/boston_sql92/8.jpg
The weird thing is it only takes 24 minutes for a full refresh of the entire source table from Oracle to the SQL target table. Any hints,advice would be appreciated.
I am wondering if it is possible to use SSIS to sample data set to training set and test set directly to my data mining models without saving them somewhere as occupying too much space? Really need guidance for that.
What I am looking to do is use a complicated stored procedure to getdata for me while in another stored procedure.Its like a view, but a view you can't pass parameters to.In essence I would like a sproc that would be like thisCreate Procedure NewSprocASSelect * from MAIN_SPROC 'a','b',.....WHERE .........Or Delcare Table @TEMP@Temp = MAIN_SPROC 'a','b',.....Any ideas how I could return rows of data from a sproc into anothersproc and then run a WHERE clause on that data?ThanksChris Auer
I am working on importing an Excel workbook, saved as multiple CSV flat files, that has both group level data and related detail row on the same sheet. I have been able to import the group data into a table. As part of the Data Flow task, I want to be able to save the key value for the group, which I will use when I insert the detail rows.
My Data Flow has the following components: The flat file with the data, which goes to a derived column transformation to strip out extraneous dashes, which leads to the OLEDB Destination component.
I want to save the value as a package level variable, so that I can reference it in another dataflow.
Is this possible, and if so, at what point do I save the value?
Hi all - I'm trying to optimized my stored procedures to be a bit easier to maintain, and am sure this is possible, not am very unclear on the syntax to doing this correctly. For example, I have a simple stored procedure that takes a string as a parameter, and returns its resolved index that corresponds to a record in my database. ie exec dbo.DeriveStatusID 'Created' returns an int value as 1 (performed by "SELECT statusID FROM statusList WHERE statusName= 'Created') but I also have a second stored procedure that needs to make reference to this procedure first, in order to resolve an id - ie: exec dbo.AddProduct_Insert 'widget1' which currently performs:SET @statusID = (SELECT statusID FROM statusList WHERE statusName='Created')INSERT INTO Products (productname, statusID) VALUES (''widget1', @statusID) I want to simply the insert to perform (in one sproc): SET @statusID = EXEC deriveStatusID ('Created')INSERT INTO Products (productname, statusID) VALUES (''widget1', @statusID) This works fine if I call this stored procedure in code first, then pass it to the second stored procedure, but NOT if it is reference in the second stored procedure directly (I end up with an empty value for @statusID in this example). My actual "Insert" stored procedures are far more complicated, but I am working towards lightening the business logic in my application ( it shouldn't have to pre-vet the data prior to executing a valid insert). Hopefully this makes some sense - it doesn't seem right to me that this is impossible, and am fairly sure I'm just missing some simple syntax - can anyone assist?
I have written the following stored procedure to export a csv file. I am wanting to put a If statement in here so if view UDEF_DISPATCHER_INTERFACE_ORDER_HEADER_VIEW returns nothing then the procedure does not run.
INSERT INTO UDEF_DISPATCHER_INTERFACE_ORDER_HEADER_TABLE_TEMP SELECT * FROM UDEF_DISPATCHER_INTERFACE_ORDER_HEADER_VIEW
INSERT INTO UDEF_DISPATCHER_INTERFACE_ORDER_LINE_TABLE_TEMP SELECT * FROM UDEF_DISPATCHER_INTERFACE_ORDER_LINE_VIEW
Hi,I've got some XML which exists as a text variable in a temp table in SQL Server 2000.I need to pass this XML into sp_xml_preparedocument so I can rebuild a table out of it. But I can't figure out the syntax.If I try doing this:declare @idoc intexec sp_xml_preparedocument @idoc output, (select XmlResult from #cache)I get an error, with or without the brackets round the select statement.The temp table is created using an SP, but I can't call that directly either. This:declare @idoc intexec sp_xml_preparedocument @idoc output, exec Search$GetCache @searchIDAlso throws an error.I can't put it into a local variable because they can't be of type text. I can't pass it into the SP somewhere as it's being generated on the fly.How can I get my xml into sp_xml_preparedocument?Cheers,Matt
Can someone tell me why my stored procedure is repeating the Name in the same column? Here's my stored procedure and output: select distinct libraryrequest.loanrequestID, titles.title, requestors.fname + ' ' + requestors.lname as [Name], Cast(DATEPART(m, libraryrequest.requestDate) as Varchar(5)) + '/' + Cast(DATEPART(d, libraryrequest.requestDate) as Varchar(5)) + '/' + Cast(DATEPART(yy, libraryrequest.RequestDate) as Varchar(5)) as RequestDate, Cast(DATEPART(m, libraryrequest.shipdate) as Varchar(5)) + '/' + Cast(DATEPART(d, libraryrequest.shipdate) as Varchar(5)) + '/' + Cast(DATEPART(yy, libraryrequest.shipdate) as Varchar(5)) as ShipDate from LibraryRequest join requestors on requestors.requestorid=libraryrequest.requestoridjoin titles on titles.titleid = requestors.titleidwhere shipdate is not null Output: ID Title Name Request Date Ship Date 29 Heads, You Win Brenda Smith 1/18/2008 1/18/200835 Still More Games Brenda Smith 1/22/2008 1/22/200851 The Key to.. Brenda Smith Brenda Smith 1/29/2008 1/29/200852 PASSION... Brenda Smith Brenda Smith 1/29/2008 1/29/200853 LEADERSHIP Brenda Smith Brenda Smith 1/29/2008 1/29/2008 Going crazy ugh...
hi iam working for a stored procedure where i am inserting data for a table in a database and after inserting i must get the ID in the same procedure.later i want to insert that output ID into another table inthe same stored procedure. for example: alter procedure [dbo].[AddDetails] ( @IndustryName nvarchar(50), @CompanyName nvarchar(50), @PlantName nvarchar(50), @Address nvarchar(100), @Createdby int, @CreatedOn datetime ) as begin insert into Industry(Industry_Name,Creadted_by,Creadted_On) OUTPUT inserted.Ind_ID_PK values(@IndustryName,@Createdby,@CreatedOn) insert into Company(Company_Name,Company_Address,Created_by,Created_On,Ind_ID_FK) OUTPUT inserted.Cmp_ID_PK values(@CompanyName,@Address,@Createdby,@CreatedOn) insert into Plant(Plant_Name,Created_by,Creadted_On,Ind_ID_FK,Cmp_ID_FK)values(@PlantName,@Createdby,@CreatedOn,@intReturnValueInd,@intReturnValueComp) end
Here iam getting the output ID of the inserted data as OUTPUT inserted.Ind_ID_PK and later i want to insert this to the company table into Ind_ID_FK field.how can i do this. Please help me, i need the solution soon.