Hello,
I am new to SSIS.
I am trying to write a simple package to export data from some SQL 2005 tables and into a flat file.
In my data flow, I am using the OLE-DB data source and then the flat file destination.
This all works fine except that I cant get the package to write the columns out in the order I want. Even when I drive the OLE-DB source by a query, they columns are getting written to the flat file in a different order than I want.
How is SSIS determining what order to write the columns in and, more importantly, how can I change it to do it in the order I want?
Please help if you can. As mentioned I am new to SSIS so please give clear+simple answers.
I have a simple data control task that has an OLE source and OLE target.
the source is a SQL query that returns 200m records this is then written straight out to a table. I have used a data flow task so that I can chunk up the inserts rather than using a INSERT INTO.....SELECT FROM.
I have ran it multiple times and it hangs once it reaches 18,961,020 records.
there is no locking on the database and I have even restarted the SQL instance to ensure that there was nothing else contending for resource.
I have changed gthe buffer size and rows per buffer to 100m and 100,000. Now it hangs before the 18.9m mark presumably becuase of the increased buffer size.
I notice that the SELECT statement continues to clock up io and cpu cycles but the BULK insert process has gone.
In Excel they are showing Sale as -ve quantity and purchase as +ve quantity.
The database has quantity always as +ve figure and a separate column "isPurchase" set to true or false depending on whether purchase or sale.
So I need Derived Column to return a bolean (or int) depending whether quantity is positive or negative. I tried each of the following in the Expression but all of them were invalid expressions.
CASE WHEN [TotalQuantity] > 0 THEN 0 ELSE 1 END
If [TotalQuantity] > 0 THEN 1 ELSE 0 END
IIF ([TotalQuantity] > 0, 1,0)
Can anyone help me with correct syntax, or correct Data Flow Transformation if Derived column is wrong.
Has anyone done this? I can't find anything in the documentation that describes this. The closest I get is to the InnerObject property of the TaskHost class. There is an example of programming a bulk insert task. But I can't find anything on programmatically setting the column mappings (source to dest) of a simple data flow task. Any help is appreciated!
I've built a simple custom data flow transformation component following the Hands On Lab (http://www.microsoft.com/downloads/details.aspx?familyid=1C2A7DD2-3EC3-4641-9407-A5A337BEA7D3&displaylang=en) and the Books Online (ms-help://MS.MSDNQTR.v80.en/MS.MSDN.v80/MS.SQL.v2005.en/dtsref9/html/adc70cc5-f79c-4bb6-8387-f0f2cdfaad11.htm and ms-help://MS.MSDNQTR.v80.en/MS.MSDN.v80/MS.SQL.v2005.en/dtsref9/html/b694d21f-9919-402d-9192-666c6449b0b7.htm).
All it is supposed to do is create an output column and set its value to the result of calling a web service method (the transformation is synchronous). Everything seems fine, but when I run the data flow task that contains it, it doesn't generate any output. The Visual Studio debugger displays it as yellow, with 1,385 rows going into it, but the data viewer attached to its output is empty. The output metadata looks just like I expect: all of my input columns plus the new column, correctly typed. No validation or run-time warnings or errors are reported.
I'll include the entire C# file below, which only overrrides the ProvideComponentProperties, Validate, PreExecute, ProcessInput, and PostExecute methods of the parent PipelineComponent class.
Since this is effectively a specialization of the DerivedColumn transformation, could I inherit from the class that implements the DC component instead of PipelineComponent? How do I even find out what that class is?
Thanks! Here's the code: using System; // using System.Collections.Generic; // using System.Text;
using Microsoft.SqlServer.Dts.Pipeline; using Microsoft.SqlServer.Dts.Pipeline.Wrapper; using Microsoft.SqlServer.Dts.Runtime.Wrapper;
namespace CustomComponents { [DtsPipelineComponent(DisplayName = "GID", ComponentType = ComponentType.Transform)] public class GidComponent : PipelineComponent { /// /// Column indexes for faster processing. /// private int[] inputColumnBufferIndex; private int outputColumnBufferIndex;
/// /// The GID web service. /// private GID.WS_PDF.PDFProcessService gidService = null;
/// /// Called to initialize/reset the component. /// public override void ProvideComponentProperties() { base.ProvideComponentProperties(); // Remove any existing metadata: base.RemoveAllInputsOutputsAndCustomProperties(); // Create the input and the output: IDTSInput90 input = this.ComponentMetaData.InputCollection.New(); input.Name = "Input"; IDTSOutput90 output = this.ComponentMetaData.OutputCollection.New(); output.Name = "Output"; // The output is synchronous with the input: output.SynchronousInputID = input.ID; // Create the GID output column (16-character Unicode string): IDTSOutputColumn90 outputColumn = output.OutputColumnCollection.New(); outputColumn.Name = "GID"; outputColumn.SetDataTypeProperties(Microsoft.SqlServer.Dts.Runtime.Wrapper.DataType.DT_WSTR, 16, 0, 0, 0); }
/// /// Only 1 input and 1 output with 1 column is supported. /// /// public override DTSValidationStatus Validate() { bool cancel = false; DTSValidationStatus status = base.Validate(); if (status == DTSValidationStatus.VS_ISVALID) { // The input and output are created above and should be exactly as specified // (unless someone manually edited the persisted XML): if (ComponentMetaData.InputCollection.Count != 1) { this.ComponentMetaData.FireError(0, ComponentMetaData.Name, "Invalid metadata: component accepts 1 Input.", string.Empty, 0, out cancel); status = DTSValidationStatus.VS_ISCORRUPT; } else if (ComponentMetaData.OutputCollection.Count != 1) { this.ComponentMetaData.FireError(0, ComponentMetaData.Name, "Invalid metadata: component provides 1 Output.", string.Empty, 0, out cancel); status = DTSValidationStatus.VS_ISCORRUPT; } else if (ComponentMetaData.OutputCollection[0].OutputColumnCollection.Count != 1) { this.ComponentMetaData.FireError(0, ComponentMetaData.Name, "Invalid metadata: component Output must be 1 column.", string.Empty, 0, out cancel); status = DTSValidationStatus.VS_ISCORRUPT; } // And the output column should be a Unicode string: else if ((ComponentMetaData.OutputCollection[0].OutputColumnCollection[0].DataType != DataType.DT_WSTR) || (ComponentMetaData.OutputCollection[0].OutputColumnCollection[0].Length != 16)) { ComponentMetaData.FireError(0, ComponentMetaData.Name, "Invalid metadata: component Output column data type must be (DT_WSTR, 16).", string.Empty, 0, out cancel); status = DTSValidationStatus.VS_ISBROKEN; } } return status; }
/// /// Called before executing, to cache the buffer column indexes. /// public override void PreExecute() { base.PreExecute(); // Get the index of each input column in the buffer: IDTSInput90 input = ComponentMetaData.InputCollection[0]; inputColumnBufferIndex = new int[input.InputColumnCollection.Count]; for (int col = 0; col < input.InputColumnCollection.Count; col++) { inputColumnBufferIndex[col] = BufferManager.FindColumnByLineageID(input.Buffer, input.InputColumnCollection[col].LineageID); } // Get the index of the output column in the buffer: IDTSOutput90 output = ComponentMetaData.OutputCollection[0]; outputColumnBufferIndex = BufferManager.FindColumnByLineageID(input.Buffer, output.OutputColumnCollection[0].LineageID); // Get the GID web service: gidService = new GID.WS_PDF.PDFProcessService(); }
/// /// Called to process the buffer: /// Get a new GID and save it in the output column. /// /// /// public override void ProcessInput(int inputID, PipelineBuffer buffer) { if (! buffer.EndOfRowset) { try { while (buffer.NextRow()) { // Set the output column value to a new GID: buffer.SetString(outputColumnBufferIndex, gidService.getGID()); } } catch (System.Exception ex) { bool cancel = false; ComponentMetaData.FireError(0, ComponentMetaData.Name, ex.Message, string.Empty, 0, out cancel); throw new Exception("Could not process input buffer."); } } }
/// /// Called after executing, to clean up. /// public override void PostExecute() { base.PostExecute(); // Resign from the GID service: gidService = null; } } }
Seems obvious but I can't see how. How would I remove columns from a data flow so that columns which have been used earlier but are not needed for insert/update are taken out of the flow.
I'm asking because the data ends up in a update statement and the flow has got so big it is unreadable.
However, the first three columns are not being populated in the destination table. The other columns come over fine.
The SQL stmt. returns data as expected when run against the source database.
I deleted the source and destination and recreated the flow to prevent metadata mapping issues. In the source editor preview I see all of the columns and data. In the destination editor preview, the first three columns of data are null ???.
It appears that the columns are not mapping properly even though they are in the source and destination of the mapping editor.
I have made sure that the destination mapping contains all the columns in the UI.
The source and destination have the columns represented in the advanced editor metedata. I also checked the XML to verify that the columns are in the destination.
There is a row count between the source and destination. which should have no effect.
This is a part of a larger DW load where I have 10 other tables populated within the dataflow. I also do not get any validation, or error messages. So, I have eliminated truncation errors or the like.
I am really puzzled. Has anyone run accross anything like this?
I have a master securities table which has 7 fields. As a part of the daily process I am uploading flat files into database tables. The flat files contains the master(static) security data as well as the analytics(transaction) data. I need to
1) separate the master (static) data from the flat files,
2) check whether that data is present in the master table, if not then insert that data into the master table
3) If data present then move that existing record to an history table and then update the main master table.
All the 7 fields need to be checked to uniquely identify a single record in the master table.
How can this be done? Whether we can us a combination of data flow items or write a sql procedure to do all this.
[Production IDW [1]] Warning: Truncation may occur due to retrieving data from database column "Industry" with a length of 16 to data flow column "Industry" with a length of 8.
When I look at the industry lookup table the column size is 50. But the data flow metatdata is saying the oclumn is 8. How can I change the data flow column? When I try to edit it and click on metadata it is uneditable. The field it is matching to is 50 and the field it is coming from is 16.
I have a data flow task in which there is a OLEDB source, derived column item, and a oledb destination. My source is a SQL command, that returns some values. I have some values, that I define in the derived columns, and set default values under the expression column. My question is, I also have some destination columns which in my OLEDB destination need another SQL command. How would I do that? Can I attach two or more OLEDB sources to one destination? How would I accomplish that? Thanks
I need to loop the recordset returned from a ExecuteSQL task and transform each row using a Data Conversion task (or a Script Task).
I know how to loop the recordset returned by an ExecuteSQL task:
http://www.sqlis.com/59.aspx
I loop the returned recordset (which is mapped to a User variable of type System.Object) and assign the Variable Mappings in the ForEach Loop to different user variables which map to the Exec proc resultset (with names and data types).
I assume to now use these as the Available Input columns for the Data Conversion task, I drag a Data Flow task inside the For Each Loop container and double-click it, then add a Data Conversion task.
But the Input columns (which I entered in the Variable Mappings in the ForEach Loop containers) dont show up in the Available Input columns of the Data Conversion task.
How do I link the Variable Mappings in the ForEach Loop containers from the recordset returned by the Execute SQL Task to the Available Input columns of the Data Conversion task?
.......................
If this is not possible, and the advice is to use the OLEDB data flow as the input for the Data Conversion task (which is something I tried too), then the results from an OLEDB Command (using EXEC sp_myproc) are not mapped to the Available Input columns of the Data Conversion task either (as its not an explicit SQL Statement and the runtime results from a stored proc exection)
I would like to use the ExecuteSQL task to do this as the Package is clean and comprehensible. Which is the easiest best way to map the returned results from a Stored proc execution to the Available Input columns of any Data Flow transformation task for the transform operations I need to execute on each row of data?
[ Could not find any useful advice on this anywhere ]
i add new column using alter command but i always found it in the end of table but i want to add it in particular position between the columns...........
I am trying to format a matrix report so that columns appear in a specified order.
An example of what I mean is, I have 3 columns; New, Additional and Old.
When these columns are dynamically generated by RS they are put in alphebitic order. I want them to appear in the order in which I have them above.
The dataset returns a sequence (int) for each of the columns, so 'New' = 1, 'Additional' = 2 and 'Old' = 3. I am ordering on that sequence, but still can't get it to work. These are actually column groupings.
What am I doing wrong? I don't want the column sorted (i.e. data sorted within a row), but the columns to appear in a specific order.
I need to pass a parameter from control flow to data flow. The data flow will use this parameter to get data from a Oracle source.
I have an Execute SQL task in control flow to assign value to the Parameter, next step is a data flow which will need take a parameter in the SQL statement to query the Oracle source,
The SQL Looks like this:
select * from ccst_acctsys_account
where to_char(LAST_MODIFIED_DATE, 'YYYYMMDD') >?
THe problem is the OLE DB source Edit doesn€™t have anything for mapping parameter.
I have a For Each Loop container, and inside the container, I have a SQL Execute task, which runs first, and then I need to kick off 5 Data Flow Tasks. Do I need to connect the 5 DFTs to each other using the Green(Pipelines). How would you usually do this? Thx.
I have an Execute SQL Task that returns a Full Rowset from a SQL Server table and assigns it to a variable objRecs. I connect that to a foreach container with an ADO enumerator using objRecs variable and Rows in first table mode. I defined variables and mapped them to the columns.
I tested this by placing a Script task inside the foreach container and displaying the variables in a messagebox.
Now, for each row, I want to write a record to an MS Access table and then update a column back in the original SQL Server table where I retreived data in the Execute SQL task (i have the primary key). If I drop a Data Flow Task inside my foreach container, how do I pass the variables as input to an OLE DB Destination on the Data Flow?
Also, how would I update the original source table where source.id = objRects.id?
Thank you for your assistance. I have spent the day trying to figure this out (and thought it would be simple), but I am just not getting SSIS. Sorry if this has been covered.
Dear All! My package has a Data Flow Task. In Data Flow Task, I use a Script Component and a OLE BD Destination to transform data from txt file to database. Within Data Flow Task, I want to call File System Task to move file to a folder or any Task of "Control Flow" Tab. So, Does SSIS support this task? Please show me if any Thanks
I'm currently setting variables at the package level with an ExecuteSQL task. This works fine. However, I'm now starting to think about restartability midway through a package. It would be nice to have the variable(s) needed in a data flow set within the data flow so that I only have to restart that task.
Is there a way to do that using an SQL statement as the source of the value in a data flow?
OR, when using checkpoints will it save variable settings so that they are available when the package is restarted? This would make my issue a moot point.
Hi all! I recently started working with SSIS and one of the things that is puzzling me the most is what's the best way to go:
A small control flow, with large data flow tasks A control flow with more, but smaller, data flow tasksAny help will be greatly appreciated. Thanks, Ricardo
Why isn't there some documentation on how to do this. This should be really simple and it has taken me 2 weeks and I still haven't gotten an answer. Please Help Does anyone know the answner or some place where there is some documentation!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
I get the following error when I try to substitute the strings in the databasedetails collection with variables: Error: Object reference not set to an instance of an object. StackTrace: at Microsoft.SqlServer.Dts.Tasks.TransferObjectsTask.TransferObjectsTask.CheckLocalandDestinationStatus(Database srcDatabase, DatabaseInfo dbDetail) at Microsoft.SqlServer.Dts.Tasks.TransferObjectsTask.TransferObjectsTask.TransferDatabasesUsingSpAttachDetach()
I created the following variables: strDestinationDB = AirCL2Exp_new3 strDestinationDBPath = C:Program FilesMicrosoft SQL ServerMSSQL.2MSSQLDATAAirCL2Exp_new3_Data.mdf strDestinationLGPath = C:Program FilesMicrosoft SQL ServerMSSQL.2MSSQLDATAAirCL2Exp_new3_Data.ldf strSourceDB = AirCL2Exp strSourceDBPath = C:Program FilesMicrosoft SQL ServerMSSQL.2MSSQLDataDataNewAirCL2Exp_Data.mdf strSourceLGPath = C:Program FilesMicrosoft SQL ServerMSSQL.2MSSQLDataDataNewAirCL2Exp_Log.ldf
I then assigned those variable to DatabaseDetails Collection:
Inaddtion I also assigned the following to the two DatabaseFiles Collection: for 0: DatabaseFileSize = 0 DestinationFilePath = @strDestinationDBPath FileType = DatabaseFile SourceFilePath = @strSourceDBPath SourceSharePath = @strSourceDBPath
Hi, I'm trying to implement an incremental data pull (Oracle to SQL) based on Andy's blog: http://sqlblog.com/blogs/andy_leonard/archive/2007/07/09/ssis-design-pattern-incremental-loads.aspx
My development machine is decent: 1.86 GHz, Intel core 2 CPU, 3 GB of RAM. However it seems the data flow task gets hung whenever I test the package against the ~6 million row source, as can be seen from these screenshots. I have no memory limitations on the lookup transformation. After the rows have been cached nothing happens. Memory for the dtsdebug process hovers around 1.8 GB and it uses 1-6 percent of CPU resources continuously. I am not using fast load to insert new records into my sql target table. (I am right clicking Sequence Container 3 and executing this container NOT the entire package in the screenshots)
The same package works fine against a similar test table with 150k rows. http://i248.photobucket.com/albums/gg168/boston_sql92/7.jpg http://i248.photobucket.com/albums/gg168/boston_sql92/8.jpg
The weird thing is it only takes 24 minutes for a full refresh of the entire source table from Oracle to the SQL target table. Any hints,advice would be appreciated.
i will tell whats it ------ i have a webpage creating using ASP.NET,C# using SQL server 2000.
i have a table in the database which has attribute named "Priority" which has values like "High" "Low" "Medium". all i wana to make is that i enter any of this value tru a dropdown list into the database and i have used Datalist in Webpage so that i will retrieve back the info from the database. when any request is made to retrieve back this value . i want to see all the priority field in the datalist which has priority as "High" in bold and in order. is this possible ,if it is plz let me know
It is one of the covenants of DBMS that data rows are retrieved in no particular order, and I have learned not to rely on that. However, I'd like to know under what circumstances could rows be returned, e.g., as a result of a simple query on SQL Server, not in the order they were stored. I have never seen this happen so far. I can't even test my sorting algorithms (on the data after it gets retrieved from the server) since it always comes already sorted. Naturally, I can always simulate this scenario (and I have) but this is kind of a "conflict of interest", I'd like to see "the real thing". Any ideas? Kamen
I'm a SQL noob still and need some help figuring out a weird problem. I have directory information I am trying to pump out in the correct order, no big deal except some entries need to be categorized based on a simple hierachy. So for example, there is an entry for a company division, and then there are all the subdivisions, and then a few offices. So it goes like this:
In other words, Desktop is an office in Support which is a part of Computing.
Now what I want is to basically pump all that data out via SQL in that exact order. So I have 3 fields that I am using and then trying to arrange it using Order By - except it doesn't come out correctly.
Code:
SELECT Division, Subdivision, Office FROM directory ORDER BY Division, Subdivision, Office
What happens with this is that I end up with all my Divisions in order only. So for example, the main entry for Computing has "Computing" as it's Division and Office, but I want it to appear first in the list so I set Subdivision to be "A" - Instead of it appearing first it appears second below an entry for a different Division that also has it's Subdivision set as "A". I end up with Divisions mixed into the wrong cluster of subdivisions and offices.
Anyone have any ideas? It seems like this should be fairly simple and yet somehow it isn't. Thanks in advance for any help!
In my search results I need to order my data by a particular entry rather than a particular column and then add all the other entries onto the end of the result set. Specifically I'm trying to order my records by a particular location in a city so those entries in that location appear first in the result set and then on the end I want to include all the other locations in that city.
I'm assuming I'll need to do these as separate queries and then use a UNION statement but I'm wondering if there is a better or easier way. Is there, for example, another type of ORDER BY clause that lets me order by a particular entry rather than a column?
I am wondering if it is possible to use SSIS to sample data set to training set and test set directly to my data mining models without saving them somewhere as occupying too much space? Really need guidance for that.