SSIS: Making A System Fragile And Difficult To Manage?
Apr 1, 2008
Hello,
I am fairly new to SSIS, without much in-depth knowledge of it, and I am running into some problems.
I am part of a team that is using Agile techniques (Scrum based) to implement a reasonably complicated system. This system has the requirement for numerous file feeds and multiple transactional database instances whose base data is driven from a master seed database.
We are getting to the point of hooking up a lot of the infrastructure, and I am running into problems, especially with the continuous evolution of the database. Every iteration, the SSIS packages break because of DB changes. This is a little scary to me because I am starting to see SSIS as adding multiple points of failure to the system. Agile is based on refactoring often, and that includes the database for this project. Every database change either breaks or requires a change to the SSIS packages that move data from file feeds and staging databases to the master seed database. The problem is even more pronounced in the processes that move data from the seed database to the transactional databases. It discourages changing the database, which is a really bad thing because that leads to code architecture smells and adds fragility.
Expression-based connection strings are great, but they are specific to each report, which makes them difficult to maintain. Shared expression-based connection strings are not supported.
I'm looking for a way around this. I'm hoping to write a custom data extension, and in the custom data extension, do all the "dynamic" logic that sets the connection string (which you normally would be doing in your expression-based connection string).
I know the above can be done. What I don't know is how, inside a custom data extension, to get the value of expressions like User!UserID. (I want the connection to run under stored credentials, but I want to customize the connection string, adding a property to it based on the UserID of whoever is logged into Report Manager.)
I have a package that loops over ~60 Excel files in a directory. Each file has three named ranges in it, which I import into different tables. Sometimes the package runs without a hitch, sometimes it chokes. But it is intermittent.
If I pull the control flow components out of the foreach loop and point the Excel connection manager at the specific Excel file that caused the package to choke, I get a message in the dataflow component, pointing to the named range, that "the metadata of the following output columns does not match the metadata of the external columns... Do you want to replace the metadata of the output columns with the metadata of the external columns?" When I choose 'yes', the file will load. Then I can put the control flow components back into the foreach loop and the file will run again, successfully, along with some more, until it chokes again...
So, first of all, does anyone have any insight into this? Some days these files load with no problems; these exact files I am having to reload constantly... Other times, like today, it is a battle.
Otherwise, is there a way to get Integration Services to handle the metadata issue on the fly?
Any ideas, resources, references, war stories, or good clean jokes would be appreciated.
Kathryn
I am working on a configuration database in SSIS. One of the modules in the package is giving me endless grief. The module is, for some reason, set up to append data onto the end of whatever is already stored in the table. Instead, I would like it to replace this data. I have tried an Execute SQL Task that should delete all of the rows in the table, but this isn't working. Is there a more efficient way to do this?
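For what it's worth, a minimal sketch of what the Execute SQL Task could run before the data flow, assuming a hypothetical target table dbo.ConfigData (the task must be wired ahead of the data flow with a precedence constraint so it actually runs first):

    -- Empty the table so the data flow replaces rather than appends.
    -- TRUNCATE TABLE is fast but needs ALTER permission and no
    -- incoming foreign keys; DELETE works everywhere but logs each row.
    TRUNCATE TABLE dbo.ConfigData;
    -- or, if TRUNCATE is not an option:
    -- DELETE FROM dbo.ConfigData;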
I created an SSIS project with some SSIS packages on my local machine. Once all development and testing was finished, I imported it to the SSIS package store within Integration Services. Then I created another test folder on my local machine and copied all the packages, along with the project .sln file, to that test folder.
Now the problem: if I make any changes to a package within the test folder, it automatically saves the changes to my other folder. Does anybody know why it does this?
We have an SSIS package that was created to migrate data in from a few production databases. The steps for the package are as follows...
1. Back up databases on server 1 (prod database server).
2. Restore the databases to the SSIS server (server 2).
3. Truncate the worker tables in the SSIS server's (server 2) Main DB database.
4. Copy data from the restored DB tables to the working DB tables (database to database).
5. Start multiple threads (15) and run the remaining steps in parallel: a combination of Data Flow tasks, SQL scripts, and stored procedures used to flatten data out and combine data for reporting purposes.
The average run time is 8 hours.
The issue we are seeing is this: the package fluctuates in run time from 4 hours to over 11 with no change in the data or the underlying SSIS package. We have looked for any changes or things that would affect this but have not found anything that changed...
Also, certain steps run shorter while others double in time. There doesn't seem to be any rhyme or reason to this behaviour. The server is x64, 12GB of RAM, 2 dual-core 3.2GHz CPUs.
Please let me know if you need any more information or specifics...
The only thing I have seen so far that looks out of place is that tempdb has one file that is 20+GB.
I have a mixed solution with a C# project and an SSIS project. No matter how many times I set the C# project as the startup project, it seems that as soon as I make a change to the SSIS package, it decides that it's so unbelievably important that it HAS to be the startup project. Is there any way to disable this?
I have packages that must generate error logs dynamically, including the time of execution and the name of the task.
I do it by changing the properties of the connection, inserting two expressions.
1. I alter the File Usage Type to 1 so that the files are created.
2. I alter the ConnectionString expression to:

    @[User::myvariable] + "constant_description" + <time description> + ".txt"

where the time description is:

    (DT_STR,40,1252)DAY(GETDATE()) + "-" + (DT_STR,40,1252)MONTH(GETDATE()) + "-" + (DT_STR,40,1252)YEAR(GETDATE()) + "_" + REPLACE((DT_STR,10,1252)(DT_DBTIME)GETDATE(), ":", "_")

But that part is not the problem.
In those packages I have one connection for all the tasks, and at execution time it creates one file for each of the tasks.
The problem is how I insert the task name into the filename.
I have an OnPreExecute event handler in each task that modifies a string variable (myvariable), appending the task name. When I execute the whole package it works great, but when I execute only a single task, the program does not enter the event handler and does not put in the task's name.
How can I put that name in without using that handler? Is there another handler I can use that fires before the system generates the new file name but after pre-execute? Does anyone know another way to do this kind of thing?
Could anybody help me fix this query, inside a SQL Task that handles the OnError event?
The DT_TEXT cast does not work with this query, and I need to convert the error message so that the French character ' can be accepted inside the INSERT query.
The error message is captured inside the INSERT query; for example, the value of the variable ErrorDescription is 'Erreur dans l'insertion', with a quote inside, and therefore the INSERT query cannot be executed.
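In case it helps, a hedged sketch of two ways around the embedded quote, assuming a hypothetical log table dbo.ErrorLog; the parameterized form is the safer one:

    -- Parameterized Execute SQL Task statement: map the ErrorDescription
    -- variable to parameter 0 on the task's Parameter Mapping page;
    -- no quote escaping is needed because the value never goes
    -- through string concatenation.
    INSERT INTO dbo.ErrorLog (ErrorMessage) VALUES (?);

    -- If the statement must be built by expression/concatenation instead,
    -- double up the single quotes first, e.g. in the SSIS expression:
    --   REPLACE(@[System::ErrorDescription], "'", "''")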
I am facing a problem executing an SSIS package. My system configuration is as follows:
PIII with 1.40 GHz and 2.5 GB of RAM. Windows 2003 with SQL Server and SSIS installed. 50 GB of hard drive space is free.
The scenario is like this:
We have 20000 records in our Oracle database containing CLOB data. We are fetching only the CLOB data that is stored in the XML_DATA field. Each CLOB has an XML document in it, so we get each record and shred that XML into different tables using XSLT and XSD. To run the process faster, we made 5 channels so that data can be populated in parallel, with each channel populating 4000 records. The data flow in each channel has the DefaultBufferMaxRows property set to 20 and DefaultBufferSize set to 10MB so that it does not take much buffer memory. Now, whenever I try to run this package with all 5 channels enabled, I get this error:
"Executing the query "select ESTIMATE_ID,xml_dat,id,co_cd from CLM_EST_XML_ESTIMATE WHERE XML_Estimate_ID=?" failed with the following error: "Insufficient memory to continue the execution of the program.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly."
I am also not able to release the object memory in each For Loop container.
When I try to save an SSIS package in Visual Studio, I have the following problem:
===================================
Exception of type 'System.OutOfMemoryException' was thrown. (Microsoft Visual Studio)
------------------------------
Program Location:
at System.String.GetStringForStringBuilder(String value, Int32 startIndex, Int32 length, Int32 capacity)
at System.Text.StringBuilder.GetNewString(String currentString, Int32 requiredLength)
at System.Text.StringBuilder.Append(Char value)
at Microsoft.DataWarehouse.VsIntegration.Designer.Serialization.BufferTextWriter.Write(Char ch)
at System.IO.TextWriter.Write(Char[] buffer, Int32 index, Int32 count)
at System.IO.TextWriter.Write(Char[] buffer)
at System.IO.TextWriter.Write(String value)
at Microsoft.DataTransformationServices.Design.Serialization.DtrDesignerSerializer.SerializePackage(IDesignerSerializationManager manager, Package package, TextWriter textWriter)
at Microsoft.DataTransformationServices.Design.Serialization.DtrDesignerSerializer.SerializeComponent(IDesignerSerializationManager manager, IComponent component, Object serializationStream)
at Microsoft.DataWarehouse.Serialization.DesignerComponentSerializer.Serialize(IDesignerSerializationManager manager, Object value)
at Microsoft.DataWarehouse.VsIntegration.Designer.Serialization.DataWarehouseDesignerLoader.Serialize()
at Microsoft.DataWarehouse.VsIntegration.Designer.Serialization.BaseDesignerLoader.Flush(Boolean forceful)
at Microsoft.DataWarehouse.VsIntegration.Designer.Serialization.BaseDesignerLoader.Flush()
at Microsoft.DataWarehouse.VsIntegration.Designer.Serialization.DataWarehouseContainerManager.OnBeforeSave(UInt32 docCookie)
The .dtsx is around 8MB in size, and I have installed the service pack. What else could it be?
I am creating an SSIS package for copying files from the local machine (where the package runs) to another destination (\\servername\foldername\filename) using a Script Task. It works fine when running the package.
But if I schedule a job for the same package, it gives an error (package execution failed).
I have the following table:

tblFriends
OwnerCode FriendCode
7 10
7 14
10 7
10 12
10 13
12 10
13 10
13 18
14 7
18 13

I need an SP that returns the following (I'm unsure about the best return datatype and the SQL statement): I want to return all FriendCodes of user nr 7 (10 and 14), and I want to return all FriendCodes of users 10 and 14 (7, 12, 13, 7) WITHOUT user 7 (and if possible WITHOUT the use of a temp table!).
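A hedged sketch of one way to do it without a temp table, assuming the table is dbo.tblFriends as shown:

    -- Direct friends of user 7
    SELECT FriendCode
    FROM dbo.tblFriends
    WHERE OwnerCode = 7;

    -- Friends of those friends, excluding user 7 himself:
    -- self-join so the second hop starts from the first hop's friends
    SELECT f2.FriendCode
    FROM dbo.tblFriends AS f1
    JOIN dbo.tblFriends AS f2
      ON f2.OwnerCode = f1.FriendCode
    WHERE f1.OwnerCode = 7
      AND f2.FriendCode <> 7;

Against the sample data the second query returns 12 and 13 (the 7s are filtered out by the last condition).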
I need to move files from one location to another and rename them using the File System Task in SSIS. There are current files and archived files in the folder (C:\download), and I need to grab only the current file (i.e. Zipcode062407), move it to the C:\staging folder, and rename it to currentzipcode.txt. How can I grab only the current file, move it to the different folder, and rename it using the File System Task in SSIS?
I'm having an issue with a File System Task, and I'm not sure whether it is user error on my part or a bug. I'm using a SQL Task to create a transaction log backup, and I'm saving the name of the file in a result set, which I then map to a package-level user variable. After that runs, I'm trying to copy this .BAK file to another folder using the File System Task. I'm setting the following properties on the File System Task:
IsDestinationPathVariable: False (I have entered the static directory for the file move.)
Operation: Tried it with both Copy file and Move file.
IsSourcePathVariable: True
SourceConnection: User::File_name
After setting this, I immediately get a validation warning telling me the source directory cannot be empty, and if I try to run it, it fails. The weird thing is that if I set up a connection manager to a flat file and pass my user variable in as the connection string to this connection, then set IsSourcePathVariable to False and SourceConnection to this connection manager, it works.
I have also gotten it to work by substituting an FTP Task in place of the File System Task. The FTP Task has no problem when I set IsLocalPathVariable to True and pass my variable to the LocalVariable property. This is why I believe there is some sort of issue with the File System Task. Has anyone seen this before? Is there some sort of problem with the way I'm setting it up?
I don't know what is wrong, but the moment I open SSIS it runs too slow... and if I have 100 warnings, it takes its own sweet time to even open the designer...
I am just getting frustrated with this.
If I open my browser it's slow, and email is slow... Any help is appreciated.
I'm executing an SSIS package as a DataReader source in a report in SSRS. Everything is OK. But I'm using the @[System::UserName] variable as an added column, to save the user who ran the package.
But incredibly, when a user executes the report, the system doesn't save the login that opened the browser, nor the user that runs the SSIS service or the SSRS service. It is catching the user who opened a session on the host where SQL Server is installed.
I need to capture the user who runs the report. How can I do this?
Regards,
JULIAN CASTIBLANCO P MCTS SQL SERVER 2005 BOGOTA, COLOMBIA
I want to use the File System Task in SSIS to move files from one location to another for archiving. I can't seem to figure out how to use a wildcard for the file name. It seems that I must specify the actual file name, which is a problem because only the first 4 letters of the file name remain constant.
Does anyone know how to use a wildcard, or a way to work around this?
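One possible workaround, sketched under the assumption that looping over the matches is acceptable: the Foreach File enumerator does accept wildcards, so the File System Task can sit inside a Foreach Loop and be fed one matching file at a time (folder, pattern, and variable names below are illustrative):

    Foreach Loop Editor, Collection page:
        Enumerator: Foreach File Enumerator
        Folder:     C:\Archive\Incoming
        Files:      ABCD*.*          (first four letters constant, rest wildcarded)
    Variable Mappings: index 0 -> User::CurrentFile
    File System Task (inside the loop):
        IsSourcePathVariable = True
        SourceVariable       = User::CurrentFile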
I have roles set up for different companies. The role names are structured companyname_department, e.g.:

CallawayContracting_SalesDepartment
CallawayContracting_Administration
FredsAutobody_PaintSales

How would I search the roles and return only departments that belong to a certain company, and also users that belong to a certain company? I appear to have gotten myself into quite a bind. Any help would be much appreciated! I will be certain to click best answer.
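A hedged sketch, assuming these are SQL Server database roles (if they are ASP.NET roles, the same LIKE pattern applies to the roles table instead); the [_] escapes the underscore so it is not treated as a single-character wildcard:

    -- Departments (roles) belonging to one company
    SELECT name
    FROM sys.database_principals
    WHERE type = 'R'
      AND name LIKE 'CallawayContracting[_]%';

    -- Users who are members of any of that company's roles
    SELECT DISTINCT u.name
    FROM sys.database_role_members AS rm
    JOIN sys.database_principals AS r ON r.principal_id = rm.role_principal_id
    JOIN sys.database_principals AS u ON u.principal_id = rm.member_principal_id
    WHERE r.name LIKE 'CallawayContracting[_]%';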
I'm pretty new to this so I'll explain as best I can.
I am building a small DB which will track attendance for employees based on a point system. I believe I'm almost there (with this piece) but am stuck.
The basic concept:
1. Collect all records from the attendance table for the previous 14-day period.
2. Sum the points column in the attendance table and group by UID, storing the result in a new table called POINT_TOTALS, ONLY for those UIDs which have a value > 0.
3. Perform a basic INSERT INTO statement on the attendance table for each UID matching those found in the POINT_TOTALS table.
Number 3 is where I'm failing and could really use some help.
My code thus far... -------------------------------
/* Declare local variables for current date and start date. */
DECLARE @DateNow DATETIME
DECLARE @StartDate DATETIME
SET @DateNow = GETDATE()
SET @StartDate = DATEADD(DAY, -14, @DateNow)

/* Create table to hold totals for future calculations */
CREATE TABLE POINT_TOTALS (UID int, TOTAL float)
/* Select ALL records from the table within the above defined date range
   and group them by UID, tallying the points column */
INSERT INTO POINT_TOTALS
SELECT UID, SUM(POINTS) AS TOTAL_POINTS
FROM attendance
WHERE [date] >= @StartDate
GROUP BY UID

/* If TOTAL > 0 for the 14-day period, insert a record into the attendance
   table which deducts .5 points for the UID in question */
*** This is where I'm failing ***
-- This was just to make sure I was returning the correct results from the POINT_TOTALS table.
SELECT UID FROM POINT_TOTALS WHERE TOTAL > 0
/* All I want to do now is, for each of the UIDs in the POINT_TOTALS table, perform a basic insert on the ATTENDANCE table where the UIDs in both match. I would think this to be fairly simple, but I can't seem to figure it out. */
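A hedged sketch of step 3, assuming the attendance table takes a UID, a date, and a POINTS value, and that the deduction row carries -0.5 points (adjust the column list to the real schema):

    /* One deduction row per UID that accumulated points in the window */
    INSERT INTO attendance (UID, [date], POINTS)
    SELECT UID, GETDATE(), -0.5
    FROM POINT_TOTALS
    WHERE TOTAL > 0;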
I have a table with 4 columns which I need to group together, and a cost column to sum up. I want to sum the cost where I have both a parent and a child, and separately sum it where I have only a child. An example of the data is below. I think I need to do this in a subquery.
ID Ind Parent Child Cost
P110041012705921.8000
W11004101270595.4500
A110041012705921.8000
B110041012705916.3500
R110041012705916.3500
B0100420043.3000
P0100420043.3000
W0100420021.6500
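The fused rows above make the exact grouping hard to pin down, but if the goal is one cost subtotal for parent-and-child rows and another for child-only rows, conditional aggregation is one hedged way to sketch it (the table name is hypothetical and the columns are assumed from the header):

    SELECT Parent,
           Child,
           SUM(CASE WHEN Ind = 1 THEN Cost ELSE 0 END) AS ParentAndChildCost,
           SUM(CASE WHEN Ind = 0 THEN Cost ELSE 0 END) AS ChildOnlyCost
    FROM dbo.Costs   -- hypothetical table name
    GROUP BY Parent, Child;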
Hello!
This is my table:

Ordernr Date Article
O1 1.1.05 22
O2 2.2.05 33
O3 5.5.05 22
O4 2.2.05 33
O7 8.8.05 55

I need one result row for each article with the newest order (max(date)):

article lastDate lastOrdernumber
22 5.5.05 O3
33 2.2.05 O4
55 8.8.05 O7

How can I get this? I tried this:

SELECT DISTINCT article, MAX(date), MAX(ordernr)
FROM table
GROUP BY article

article and max(date) are OK, but I am not sure that max(ordernr) and max(date) come from the same row. I think I will need complex subqueries.
Many thanks
aaapaul
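One way that avoids relying on max(ordernr) lining up with max(date): join back to the rows that carry the newest date per article. A sketch, assuming the table is called Orders; the outer MAX(Ordernr) only acts as a tie-breaker when two orders for an article share the same date, as O2 and O4 do:

    SELECT t.Article,
           t.[Date] AS lastDate,
           MAX(t.Ordernr) AS lastOrdernumber   -- tie-breaker for equal dates
    FROM Orders AS t
    JOIN (SELECT Article, MAX([Date]) AS maxDate
          FROM Orders
          GROUP BY Article) AS m
      ON m.Article = t.Article
     AND m.maxDate = t.[Date]
    GROUP BY t.Article, t.[Date];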
Hello!
I have 2 tables:

Table1: Orders. Fields: Ordernr, Opieces
Table2: Calloffs. Fields: Ordernr, Cpieces

In Table1, Ordernr is the primary key. In Table2 the same Ordernr can exist many times.
My problem: if sum(Cpieces) < Opieces, I have to create a new virtual calloff with Cpieces = Opieces - sum(Cpieces). It's over my head. Please help.
Best regards
aaapaul
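A hedged sketch of the "virtual calloff" query, using the table and field names above; each returned row is the missing calloff with Cpieces = Opieces - sum(Cpieces):

    SELECT o.Ordernr,
           o.Opieces - ISNULL(SUM(c.Cpieces), 0) AS Cpieces
    FROM Orders AS o
    LEFT JOIN Calloffs AS c ON c.Ordernr = o.Ordernr
    GROUP BY o.Ordernr, o.Opieces
    HAVING o.Opieces - ISNULL(SUM(c.Cpieces), 0) > 0;  -- only where pieces remain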
Hi
I am trying to produce an update trigger. I understand the concept of delete and insert triggers without a problem. Unfortunately, the update triggers do not have particularly simple documentation in BOL.
So, can someone please explain to me, quite simply, how I would produce a trigger on the following: I have table 1, which we'll call simon. In here are various columns and rows. I also have table 2, called simon_a, my audit table.
Whenever anything is updated or deleted in simon, I want it sent to the simon_a table. Delete, as above, is fine since it's conceptual, but help me out on the update one. I cannot seem to figure out how to get the information from the table before it's updated.
As ever, champagne and beer for the successful answer.
With thanks
Simon
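A minimal sketch of the audit trigger, assuming hypothetical columns (id, col1, col2) mirrored in simon_a; the key point is the deleted pseudo-table, which holds each row's before-image for both updates and deletes:

    CREATE TRIGGER trg_simon_audit
    ON simon
    AFTER UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- "deleted" = rows as they looked BEFORE the statement ran;
        -- "inserted" (not needed here) would hold the new values.
        INSERT INTO simon_a (id, col1, col2)
        SELECT id, col1, col2
        FROM deleted;
    END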
I have a table that stores billing rates for our employees by client. Each employee can have a different billing rate for each client for a specified period. Here are the columns in the table:

eid - Employee ID#
cid - Client ID#
startdt - start date of billing rate
enddt - end date of billing rate
brate - billing rate

I need to create a script that will verify, for a given eid and cid, that the billing rate periods do not overlap.
For example, I need to be able to detect overlaps such as this:

eid cid startdt enddt brate
001 001 1/1/2003 12/31/2003 $50
001 001 11/01/2003 04/01/2004 $75
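A hedged sketch of the overlap check, assuming the table is called rates: two periods overlap when each starts on or before the other ends, and the a.startdt < b.startdt condition stops a row matching itself (rows with identical start dates would need an extra tie-breaker such as a key column):

    SELECT a.eid, a.cid,
           a.startdt AS start1, a.enddt AS end1,
           b.startdt AS start2, b.enddt AS end2
    FROM rates AS a
    JOIN rates AS b
      ON  b.eid = a.eid
      AND b.cid = a.cid
      AND a.startdt < b.startdt   -- pair each row with later-starting rows only
      AND b.startdt <= a.enddt;   -- later row starts before the earlier one ends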
Suppose I have the following table:

CREATE TABLE ([level] int, color varchar, length int, width int, height int)

It has the following rows:

1, 'RED', 8, 10, 12
2, NULL, NULL, NULL, 20
3, NULL, 9, 82, 25
4, 'BLUE', NULL, 67, NULL
5, 'GRAY', NULL, NULL, NULL

I want to write a query that will return me a view collapsed from "bottom to top" in order of level (level 1 is top, level 5 is bottom). So I want a query that will return:

GRAY, 9, 67, 25

The principle is that, looking from the bottom level up, in each column we first see GRAY for color, 9 for length, 67 for width, 25 for height. In other words, any non-NULL value in a lower level overrides the value set at a higher level.
Is this possible in SQL without using stored procedures?
Thanks!
- Robert
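Yes; one hedged sketch uses plain correlated subqueries (assuming the table is named t, since no name was given, and bracketing [level] to avoid keyword clashes): each subquery picks the value from the deepest level where that column is non-NULL.

    SELECT
      (SELECT TOP 1 color  FROM t WHERE color  IS NOT NULL ORDER BY [level] DESC) AS color,
      (SELECT TOP 1 length FROM t WHERE length IS NOT NULL ORDER BY [level] DESC) AS length,
      (SELECT TOP 1 width  FROM t WHERE width  IS NOT NULL ORDER BY [level] DESC) AS width,
      (SELECT TOP 1 height FROM t WHERE height IS NOT NULL ORDER BY [level] DESC) AS height;

Against the sample rows this returns GRAY, 9, 67, 25.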
I installed CE 3.1 months ago to play around with it, to see how easy or difficult it would be to move from SQL Express to CE. Got it all to work with SQL Server Management Studio and VS 2005. Now, months later, I am back to really upgrading my product to use CE. Oops, now we are on version 3.5. Decided to stay with 3.1 until I realized the Transact-SQL command "TOP" is not supported in 3.1. So, on to the hours of research to do a simple upgrade. Here's where I stand:
1. Uninstalled CE 3.1. Installed CE 3.5. Oh, if only it were that easy. Visual Studio 2005 is still using the DLL from 3.1 and can still only see a reference to 3.1 in the GAC.
2. SQL Management Studio now cannot open the .sdf database.
3. There is no mention of using CE 3.5 with VS 2005, only VS 2008. Is VS 2008 a requirement to be able to work with an .sdf file in VS? I remember I had to install "Microsoft SQL Server 2005 Compact Edition Tools for Visual Studio 2005" when I went through the first installation of CE. Is this now obsolete if I want to use 3.5? I don't even remember what its purpose was, other than that I needed it.
4. It seems there is an upgrade.exe file that needs to be run from the command line to upgrade my .sdf file. Are you kidding?
Has anyone upgraded while still using VS 2005? Any advice is greatly appreciated.
Why have you made connecting to a sql server express database so difficult?
I have it working locally, but going to production has been nothing but a nightmare.
I can't even connect locally on the production box.
I am on a dedicated server and my login is an Admin on the box. I have just installed SQL Express + the fancy management interface.
I have made a copy of my database that I was connecting to dynamically (which was failing remotely) and I have attached it to the server. I have made myself a user of the database and granted myself every permission available.
I have turned on impersonation in my web.config. The database knows it's me trying to connect and STILL denies me when I'm actually ON the production server. It is not even a remote connection problem.
How can I sit there and look at myself as a user of the database in the admin interface, yet I cannot connect via a web app? With SQL Server 2000 and MSDE it was soooo simple...
Here is the error: Cannot open database requested in login 'TEST'. Login fails. Login failed for user 'DEDICATED\quick'.
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.Data.SqlClient.SqlException: Cannot open database requested in login 'TEST'. Login fails. Login failed for user 'DEDICATED\quick'.
I have a screen shot of the admin interface to prove I am a user.
This should be a 5 minute task that has eaten up days.
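For what it's worth, the "Cannot open database requested in login" message usually means the login exists at the server level but has no user mapped inside that particular database. A hedged sketch of the mapping, using the names from the error message above:

    USE TEST;
    -- Map the server login to a database user and grant read/write.
    CREATE USER [DEDICATED\quick] FOR LOGIN [DEDICATED\quick];
    EXEC sp_addrolemember N'db_datareader', N'DEDICATED\quick';
    EXEC sp_addrolemember N'db_datawriter', N'DEDICATED\quick';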
From this data I can see that serial number 0000-0000-0000-0006 successfully completed part 1 and part 2 of the transaction, as did serial number 0000-0000-0000-0004.
Serial number 0000-0000-0000-0092 had trouble: it connected at 13:59:04 (tblremoteunitrequestID) but part 2 didn't complete, so it wasn't saved in tblremoteunitrequests. The same happened at 12:15:22, but at 10:31:54 it was successful, so it was saved.
I only want to display the transactions that didn't complete. Sounds easy, huh?
SELECT TBLRemoteFeildUnitRequestID.Serial,   -- SELECT list reconstructed from the sample output below
       TBLRemoteFeildUnitRequestID.RecordDate,
       CASE WHEN TBLRemoteUnitRequests.DateReceived
                 BETWEEN DATEADD(SECOND, -1, TBLRemoteFeildUnitRequestID.RecordDate)
                     AND DATEADD(SECOND, 10, TBLRemoteFeildUnitRequestID.RecordDate)
            THEN 'Ok'
            ELSE 'Not ok'
       END AS PROBLEM
FROM TBLRemoteFeildUnitRequestID
LEFT OUTER JOIN TBLRemoteUnitRequests
    ON TBLRemoteFeildUnitRequestID.Serial = TBLRemoteUnitRequests.Serial
WHERE TBLRemoteFeildUnitRequestID.RecordDate BETWEEN DATEADD(WEEK, -2, GETDATE()) AND GETDATE()
ORDER BY RecordDate DESC
This kinda worked, but it caused records that satisfied the BETWEEN condition to be displayed twice: once as "Ok" and once as "Not ok".
Here's a sample of the result I got:
Serial | RecordDate (1st part of transaction) | Status
0000-0000-0000-0006 2006-11-13 14:00:36.000 Ok (Duplicated)
0000-0000-0000-0006 2006-11-13 14:00:36.000 Not ok
0000-0000-0000-0004 2006-11-13 14:00:30.000 Not ok (Duplicated)
0000-0000-0000-0004 2006-11-13 14:00:30.000 Ok
0000-0000-0000-0092 2006-11-13 13:59:04.000 Not ok (Correct) (Not duplicated)
0000-0000-0000-0092 2006-11-13 12:15:22.000 Not ok (Correct) (Not duplicated)
0000-0000-0000-0092 2006-11-13 10:31:54.000 Not ok (Duplicated)
0000-0000-0000-0092 2006-11-13 10:31:54.000 Ok
0000-0000-0000-0006 2006-11-13 10:00:29.000 Ok (Duplicated)
0000-0000-0000-0006 2006-11-13 10:00:29.000 Not ok
I have just about had enough; I have wasted an entire day on this.
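One hedged way to collapse the duplicates: instead of the LEFT JOIN (which emits one row per matching part-2 record plus one for each non-matching one), test for the matching part 2 with NOT EXISTS, so each request appears once, and only when it is incomplete:

    SELECT r.Serial, r.RecordDate
    FROM TBLRemoteFeildUnitRequestID AS r
    WHERE r.RecordDate BETWEEN DATEADD(WEEK, -2, GETDATE()) AND GETDATE()
      AND NOT EXISTS (SELECT 1
                      FROM TBLRemoteUnitRequests AS u
                      WHERE u.Serial = r.Serial
                        AND u.DateReceived BETWEEN DATEADD(SECOND, -1, r.RecordDate)
                                               AND DATEADD(SECOND, 10, r.RecordDate))
    ORDER BY r.RecordDate DESC;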