I am pulling data from FoxPro tables into SQL 2005, and want to only pull new or changed rows. Accordingly each table in Fox has a column LastChangedDateTime, indicating the last time the row was updated, and I have a table in SQL which has one row per Fox table, listing the table name and the most recent data pulled into SQL.
In 2000 DTS I would have pulled the SQL datetime value into a package variable, then used a parameterized SQL statement with ".. WHERE LastChangedDateTime > ? " to select the rows I require.
In SSIS this approach does not seem to be possible, and the options are that I either use a variable for the entire SQL statement or, as the first SSIS tutorial suggests, use a lookup against the SQL table.
Gut feel is that the lookup will perform slower than creating the variable SQL and executing that (given that the source table is 13 million rows and rising, and I only want the last 100,000 or so from today).
What is considered best practice under these circumstances?
Also is it possible to write SSIS scripts in C# rather than VB.NET, as the syntax differences are driving me mad? ;-)
What is the best way to setup relationships between one lookup tableand many other tables. The tables did not have any lookup tablerelationships which I am adding. One lookup table is used for same datain several different places.To use one lookup tables with several tables, I had to disable "CascadeUpdate" and only have "enforce relationships for updates and inserts"checked.Any pros/cons?Thanks in advance.P
SSIS data flow transformation - Lookup task - best practice concerning Cachetype
I would like to know if there's any best practice concerning the CacheType property for the the Lookup task. Default value is "Full", but if the SSIS package is working with at lot of data, i.e. +10 mill. records from the OLE DB source to be handled through a variated numbers of data flow transformation tasks, it must have an impact memory usage if the lookup table also is a large table, i.e. +8 mill. records? When should I consider turning the property value to "none"
We did some "at scale" fuzzy lookup tests today and were rather disappointed with the performance. I'm wanting to know your experience so I can set my performance expectations appropriately.
We were doing a fuzzy lookup against a lookup table with 25 million rows. Each row has 11 columns used in the fuzzy lookup, each between 10-100 chars. We set CopyReferenceTable=0 and MatchIndexOptions=GenerateAndPersistNewIndex and WarmCaches=true. It took about 60 minutes to build that index table, during which, dtexec got up to 4.5GB memory usage. (Is there a way to tell what % of the index table got cached in memory? Memory kept rising as each "Finished building X% of fuzzy index" progress event scrolled by all the way up to 100% progress when it peaked at 4.5GB.) The MaxMemoryUsage setting we left blank so it would use as much as possible on this 64-bit box with 16GB of memory (but only about 4GB was available for SSIS).
After it got done building the index table, it started flowing data through the pipeline. We saw the first buffer of ~9,000 rows get passed from the source to the fuzzy lookup transform. Six hours later it had not finished doing the fuzzy lookup on that first buffer!!! Running profiler showed us it was firing off lots of singelton SQL queries doing lookups as expected. So it was making progress, just very, very slowly.
We had set MinSimilarity=0.45 and Exhaustive=False. Those seemed to be reasonable settings for smaller datasets.
Does that performance seem inline with expectations? Any thoughts to improve performance?
I'm working with an existing package that uses the fuzzy lookup transform. The package is currently working; however, I need to add some columns to the lookup columns from the reference table that is being used.
It seems that I am hitting a memory threshold of some sort, as when I add 3 or 4 columns, the package works, but when I add 5 columns, the fuzzy lookup transform fails pre-execute:
Pre-Execute Taking a snapshot of the reference table Taking a snapshot of the reference table Building Fuzzy Match Index component "Fuzzy Lookup Existing Member" (8351) failed the pre-execute phase and returned error code 0x8007007A.
These errors occur regardless of what columns I am attempting to add to the lookup list.
I have tried setting the MaxMemoryUsage custom property of the transform to 0, and to explicit values that should be much more than enough to hold the fuzzy match index (the reference table is only about 3000 rows, and the entire table is stored in less than 2MB of disk space.
Say I want to lookup a value in another dataset, but there is a grouping that requires you to know what the values for each level is in order to get to the correct detail record. Can you still use the lookup function with more than one field to compare against? So for example
Department \___SalesPerson \___Measure
I want to be able to add a new row at the Measure level, but lookup each field from another dataset. In order to do that I will need the Department AND SalesPerson values to do the lookup, but I dont think the Lookup function will let us do that will.
Actually this is in regard to SCD Type 2 Dimension, Scenario is like that I am moving Fact table from some old source and I have dimensionA description value in fact which I want to replace with appropriate id from Dimension Table and that Dimension table is SCD Type 2 based on StartDate and EndDate and Fact Table doesn't contains direct date value rather there is timeId in Fact so to update the value in Fact table I have to Join Time Dimension table and other Dimension Table to replace fact Description with proper Id.
I am doing a lookup that requires mapping 2 columns in the column mapping section. When I do this, I get the error "Row yielded no match during lookup" . The SQL that I captured in SQL profiler does find the record when I run it in Management Studio. I have already tried trimming everything to no avail.
Why is this happening?
I tried enabling memory restrictions but then I my package hangs and I get a SQLDUMPER_ERRORLOG.log file with the following logged:
I have a Conditional Split with 3 outputs. On the first output I have a lookup, when I execute the package I have 56 rows going through the Conditional Split, all rows are then going to the 2nd and 3rd output but the lookup on the first output generates an error "Row yielded no match during lookup".
I don't understand why the lookup is generating an error while there is no row going through it.
I am designing a ssis package,This is intends to mine text data(Data extracted from websites). Term lookup/Term extraction has been used as tools for mining. I have lookup terms defined with me for reference table,but the main problem lie in extracting the nearby text/number/charcters to these lookup terms during mining. For example : I found noun "Email" 200 (frequency score) times in my text,Now I want to extract nearby email address(this is also true for PhoneNumber,Address attributes also).so how can I achieve this with SSIS. If u have some idea/suggestion to carry out this challenge with or without Term Extraction/Term Lookup,plz do write here.
When setting up databases for end users, what's the best practice regarding who's the dbo for each individual database - the user itself or a sysadmin?
Does it really have any importance at all who the owner (as defined by 'dbo') is ?
1.- a list of all the terms that start with A% 2.- a list of all the related terms … that belong to terms that start with A%
For number 1 - I am doing a select on Terms table with where term like A%.
For number 2 – I am joining both tables and then once again doing a where term like A%.
Would it be more efficient to take the first results and put them in a table variable, and then just do a join with the second table RelatedTerms.TermID = Terms .TermID
The number of records that generally comeback are between 500 to 1000 records that
What would you consider is a better approach ? or maybe there is an even better way ?
i'm a newbie for sql , but i want to learn sql on my own , is there any way that i can learn sql , do i have to download sample database from the internet, do i need to have my own server to play with. Hopefully someone show some lights on this.
Please point me to a web resource from where I can study:1) writing complex queries such as those involving HAVING, mult-levelnested queries, GROUP BY, T-SQL functions2) Joins - a lot of practice3) Stored Procedures, transactions, cursors and triggers - I need someheavy-duty practiceWhere can I get some good practice of the above? Also, please recommenda good SQL Server/T-SQL book in the light of the above requirement.
Folks - had a look around Google and no surprises, but never found what i was looking for.
I want to see a real work best practice C# Stored Procedure for Sql 2005 (express is what i am using, but don't mind the Sql edition).
Almost everything i see is a "select * from table" which to be honest was my first stored proc many years ago - everything since has been fairly detailed.
I ask as i am sceptical, after years of trying to STOP building Sql queries in code (as it's hellish!) that the CLR technique really makes any kind of a diffence.
If someone has found that it HAS i'd love to hear about it. The thought of:
SqlCommand cmd = new SqlCommand ( "My Whole Stored Proc as Text" );
... doesn't appeal, never mind the potential for debugging syntactical issues and so on.
I was excited by this, until it became something i had to do in a real situation and then i got a little worried. Should i be?
create a table and name it Salary Information. Add an Employee Name and Salary column to the table. Create a column in the Employee table and name it Salary. Create a trigger that updates the Salary table with the employees's name and salary each time u insert data into the Salary column of the Employee table.
Hi, my database is growing over 1Gb, and I only have one .mdf to keep them all. Should I use a secondary data file for my data? Can I do that now? Thanks.
Good Morning, I work for a company that has sees alot of people come and go. The one thing I have noticed is that people use their admin accounts to log into SQL and create sp, views and databases.When the user leaves I am stuck with all these objects that are owned by somone no longer working for the company. So my question to you guys is: What is the best practice to use in creating new objects? Thanks for your guru-ness!
Hi. We have developed as quite simple ASP.Net webpage that fetches a number of information from a SQL 2005 database. We are having some problems though, becuase of a firewall that is beetween the webserver and the SQL server, and I think this is because of bad code from my part. I'm not that experiensed yet, so I'm sure that there is much to learn. Usualy when I do a query against a SQL database, I do something like this: Function GO_FormatRecordBy(ByVal intRecordBy As Integer) Dim dbQueryString As String Dim dbCommand As OleDbCommand Dim dbQueryResult As OleDbDataReader dbQueryString = "SELECT Name FROM tblRegistrators WHERE tblRegistratorsID = '" & intRecordBy & "'" dbCommand = New OleDbCommand(dbQueryString, dbConn) dbConn.Open() dbQueryResult = dbCommand.ExecuteReader(CommandBehavior.CloseConnection) dbQueryResult.Read() dbConn.Close() dbCommand = Nothing Return dbQueryResult("Name") End Function Now, lets say that I have a DataList that I populate with Integer values, and I want to "resolve" the from another table, then i do a function like the one above. I guess that this means that I open and close quite alot of connections against the database server when I have a large tabel. Is there any better way of doing this? Chould one open a database connection globaly in lets say the ASA fil? Whould that be a better aproch? When I added the CommandBehavior.CloseConnection to the ExecuteReader statment, I noticed that it was a bit faster, and I think there was fewer connections in the database, so maby there is more to the "closing connections" then I usualy do. Any tips on this? Best reagrds,Johan Christensson
I recently started developing a web site for a client using storefront.net and ms sql server.
the db schema of storefront.net has autonumbers as the PKs for the products table (even though the products table contains an additional field for product_number.)
So here's my dilemma if you care to read:
I typically develop local, deploy remote (after testing). I have a local SQL server, and then the remote SQL server.
When I'm developing for this project, I'll insert data such as products to the products table (sometimes several times while i'm working out routines to import data to the products table.) this has the effect of creating a unique ID for each product based upon SQL auto-incrementing INTs.
This StoreFront.net (SF.NET) has another table that is a lookup table. For each part number, it has a corresponding categoryID number.
Now, if i have product_ID 1234, and I set the category ID to say 10 and get it working on my local box, every thing is fine.
Here's where the problem comes in: When I use DTS to transfer the database during remote deployment, each product is inserted into the remote DBs products table and gets a NEW product id. Same with the categories.
This has the effect of breaking the relationships. (SF.NET has no ref integrity nor relationships defined in the db.) let's say my product_id 1234 gets put into the remote copy, it'll get a new product_ID (PK). let's say it's now 5775. now my category ID will also get a new value. so my data is now not related.
I don't know how to handle this situation. The unique IDs generated on my local sql will nearly almost always be different from those generated on the remote db.
How do i handle this situatoin is my question? advice, guys?
Hi to everyone! I've to create a little auction system that runs on web. Before starting developing, I'll would like to be sure to use the best practice...The main aspect is to avoid conflicts on database updating with bids, i.e. if a user places his bid I've to be absolutely sure that his bid is the highest at the moment of updating database. If not, I have to refuse it...So I ask you: using transaction is the best way for assuring the non-conflicts? And may I have to be careful of some other aspect in ASP.NET pages? Or there's no problem of conflicts at page level?Thank you very much in advance for any suggestion, and If anyone has some other thing to say about possible problem on auctions I'll be glad to hear him!!!! ;-)
I have around 10 databases currently residing on different platforms which make-up for roughly a terrabyte of information. I would like to migrate all of these DBs over so that they are all managed under one instance of SQL server 2K. In my view this streamlines things a lot and reduces costs of licensing/hardware.
However, is managing all of these databases on one clustered instance of SQL 2k the best approach from a performace stand point? Would it be better to seperate each database onto its own machine? I am under the impression that given enough hardware (processors, RAM) using just one instance of SQL 2k enterprise should be enough to perform the mangement of this data. Is this correct? Is there an optimal model?
Money is always a concern but in this case, performance is the main objective. The size of the data managed will be growing significantly so the system should be scalable.
My background is as a developer so I may not have provided enough to give a good answer. Please ask questions if you need more detail. I am looking for suggestions on the best way to handle this.
Specifically I would like to know the preferred architecture as well as any suggested hardware.
I'm building a database that has maybe four unique tables Student, Advertiser, Employee, maybe Account. Three of the four table (Student, Advertiser, Employee) have something in common in which they all contain fields such as emailAddress, password, role, isAccountActive, etc. which allow them to access their respected data. However, is it best practice to build a fourth table which contain Account information or should I just include that information in their respected tables?
My thinking is that if you have a fourth table such as Account then you can manage all accounts (Student, Advertiser, Employee) from one table, but as the database gets more in-depth you have to build more and more complex stored procedure to do simply task such as update, delete, select, etc.
Hello All .. This is the scenario I'm having : -- I'm a beginner so bear the way I'm putting it ... sorry !
* I have a database with tables - company: CompanyID, CompanyName - Person: PersonID, PersonName, CompanyID (fk) - Supplier: SupplierID, SupplierCode, SupplierName, CompanyID (fk)
In the Stored Procedures associated (insertCompany, insertPerson, insertSupplier), I want to check the existance of SupplierID .. which should be the 'Output' ...
There could be different ways to do it like: 1) - In the supplier stored procedure I can read the ID (SELECT) and :
if it exists (I save the existing SupplierID - to 'return' it at the end). if it doesn't (I insert the Company, the Person and save the new SupplierID - to 'return' it at the end) ------------------------------------ 2) - Other way is by doing multiple stored procedures, . one SP that checks, . another SP that do inserts . and a main SP that calls the check SP and gets the values and base the results according to conditions (if - else)
3) it could be done (maybe) using Functions in SQL SERVER...
There should be some reasons why I need to go for one of the methods or another method ! I want to know the best practice for this scenario in terms of performance and other issues - consider a similar big scenario ..... !!!
I'll appreciate your help ... Thanks in Advance . ! .
Need the following question addressed, as it keeps coming up in our development meetings and has been creating a divide. Pease voice your opinion.
To keep it simple, we have Table1 which identifies several questions that are revised on a regular basis. One of it's columns is called "Revision Status". Within revision Status, we would like to identify the possible status of a question such as:
New Questions; Revised; Resubmit; Inactive; Active;
...as well as several more.
I'm of the mind to have these in a seperate table identified with a unique ID... call it StatusTable.
However others feel, just use the "Revision Status" column and simply use the numbers "WITHOUT" a table or description. The developer documentation will tell the developer which number equals the description. ie the following would be found in the Revision Status column.
1 2 3 4 5
My mind says the above is ilogical. I would rather join and say in my statement:
We are building a new site using ASP and SQL as the backend. Any idea where I can find some information about best practices to coonect the 2. I was thinking about IIS on a DMZ with port 80 only open and the SQL inside the internal network and open port 1443 between them.