I need to validate my input rows. The row is valid if there exist some other input rows in the same table (I am importing data from excel and access). I'll give an example to make everything clear:
Input table boys has following columns:First_Name ,Surname and Date_of_birth.
Output table is Twin_Triple_More_Brothers. I would like to insert into this table only boys that surnames are equal and difference in date of birth is less then one day.
I was thinking about lookup component, but I cannot use it in that way (or I just do not know how).
Maybe someone has an idea how to do this?
Thanks for help.
I am using the lookup transform with the following settings:
Reference table: Use results of an SQL query. The query retrieves the surrogate key and four business key columns from a dimension table which contains a few thousand rows.
Columns: business keys in the incoming data are mapped to the business keys in the reference table, and the surrogate key is looked up from the reference table.
Advanced: Enable memory restriction is OFF (and the other items on the Advanced tab are greyed out).
I assumed that this means that the lookup transform would cache all of the rows in the SQL query, and then perform the lookups against this cache. This is the behaviour that I saw when I was running the package in my local environment in the BIDS debugger.
However, a colleague was doing some profiling on our production database server, and noticed that the lookup transform is instead issuing a single SQL query for each row of input. Our production ETL server has many GBs of free RAM available (way more than enough to cache the contents of the lookup table in memory), and given that memory restriction is disabled, I don't understand why the lookup transform is behaving in this fashion. Does anyone have an explanation for this? I'm probably misunderstanding a key concept here.
We did some "at scale" fuzzy lookup tests today and were rather disappointed with the performance. I'm wanting to know your experience so I can set my performance expectations appropriately.
We were doing a fuzzy lookup against a lookup table with 25 million rows. Each row has 11 columns used in the fuzzy lookup, each between 10-100 chars. We set CopyReferenceTable=0 and MatchIndexOptions=GenerateAndPersistNewIndex and WarmCaches=true. It took about 60 minutes to build that index table, during which, dtexec got up to 4.5GB memory usage. (Is there a way to tell what % of the index table got cached in memory? Memory kept rising as each "Finished building X% of fuzzy index" progress event scrolled by all the way up to 100% progress when it peaked at 4.5GB.) The MaxMemoryUsage setting we left blank so it would use as much as possible on this 64-bit box with 16GB of memory (but only about 4GB was available for SSIS).
After it got done building the index table, it started flowing data through the pipeline. We saw the first buffer of ~9,000 rows get passed from the source to the fuzzy lookup transform. Six hours later it had not finished doing the fuzzy lookup on that first buffer!!! Running profiler showed us it was firing off lots of singelton SQL queries doing lookups as expected. So it was making progress, just very, very slowly.
We had set MinSimilarity=0.45 and Exhaustive=False. Those seemed to be reasonable settings for smaller datasets.
Does that performance seem inline with expectations? Any thoughts to improve performance?
I'm working with an existing package that uses the fuzzy lookup transform. The package is currently working; however, I need to add some columns to the lookup columns from the reference table that is being used.
It seems that I am hitting a memory threshold of some sort, as when I add 3 or 4 columns, the package works, but when I add 5 columns, the fuzzy lookup transform fails pre-execute:
Pre-Execute Taking a snapshot of the reference table Taking a snapshot of the reference table Building Fuzzy Match Index component "Fuzzy Lookup Existing Member" (8351) failed the pre-execute phase and returned error code 0x8007007A.
These errors occur regardless of what columns I am attempting to add to the lookup list.
I have tried setting the MaxMemoryUsage custom property of the transform to 0, and to explicit values that should be much more than enough to hold the fuzzy match index (the reference table is only about 3000 rows, and the entire table is stored in less than 2MB of disk space.
Say I want to lookup a value in another dataset, but there is a grouping that requires you to know what the values for each level is in order to get to the correct detail record. Can you still use the lookup function with more than one field to compare against? So for example
Department \___SalesPerson \___Measure
I want to be able to add a new row at the Measure level, but lookup each field from another dataset. In order to do that I will need the Department AND SalesPerson values to do the lookup, but I dont think the Lookup function will let us do that will.
Actually this is in regard to SCD Type 2 Dimension, Scenario is like that I am moving Fact table from some old source and I have dimensionA description value in fact which I want to replace with appropriate id from Dimension Table and that Dimension table is SCD Type 2 based on StartDate and EndDate and Fact Table doesn't contains direct date value rather there is timeId in Fact so to update the value in Fact table I have to Join Time Dimension table and other Dimension Table to replace fact Description with proper Id.
I am doing a lookup that requires mapping 2 columns in the column mapping section. When I do this, I get the error "Row yielded no match during lookup" . The SQL that I captured in SQL profiler does find the record when I run it in Management Studio. I have already tried trimming everything to no avail.
Why is this happening?
I tried enabling memory restrictions but then I my package hangs and I get a SQLDUMPER_ERRORLOG.log file with the following logged:
I have a Conditional Split with 3 outputs. On the first output I have a lookup, when I execute the package I have 56 rows going through the Conditional Split, all rows are then going to the 2nd and 3rd output but the lookup on the first output generates an error "Row yielded no match during lookup".
I don't understand why the lookup is generating an error while there is no row going through it.
I am designing a ssis package,This is intends to mine text data(Data extracted from websites). Term lookup/Term extraction has been used as tools for mining. I have lookup terms defined with me for reference table,but the main problem lie in extracting the nearby text/number/charcters to these lookup terms during mining. For example : I found noun "Email" 200 (frequency score) times in my text,Now I want to extract nearby email address(this is also true for PhoneNumber,Address attributes also).so how can I achieve this with SSIS. If u have some idea/suggestion to carry out this challenge with or without Term Extraction/Term Lookup,plz do write here.
Table is created. I can SELECT * FROM accepted and see my column names.
Then I try to BCP into the table using:
C:>BCP sales..east in C:Resultsaccepted.txt -t -f C:cpformataccepted.fmt -Usa -Ppwd
I get this error:
Starting copy... SQLState = 22001, NativeError = 0 Error = [Microsoft][ODBC SQL Server Driver]String data, right truncation SQLState = 22001, NativeError = 0 Error = [Microsoft][ODBC SQL Server Driver]String data, right truncation
and so on......
In the .fmt file I've tried "", " ", " " and everthing I could think of as a delimiter. Still no luck. I've tried almost every switch available to both OSQL and BCP. The data I am trying to BCP is a SQL result so I don't think any special delimiters are placed. I've tried not using the .fmt file and using the prompts but still no luck. Data is CAST in the query and doesn't excede 45 characters. Hope I've explained my problem well enough.
I want to lookup values from a database into another database both of which are in the same sql server 2000. One databases is called GamingCommissiondb the other is called LicensingActions I need some of the tables to communicate with each other, to look values from one another. Example I need the Termination table to look up values from the Revocations table. Would using LinkedServers suffice??
Hi, Can anybody provide me a Lookup UDF? I need to supply columnname,Tablename and condition dynamically and I need the scalar value in return. Any help will be greatly appreciated...
Hi, I am transferring the xml data from an xml file into sql server table using ssis. To avoid any duplicate import via this ssis package, I would like to first check if data exists in the sql server table for what is about to be imported. If so then delete the existing data and then import.
Question: How do I get the field value say ID field from the file and then take this id and delete these from the table in sql server first.
Is this to do with lookup or is there an easier way to do this please? thanks
I am trying to use lookup to see if a item esists in my table ( 3 key fields ). If the lookup fails I want to insert the records. If it succeeds I have put a recordcount to catch the items that are not required. I don't think that I understand the settings for failed rows. I have tried setting the Configure Error Output to redirect, but this does not seem to work. I have the below errors.
[SQL Server Destination [151]] Error: Unable to prepare the SSIS bulk insert for data insertion.
[DTS.Pipeline] Error: component "SQL Server Destination" (151) failed the pre-execute phase and returned error code 0xC0202071.
Can someone please advise me how to set up this component to work for my application
I am using SCD and Lookup for Incremantal ETL load. But it is very interesting that my 3000 recs is becoming 2500 when I put SCD and lookup. if I replace dummy multicast then it is 3000.
Before inserting records of a recordset I would like to check that each one of these records doesn't exist in the destination table 1. If the records exist, they should be redirect to other table 2. For this purpose I use a lookup task. The referende table is the destination table 1. The green line is linked to table 2 and the red line is linked to table 1. The first execution works properly and copies de records to table 1, but on the next executions the records are copied to table 1 instead of being copied to table 2.
Hi,I need to input Chinese character into the table of the database. I did try to install/run both Chinese/English version of Visual Studio into Chinese/English version of Server 2003 but it still didn't work.Please help !stephen
I have two textboxes on the page...shipdate and duedate. When the page loads, shipdate has today's date loaded and duedate has a date that's 28 days later than today. When I change the dates and submit it's not updating in the database instead I'm getting the two dates in pageload. What am I doing wrong?Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
ShipDateTxt.Text = Today() DueDateTxt.Text = DateAdd(DateInterval.Day, 28, Today()) End SubProtected Sub LoanRequest_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles LoanRequest.Click Dim conn As New Data.SqlClient.SqlConnection(ConfigurationManager.ConnectionStrings("TrainUserConnectionString").ConnectionString)Dim cmd As New Data.SqlClient.SqlCommandWith cmd .Connection = conn .CommandType = Data.CommandType.StoredProcedure .CommandText = "UpdateloanerInfo".Parameters.AddWithValue("@requestorid", Integer.Parse(Request.QueryString("requestorid"))) .Parameters.AddWithValue("@shipdate", ShipDateTxt.Text).Parameters.AddWithValue("@duedate", DueDateTxt.Text) End With Here's updateloanerinfo stored procedure: @requestorid int,@shipdate datetime,@duedate datetime AS update LibraryRequest set [shipdate] = @shipdate,[duedate] = @duedate Where requestorid=@requestorid
Here is what I am trying to do: Every month we need to import a fixed format text file into one of our tables. The format and location of the file is same every month except for the name. I want to create a DTS package to import it and call this DTS package first thing in a stored procedure(after which I do some processing with this imported data). I want to create the filename in my stored procedure and then call this DTS package to import it.
I am usig DTS as the interface is so much easier and want to avoid bcp :-)
I am just learning SQL Server and I am stuck with a few things. I was wondering if you can define input masks for columns in the database in SQL Server. I have used this feature many times in Access and on the front end in Oracle Forms. Is there such a thing in SQL Server?? Is it just called something different.....? I will be trying to format telephone numbers and dates. Thank you for you time.
(Select distinct t.itemnmbr,t.totalprice as totalcost
from ( select distinct
vs.itemnmbr, sum(vs.totalsumprce) as totalprice
from vwquantityprice vs
Where vs.itemnmbr = @itemnmbr and (vs.docdate between @startdate and @enddate)
group by vs.itemnmbr
) as t
)
select * from quantityprice('06-5840','4/1/2007','4/1/2008')
well, guys, i have this function and obviously i will get one row from this in output...with that particular itemnumber but i want multi-itemnmbrs some times in output...some times 2..soemtimes 3...
example, select * from quantityprice('06-5840,ab-4581,0a-1458,45-0945','4/1/2007','4/1/2008')
can you tell me what condition it will come to get this output..
like in where itemnmbr in('06-5840,ab-4581,0a-1458,45-0945') and in starting input v ariable @itemnmbr..
create function quantprice
( @itemnmbr varchar(50)..
don't know what condition it will come and where i have to put condition in where clause or in input variable..
any help would be appreciated..thanks a lot!! guys plz reply.
Hi there, I'll be going through a training couse for SQL Server next month. The class I'm signed up for is M2780: Maintaining a Microsoft® SQL Server™ 2005. My company uses SQL Server 2000 now and unless I give good reasons to upgrade to 2005, we won't be upgrading. I have very little DBA experience, but since I was told SQL Server is my responsibility, I figured it would make sense to go through some type of training for it. My questions are: Is there enough the same between the two that it makes sense for me to take this class? Are there good business reasons I can give for upgrading? If we don't upgrade is it worth taking a class for 2000 vs 2005?
I’m trying to pass inputs from one stored proc to another but I’m having problems in passing them.
First proc (input_passing) should pass the following inputs – FirstName, LastNmae and RecordID to the second proc (input_receive) – then the second proc will diplay the outcome – YES/NO if there is a match. ------------------------------------------------------------------------------------------ First proc (input_passing) ------------------------------------------------------------------------------------------------------- DECLARE @RecordIDVARCHAR(11), @FirstNameVARCHAR(60), @LastNameVARCHAR(60)
--DECLARE tbInputs CURSOR FOR SELECT RecordID, FirstName, LastName
FROM tbInputs
OPEN tbInputs
FETCH NEXT FROM tbInputs INTO @RecordID, @FirstName, @LastName
WHILE @@FETCH_STATUS = 0 BEGIN --GET RECORD SET @RecordID = RecordID from dbo.tbInputs)
END
FETCH NEXT FROM tbInputs INTO @RecordID, @FirstName, @LastName
END
CLOSE tbInputs DEALLOCATE tbInputs exec input_receive ----------------------------------------------------------------------------------------------------------------- Second proc. (input_receive)
-- ******* Initializations ********** SET @Count = 0 SET @Found = 0 -- DEFAULT to No File Match SET RecordMatch = 0 SET @MatchedOn = 'NO FOUND' -- ******* END Initializations **********
-- First check to see if the RecordID Matches IF( @RecordID IS NOT NULL AND @RecordID <> '' ) BEGIN IF( CAST( REPLACE(@RecordID, '-', '') AS DECIMAL(18,0)) > 0 ) BEGIN SELECT @Count = COUNT(*) FROM tbAlert WHERE REPLACE(RecordID, '-', '') = REPLACE(@RecordID, '-', '')
IF( @Count > 0 ) BEGIN SET @RecordMatch = 1 SET @MatchedOn = 'FOUND : RecordID' RETURN END END END
A script component receives some input. But I just can't get at the first row??
Basically, if i use the NextRow method in the in the Do statement, then it advances the row collection to the second row before it gets into the code inside the loop?? BUT, if I use the EndOfRowset property to define my loop then I get an error:
[PipelineBuffer has encountered an invalid row index value]
I'm guessing this means...I have to call NextRow before i access the data in the collection? But thats retarted because then I miss the first row?? what? What am I missing??
This is the code which works but I miss the first row:
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer) Dim strConcept As String
Do While Row.NextRow()
strConcept = Row.concept
updateDb(strConcept)
Loop End Sub
This is the code which throws the invalid row index error:
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer) Dim strConcept As String
Do While Not Row.EndOfRowSet()
strConcept = Row.concept
updateDb(strConcept)
Row.NextRow()
Loop End Sub
I've put some try catches in there an the error happens on the line which calls Row.concept....?
Can anyone help, it must be something I'm messing up
If a component requires a sorted input it would seem reasonable that you can check the IsSorted property of the attached input, but this will always return false. I have tried this when connecting the output of the Sort transform to my component, and then check the IsSorted property for this input. It is always false. How can this be, and also how can I see if the path is indeed sorted?
If using a virtual input column in my UI, I get a SortKeyPosition on the columns, but when overriding SetUSageType in the component class I always get zero for the key. Why is the sort information not quite there for me?
I'm very new to SQL Server so please forgive me if this question is ridiculously simple. I have to upgrade the report engine from one that was used in a legacy VB6 app to a C#.net app. In doing so, I'm looking at assorted reports that came out of the old app. Here is the SQL code for one of them:
SELECT (LTRIM(STR(From_To,10,0)) + '-' + LTRIM(STR(From_To + 49,10,0))) AS Bonus_Earned, COUNT (Empl_ID) AS Men, round(SUM (SumDollars),2) AS Group_Earn, round((SUM (SumDollars) / COUNT (Empl_ID)),2) AS Calc0 FROM (SELECT CONVERT (int, round (SumDollars / 50, 2)) * 50 AS From_To, SumDollars, Empl_ID FROM (SELECT round(SUM (Actual_Hours_Dollars.Incentivedollars * Contract_History.Bonus_Pct / 100), 2) AS SumDollars, employees.Empl_ID FROM ((Employees INNER JOIN Actual_hours_dollars ON Employees.Empl_ID = Actual_Hours_Dollars.Empl_ID_R) INNER JOIN Contracts ON Actual_Hours_Dollars.Contract_ID = Contracts.Contract_No) INNER JOIN Contract_History ON Contracts.Contract_Idx = Contract_History.Contract_Idx_R where employees.empl_idx = (SELECT max(empl_idx) FROM Employees AS OuterEmployees WHERE OuterEmployees.empl_id = employees.empl_id AND outeremployees.Datex = (SELECT max(datex) FROM Employees AS InnerEmployees WHERE InnerEmployees.empl_id = employees.empl_id AND Inneremployees.disabled = 0 AND Inneremployees.Datex < '~EndDate~')) AND Contracts.Datex = (SELECT max(datex) FROM Contracts AS InnerContracts WHERE InnerContracts.contract_no = Contracts.contract_no AND InnerContracts.disabled = 0 and InnerContracts.deleted = 0 AND InnerContracts.Datex < '~EndDate~') AND Actual_hours_dollars.Datex >= '~StartDate~' AND Actual_Hours_Dollars.Datex < '~EndDate~' AND dateadd(month, Contract_History.Monthx - 1, dateadd(year, Contract_History.Yearx - 1900, '01 Jan 1900')) >= '~StartDate~' AND dateadd(month, Contract_History.Monthx - 1, dateadd(year, Contract_History.Yearx - 1900, '01 Jan 1900')) < '~EndDate~' and Actual_Hours_Dollars.IncentiveHours <> 0 GROUP BY employees.empl_ID) AS InnerRS1 GROUP BY ROUND (SumDollars * 2, -2) /2, SumDollars, Empl_ID) AS InnerRS2 GROUP BY From_To
I'm only including it for completeness. The key thing I'd like to draw your attention to are two variables that are clearly input parameters: ~StartDate~ and ~EndDate~.
My question is this: If I want to copy this code into SQL Query Analyzer and run it to see what kind of results I get back, what's the simplest way to define these two input parameters? I'm hoping you could just show me the syntax to define them above the SELECT statement.
So I have a person who is adamant in tell me that SQL Server does not run on windows XP.
Now, I have already done all the research on this (i.e. sql server 2000 product page / requirements) and know the answer, but they insist on asking the question, so here it is .....
'Will SQL Server run on Windows XP'
A simple YES or NO will suffice; however, if you want to explain the answer (if it requires one ;) ), please feel free.