I have to load my fact table using data from my stage table, joining it against the dimensions. Most of my dimensions are straight joins that I can implement with Lookups; however, in one join I am using something like this:
SELECT T.Dim_Time_ID,
SUM(Measure)
FROM [dbo].[Stage_Table] S
INNER JOIN Dim_Time T
ON SUBSTRING(T.MonthYear,1,3) = SUBSTRING(S.Time,1,3)
AND SUBSTRING(T.MonthYear,6,2) = SUBSTRING(S.Time,6,2)
GROUP BY T.Dim_Time_ID
Currently I use a Copy Column transformation to make a copy of this column, apply the SUBSTRING function in my SQL code, and join the data with a Lookup transformation to get the desired output. I was wondering if there is another (better) way to accomplish this inside the data flow.
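One alternative is to compute the match key once on both sides (for example with a Derived Column transformation on the stage rows and the same expression in the Lookup's reference query), so that an ordinary equality Lookup can do the join. The Python sketch below only mimics that idea: it assumes, as in the SQL above, that characters 1-3 and 6-7 of MonthYear and Time identify the month and year; the sample values and formats are made up.

```python
from collections import defaultdict


def match_key(value):
    """Mirror SUBSTRING(value,1,3) and SUBSTRING(value,6,2); T-SQL is 1-based, Python 0-based."""
    return value[0:3], value[5:7]


# Hypothetical reference rows: (Dim_Time_ID, MonthYear)
dim_time = [(1, "Jan, 08"), (2, "Feb, 08"), (3, "Jan, 09")]
# Hypothetical stage rows: (Time, Measure)
stage = [("Jan, 08 10:15", 5), ("Jan, 08 17:40", 7), ("Feb, 08 09:00", 1), ("Mar, 08 12:00", 4)]

# Build the cache once, keyed on the derived key (similar in spirit to a full-cache Lookup).
lookup = {match_key(month_year): dim_id for dim_id, month_year in dim_time}

totals = defaultdict(int)  # SUM(Measure) ... GROUP BY Dim_Time_ID
unmatched = []             # rows a Lookup would send to its no-match/error output
for time_value, measure in stage:
    dim_id = lookup.get(match_key(time_value))
    if dim_id is None:
        unmatched.append(time_value)
    else:
        totals[dim_id] += measure
```

The point of the design is that the substring logic lives in one expression applied to both inputs, so the Lookup itself stays a plain equality match and no copied column is needed.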
I have delta-loaded all the dimension tables, and each dimension table is related to the fact table through a surrogate key. How do I go on to load the fact table? Please help, I am stuck here. :(
If anyone has an example I could refer to, please let me know.
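In outline, a fact load replaces each natural (business) key in a stage row with the surrogate key of the matching dimension row, typically one Lookup per dimension, and then writes the result to the fact table destination. The Python sketch below shows that flow with made-up dimension contents and hypothetical column names (customer_code, product_code, amount); rows with an unknown key are set aside rather than failing the load, similar to redirecting a Lookup's unmatched rows.

```python
# Hypothetical dimension caches: natural key -> surrogate key
dim_customer = {"C001": 10, "C002": 11}
dim_product = {"P-A": 100, "P-B": 101}

stage_rows = [
    {"customer_code": "C001", "product_code": "P-A", "amount": 25.0},
    {"customer_code": "C002", "product_code": "P-B", "amount": 40.0},
    {"customer_code": "C999", "product_code": "P-A", "amount": 5.0},  # unknown customer
]


def load_facts(rows):
    """Swap natural keys for surrogate keys; split matched and unmatched rows."""
    facts, rejects = [], []
    for row in rows:
        customer_key = dim_customer.get(row["customer_code"])
        product_key = dim_product.get(row["product_code"])
        if customer_key is None or product_key is None:
            rejects.append(row)  # candidate for logging or an "unknown member" key
            continue
        facts.append((customer_key, product_key, row["amount"]))
    return facts, rejects


facts, rejects = load_facts(stage_rows)
```

Many designs map unmatched keys to a reserved "unknown" dimension row instead of rejecting them; which to use is a design choice.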
Hi, I have been struggling for the last 2 days; SSIS is giving me a torrid time.
I am getting an error while loading the fact table:
[Destination Fact Table [1099]] Error: An OLE DB error has occurred. Error code: 0x80040E21. An OLE DB record is available. Source: "Microsoft SQL Native Client" Hresult: 0x80040E21 Description: "Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done.".
[Destination Fact Table [1099]] Error: The "input "OLE DB Destination Input" (1112)" failed because error code 0xC020907B occurred, and the error row disposition on "input "OLE DB Destination Input" (1112)" specifies failure on error. An error occurred on the specified object of the specified component.
[DTS.Pipeline] Error: The ProcessInput method on component "Destination Fact Table" (1099) failed with error code 0xC0209029. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.
Please help
I have already googled this error and applied whatever tips I found.
I'm loading a fact table that has several geographic attributes: some are at the state level, some at the county level, and some are drilled in farther than that. I understand the basic concept of a dimension with a ragged hierarchy, but I'm unsure how to load the fact table using lookups based on these geographic units. For example, if my geographic dimension contains 200 records for the state of Wyoming, basically one record for each fine-grain place (i.e. city/town), how do I go about doing a county lookup? Wyoming has only 23 counties, but because the attributes above the finest grain repeat, the lookup returns more records than I need. The same thing happens as I move up the geographic scale to state and then country. How do I configure/fill my dimension to handle these differing scales of data?
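One common way to avoid the duplicate matches is to point the county-level Lookup at a query that returns exactly one row per county (SELECT DISTINCT or GROUP BY over state and county) instead of at the place-grain table, and likewise for state. Whether the fact then carries a separate county-level key, as assumed below, or some other representation is a modeling choice. The sketch uses SQLite from Python as a stand-in for SQL Server; the table names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_geography (geo_key INTEGER PRIMARY KEY, state TEXT, county TEXT, place TEXT);
INSERT INTO dim_geography (state, county, place) VALUES
  ('WY', 'Albany', 'Laramie'),
  ('WY', 'Albany', 'Centennial'),
  ('WY', 'Natrona', 'Casper'),
  ('WY', 'Natrona', 'Mills'),
  ('WY', 'Natrona', 'Evansville');

-- Hypothetical county-grain table: one row per (state, county), with its own key.
CREATE TABLE dim_county (county_key INTEGER PRIMARY KEY, state TEXT, county TEXT);
INSERT INTO dim_county (state, county)
SELECT DISTINCT state, county FROM dim_geography;
""")


def county_key(state, county):
    """County-level lookup: at most one match, because dim_county holds one row per county."""
    row = conn.execute(
        "SELECT county_key FROM dim_county WHERE state = ? AND county = ?",
        (state, county),
    ).fetchone()
    return row[0] if row else None
```

The same DISTINCT query can serve directly as the reference query of an SSIS Lookup if a separate county table is not wanted.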
Maybe someone here can help me out: I have a Kimball Type II dimension in which I track changes in a hierarchy. Each row has RowStartDate and RowEndDate properties indicating the period during which that row should be used.
Now I want to load facts against that table. Each fact has a date associated with it that I can use to look up the right Id (a given SourceId can have multiple integer Ids when there are historic changes) and then load the facts.
Is there a building block I can use for that? I could do this with SQL scripts, but the client would prefer to have as much as possible done in SSIS. The Lookup transformation only lets me specify an equality join (an inner join where A = B), but I need equality on one column (SourceId) and then >= and <= (RowStartDate and RowEndDate) to find the right row version.
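A commonly cited workaround is to enable memory restriction (partial caching) on the Lookup and edit its SQL statement to add the range conditions as parameters; check that against your SSIS version. The alternative is to resolve the row version in SQL, in the source query or a staging step. The SQLite sketch below shows the join logic itself: equality on SourceId plus a range test of the fact date. Table contents are made up, and the sketch assumes the version ranges do not overlap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_hierarchy (
    Id INTEGER PRIMARY KEY,   -- surrogate key, one per row version
    SourceId TEXT,            -- business key
    RowStartDate TEXT,
    RowEndDate TEXT           -- current rows use a far-future end date
);
INSERT INTO dim_hierarchy VALUES
  (1, 'S1', '2007-01-01', '2007-12-31'),
  (2, 'S1', '2008-01-01', '9999-12-31'),
  (3, 'S2', '2007-01-01', '9999-12-31');

CREATE TABLE stage_fact (SourceId TEXT, FactDate TEXT, Amount REAL);
INSERT INTO stage_fact VALUES
  ('S1', '2007-06-15', 10.0),
  ('S1', '2008-03-01', 20.0),
  ('S2', '2008-03-01', 5.0);
""")

# Equality on the business key plus a range test on the date picks the valid version.
rows = conn.execute("""
SELECT d.Id, s.FactDate, s.Amount
FROM stage_fact AS s
JOIN dim_hierarchy AS d
  ON d.SourceId = s.SourceId
 AND s.FactDate >= d.RowStartDate
 AND s.FactDate <= d.RowEndDate
ORDER BY s.SourceId, s.FactDate
""").fetchall()
```

If a version's end date equals the next version's start date, one side of the range should use a strict comparison, otherwise a fact on that boundary date matches two rows.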
I have developed some packages to load data into fact tables in the data warehouse. Some packages are fine, others are not. The problem: some packages load fact tables using many Lookup transformations in the Data Flow task (lookups against dimension tables), and they are very slow, far too slow to be chosen as a solution.
Do you have any solutions that avoid the Lookup transformation? Any other approach (SSIS, T-SQL, and so on) that speeds up the fact table load is welcome.
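A frequently suggested alternative is to land the stage data in a table and let the database resolve all surrogate keys in one set-based INSERT ... SELECT with joins (for example from an Execute SQL task). It is also worth checking the Lookups' caching settings first, since a full cache is usually much faster than partial or no caching. The SQLite sketch below shows the set-based shape; table names are hypothetical, and mapping unmatched codes to -1 is an assumed "unknown member" convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_code TEXT UNIQUE);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_code  TEXT UNIQUE);
CREATE TABLE stage_sales  (customer_code TEXT, product_code TEXT, amount REAL);
CREATE TABLE fact_sales   (customer_key INTEGER, product_key INTEGER, amount REAL);

INSERT INTO dim_customer VALUES (10, 'C001'), (11, 'C002');
INSERT INTO dim_product  VALUES (100, 'P-A'), (101, 'P-B');
INSERT INTO stage_sales  VALUES ('C001', 'P-A', 25.0), ('C002', 'P-B', 40.0), ('C999', 'P-A', 5.0);

-- One statement resolves every surrogate key; unmatched codes get -1.
INSERT INTO fact_sales (customer_key, product_key, amount)
SELECT COALESCE(c.customer_key, -1), COALESCE(p.product_key, -1), s.amount
FROM stage_sales AS s
LEFT JOIN dim_customer AS c ON c.customer_code = s.customer_code
LEFT JOIN dim_product  AS p ON p.product_code  = s.product_code;
""")

loaded = conn.execute(
    "SELECT customer_key, product_key, amount FROM fact_sales ORDER BY amount"
).fetchall()
```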
Say you have a fact table with several columns that all reference the same key column in a dimension table, and you want to write a view that returns the dimension information for those keys?
USE MyTestDB;
GO
SET NOCOUNT ON;
IF OBJECT_ID('dbo.FactTemp', 'U') IS NOT NULL
    DROP TABLE dbo.FactTemp;
[Code] ....
I'm using very little data at the moment, and the query plan and statistics don't clearly show which approach is better.
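The usual pattern is to join the dimension once per referencing column, each time under a different alias, so the one dimension plays several roles. The original FactTemp script is elided, so the tables and columns in this SQLite sketch are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, CalendarDate TEXT);
CREATE TABLE FactOrder (
    OrderId INTEGER PRIMARY KEY,
    OrderDateKey INTEGER REFERENCES DimDate(DateKey),
    ShipDateKey INTEGER REFERENCES DimDate(DateKey),
    DueDateKey INTEGER REFERENCES DimDate(DateKey)
);
INSERT INTO DimDate VALUES (1, '2008-01-01'), (2, '2008-01-03'), (3, '2008-01-10');
INSERT INTO FactOrder VALUES (500, 1, 2, 3);

-- Same dimension joined three times, once per role, under separate aliases.
CREATE VIEW vOrderDates AS
SELECT f.OrderId,
       od.CalendarDate AS OrderDate,
       sd.CalendarDate AS ShipDate,
       dd.CalendarDate AS DueDate
FROM FactOrder AS f
JOIN DimDate AS od ON od.DateKey = f.OrderDateKey
JOIN DimDate AS sd ON sd.DateKey = f.ShipDateKey
JOIN DimDate AS dd ON dd.DateKey = f.DueDateKey;
""")

result = conn.execute("SELECT * FROM vOrderDates").fetchall()
```

If any of the key columns can be NULL, those joins would need to be LEFT JOINs so the fact row is not dropped.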
We did some "at scale" fuzzy lookup tests today and were rather disappointed with the performance. I'd like to hear about your experience so I can set my performance expectations appropriately.
We were doing a fuzzy lookup against a lookup table with 25 million rows. Each row has 11 columns used in the fuzzy lookup, each between 10-100 chars. We set CopyReferenceTable=0 and MatchIndexOptions=GenerateAndPersistNewIndex and WarmCaches=true. It took about 60 minutes to build that index table, during which, dtexec got up to 4.5GB memory usage. (Is there a way to tell what % of the index table got cached in memory? Memory kept rising as each "Finished building X% of fuzzy index" progress event scrolled by all the way up to 100% progress when it peaked at 4.5GB.) The MaxMemoryUsage setting we left blank so it would use as much as possible on this 64-bit box with 16GB of memory (but only about 4GB was available for SSIS).
After it finished building the index table, it started flowing data through the pipeline. We saw the first buffer of ~9,000 rows get passed from the source to the Fuzzy Lookup transform. Six hours later it still had not finished the fuzzy lookup on that first buffer! Running Profiler showed it firing off lots of singleton SQL queries doing lookups, as expected, so it was making progress, just very, very slowly.
We had set MinSimilarity=0.45 and Exhaustive=False. Those seemed to be reasonable settings for smaller datasets.
Does that performance seem in line with expectations? Any thoughts on improving performance?
I'm working with an existing package that uses the fuzzy lookup transform. The package is currently working; however, I need to add some columns to the lookup columns from the reference table that is being used.
It seems that I am hitting a memory threshold of some sort, as when I add 3 or 4 columns, the package works, but when I add 5 columns, the fuzzy lookup transform fails pre-execute:
Pre-Execute
Taking a snapshot of the reference table
Taking a snapshot of the reference table
Building Fuzzy Match Index
component "Fuzzy Lookup Existing Member" (8351) failed the pre-execute phase and returned error code 0x8007007A.
These errors occur regardless of what columns I am attempting to add to the lookup list.
I have tried setting the MaxMemoryUsage custom property of the transform to 0, and to explicit values that should be far more than enough to hold the fuzzy match index (the reference table is only about 3,000 rows, and the entire table takes less than 2 MB of disk space).
Say I want to look up a value in another dataset, but there is a grouping that requires knowing the value at each level in order to get to the correct detail record. Can I still use the Lookup function with more than one field to compare against? For example:
Department
    \___SalesPerson
        \___Measure
I want to add a new row at the Measure level but look up each field from another dataset. To do that I need both the Department AND SalesPerson values for the lookup, but I don't think the Lookup function will let me do that.
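If this is the Reporting Services Lookup function, which compares a single expression on each side, a common workaround is to build a combined key on both sides, for example =Fields!Department.Value & "|" & Fields!SalesPerson.Value in both the source and destination expressions. The Python sketch below only illustrates the combined-key idea with made-up data; the "|" separator is an assumption and should be a character that cannot occur in the values.

```python
def combined_key(department, salesperson, sep="|"):
    """Build one lookup key from both grouping levels; the separator prevents collisions."""
    return f"{department}{sep}{salesperson}"


# Hypothetical "other dataset": one value per (Department, SalesPerson)
targets = [
    {"Department": "East", "SalesPerson": "Ann", "Target": 100},
    {"Department": "West", "SalesPerson": "Ann", "Target": 250},
]
target_by_key = {combined_key(t["Department"], t["SalesPerson"]): t["Target"] for t in targets}


def lookup_target(department, salesperson):
    return target_by_key.get(combined_key(department, salesperson))
```

Without a separator, "AB" + "C" and "A" + "BC" would produce the same key, which is why one is included.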
Actually, this concerns an SCD Type 2 dimension. The scenario: I am migrating a fact table from an old source, and the fact contains a DimensionA description value that I want to replace with the appropriate Id from the dimension table. That dimension table is SCD Type 2, based on StartDate and EndDate. The fact table doesn't contain a direct date value; instead it has a TimeId. So to update the fact table, I have to join the Time dimension table and the other dimension table to replace the fact's description with the proper Id.
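The two joins can be chained: TimeId gives the calendar date through the Time dimension, and that date then selects the dimension version whose StartDate/EndDate range contains it. This SQLite sketch uses hypothetical table and column names and sample rows; in SQL Server the same logic could be written as UPDATE ... FROM with joins, or in SSIS as a Lookup on the Time dimension followed by the range lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Time (TimeId INTEGER PRIMARY KEY, FullDate TEXT);
CREATE TABLE DimA (Id INTEGER PRIMARY KEY, Description TEXT, StartDate TEXT, EndDate TEXT);
CREATE TABLE Fact (FactId INTEGER PRIMARY KEY, TimeId INTEGER, DimADescription TEXT, DimAId INTEGER);

INSERT INTO Dim_Time VALUES (1, '2007-05-01'), (2, '2008-05-01');
INSERT INTO DimA VALUES
  (7, 'Widgets', '2007-01-01', '2007-12-31'),
  (8, 'Widgets', '2008-01-01', '9999-12-31');
INSERT INTO Fact (FactId, TimeId, DimADescription) VALUES (1, 1, 'Widgets'), (2, 2, 'Widgets');

-- Resolve each fact's date through Dim_Time, then pick the DimA version valid on that date.
UPDATE Fact
SET DimAId = (
    SELECT a.Id
    FROM Dim_Time AS t
    JOIN DimA AS a
      ON a.Description = Fact.DimADescription
     AND t.FullDate BETWEEN a.StartDate AND a.EndDate
    WHERE t.TimeId = Fact.TimeId
);
""")

resolved = conn.execute("SELECT FactId, DimAId FROM Fact ORDER BY FactId").fetchall()
```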
I am doing a lookup that maps 2 columns in the column mapping section. When I do this, I get the error "Row yielded no match during lookup". The SQL that I captured in SQL Profiler does find the record when I run it in Management Studio. I have already tried trimming everything, to no avail.
Why is this happening?
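A frequently reported cause of this symptom is that a full-cache Lookup compares values inside the SSIS engine, where the comparison is case-sensitive and the data types of the mapped columns must match, whereas the same query in Management Studio follows the database collation, which is often case-insensitive. Normalizing both sides the same way (for example UPPER/RTRIM in the reference query and an equivalent Derived Column on the input) is a common fix. The Python sketch below only illustrates exact versus normalized matching with made-up values; it does not reproduce SSIS itself.

```python
def normalize(value):
    """Mimic UPPER(RTRIM(...)) applied identically to both sides of the lookup."""
    return value.rstrip().upper() if isinstance(value, str) else value


reference = {("ABC ", "north"): 42}  # hypothetical two-column reference key -> result
incoming = ("abc", "NORTH")

# Exact comparison, as a case-sensitive in-memory lookup would do it.
exact_match = reference.get(incoming)

normalized_reference = {tuple(normalize(v) for v in key): val for key, val in reference.items()}
normalized_match = normalized_reference.get(tuple(normalize(v) for v in incoming))
```

Differing types on the two sides (for example a string "1" versus an integer 1) cause the same kind of silent no-match.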
I tried enabling memory restrictions, but then my package hangs and I get a SQLDUMPER_ERRORLOG.log file with the following logged:
I have a Conditional Split with 3 outputs. On the first output I have a lookup, when I execute the package I have 56 rows going through the Conditional Split, all rows are then going to the 2nd and 3rd output but the lookup on the first output generates an error "Row yielded no match during lookup".
I don't understand why the lookup is generating an error while there is no row going through it.
I am designing an SSIS package intended to mine text data (data extracted from websites). Term Lookup/Term Extraction are used as the mining tools. I have the lookup terms defined in a reference table, but the main problem lies in extracting the text/numbers/characters near these lookup terms during mining. For example, I found the noun "Email" 200 times (frequency score) in my text; now I want to extract the nearby email address (the same applies to the PhoneNumber and Address attributes). How can I achieve this with SSIS? If you have any ideas or suggestions for tackling this, with or without Term Extraction/Term Lookup, please post them here.
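As far as I know, Term Extraction and Term Lookup report terms and their frequencies or scores rather than the text around them, so pulling out the value next to a term is probably easier with a regular expression, for example inside a Script Component. The Python sketch below shows the idea; the 60-character window and the simplified email and phone patterns are assumptions, and real-world addresses and phone formats need more careful patterns.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # simplified email pattern
PHONE = re.compile(r"\b\d{3}-\d{4}\b")               # simplified phone pattern


def values_near(text, keyword, pattern, window=60):
    """Return pattern matches found within `window` characters after each keyword hit."""
    found = []
    for hit in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        snippet = text[hit.end():hit.end() + window]
        found.extend(m.group(0) for m in pattern.finditer(snippet))
    return found


sample = "Email: sales@example.com | Phone: 555-0100 | " + "x" * 100 + " other@example.org"
emails = values_near(sample, "Email", EMAIL)
phones = values_near(sample, "Phone", PHONE)
```

The address far from the keyword is not returned, which is the intended effect of the window.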
I would like to know how to use a fact table so that when I insert or update a row containing a word, the table references the fact table to make sure the word I'm using is correct.
For example, I have a table with the columns Fulltext and Abbreviation. In the Fulltext column I have "Windows Server 2008", and in the Abbreviation column I would like to abbreviate this to "Win Srv 08". The fact table would have two columns, Fulltext and Abbreviation: Fulltext would hold the full words, such as Windows, Server, and 2008, and Abbreviation would hold Win, Srv, and 08.
So every time the word Windows comes up and I need to type an abbreviation for it, it should reference the fact table, which uses the abbreviation Win. This avoids different ways of abbreviating the word Windows.
Is there a way to do this automatically so that I don't have to manually go back and forth between the fact table and the table that I'm updating?
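One approach is to treat the word-to-abbreviation table as the single source and generate the abbreviation from it (in the INSERT/UPDATE statement, a trigger, or application code) rather than typing it by hand. The Python sketch below does the word-by-word substitution; the mapping contents are the ones from the example, and leaving words without an entry unchanged is an assumption.

```python
# Reference table: full word (lowercased) -> standard abbreviation
ABBREVIATIONS = {"windows": "Win", "server": "Srv", "2008": "08"}


def abbreviate(full_text):
    """Replace each word that has a standard abbreviation; keep other words as-is."""
    return " ".join(ABBREVIATIONS.get(word.lower(), word) for word in full_text.split())
```

For example, abbreviate("Windows Server 2008") gives "Win Srv 08", so "Windows" is always abbreviated the same way.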
Is there functionality in SQL Server 2005 to update only new records, e.g. only records from yesterday? I've seen functionality for dimension tables that picks up only new records, but nothing for fact tables.
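As far as I know, there is no fact-table counterpart of the Slowly Changing Dimension wizard; the usual approach is a watermark: remember the latest date (or ID) already loaded and select only newer source rows. A SQLite sketch with hypothetical table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL);
CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL);
INSERT INTO source_sales VALUES (1, '2008-03-01', 10), (2, '2008-03-02', 20), (3, '2008-03-03', 30);
INSERT INTO fact_sales VALUES (1, '2008-03-01', 10);  -- loaded by an earlier run
""")


def load_new_rows(conn):
    """Insert only source rows newer than the latest date already in the fact table."""
    (watermark,) = conn.execute(
        "SELECT COALESCE(MAX(sale_date), '1900-01-01') FROM fact_sales"
    ).fetchone()
    cur = conn.execute(
        "INSERT INTO fact_sales SELECT sale_id, sale_date, amount "
        "FROM source_sales WHERE sale_date > ?",
        (watermark,),
    )
    return cur.rowcount


inserted_first = load_new_rows(conn)   # picks up the two newer rows
inserted_again = load_new_rows(conn)   # nothing new the second time
```

A date-only watermark misses late-arriving rows dated on or before the last loaded date; a load-timestamp or increasing ID column in the source avoids that.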
Hi, I have a fact table with a string column, and I want to show it in MDX queries. When I add it as a measure, the cube shows numeric values, and I cannot add it as a member property because of some data warehouse design constraints. Can anyone help, please? It's urgent. Thanks in advance. Regards, Hemant
I am writing a BI solution for a recruitment company. In their business, there can be any number of participants from different dimensions linked to the same fact record. For example, a client can be sent the CVs of 50 candidates. That's my first problem. My second problem is the variety of dimension participant types for a given fact record, which results in the need for nullable dimension FKs, something I'm trying to avoid. For example, consider the following two business events. In the first, a candidate fills a job. Easy: we have a record in the fact table with the columns DateKey, EventType, CandidateKey, VacancyKey. No nullable columns, great. But there are other events I want to store in the fact table too. Back to my first example: the client is sent the CVs of 50 candidates in one transaction, so there is one client linked to the fact but 50 candidates. Now I need to extend the fact table with another column, CandidateGroupKey (which links to an intermediate fact table). But in this case no vacancy was involved. So do I now have to make the VacancyKey column nullable? That doesn't seem like a good idea... Or do I need a completely different approach, with different fact tables instead of just one?
Now I have to populate the fact and dimension tables by writing SQL scripts. I have already populated the dimension tables with SQL scripts, but I am having trouble writing the SQL script to populate the fact table, which has to take the following into account:
(i) unit_price is taken from the book base table by isbn.
(ii) sales_quantity is taken from order_detail.quantity, with reference to table cust_order (via orderid and orderdate).
(iii) discount_price depends on the quantity: if the quantity is > 20, the discount is 20% (i.e. discount_price = 0.8 * unit_price); if the quantity is < 10, there is no discount (normal price); if the quantity is between 10 and 20, the discount is 10%. Note that the quantity is determined per order of each customer, so if the same book appears at multiple positions in an order, those positions must be grouped together. This can happen because the PK of the order_detail table is order_id + item no, not order_id + isbn.
(iv) sales_amount is sales_quantity * discount_price.
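Rules (i) to (iv) can be expressed as one query: first sum the quantity per order and ISBN (so repeated positions of the same book are grouped), then apply the discount tiers with a CASE expression. The SQLite sketch below assumes column names (order_id, item_no, customer_id, order_date) based on the description and uses made-up rows; it treats 10 to 20 inclusive as the 10% tier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (isbn TEXT PRIMARY KEY, unit_price REAL);
CREATE TABLE cust_order (order_id INTEGER PRIMARY KEY, order_date TEXT, customer_id INTEGER);
CREATE TABLE order_detail (order_id INTEGER, item_no INTEGER, isbn TEXT, quantity INTEGER,
                           PRIMARY KEY (order_id, item_no));

INSERT INTO book VALUES ('111', 10.0), ('222', 50.0);
INSERT INTO cust_order VALUES (1, '2008-02-01', 7);
-- ISBN 111 appears twice in order 1 (items 1 and 3): 8 + 7 = 15 copies in total.
INSERT INTO order_detail VALUES (1, 1, '111', 8), (1, 2, '222', 25), (1, 3, '111', 7);
""")

rows = conn.execute("""
WITH per_book AS (   -- group repeated positions of the same book within one order
    SELECT o.order_id, o.order_date, o.customer_id, d.isbn,
           SUM(d.quantity) AS sales_quantity
    FROM cust_order AS o
    JOIN order_detail AS d ON d.order_id = o.order_id
    GROUP BY o.order_id, o.order_date, o.customer_id, d.isbn
), priced AS (
    SELECT p.*, b.unit_price,
           CASE WHEN p.sales_quantity > 20 THEN 0.8 * b.unit_price
                WHEN p.sales_quantity >= 10 THEN 0.9 * b.unit_price
                ELSE b.unit_price END AS discount_price
    FROM per_book AS p
    JOIN book AS b ON b.isbn = p.isbn
)
SELECT order_id, isbn, unit_price, sales_quantity, discount_price,
       sales_quantity * discount_price AS sales_amount
FROM priced
ORDER BY order_id, isbn
""").fetchall()
```

The same WITH ... SELECT can feed an INSERT INTO the fact table; SQL Server 2005 supports common table expressions as well.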
I have an existing table to which I would like to add a uniqueidentifier for each record. I have already created a column for the unique ID. What SQL script could I run to actually place a value in the newly created column for each record? Thanks for your help ahead of time.
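In SQL Server a single statement should do it, because NEWID() is evaluated once per row: UPDATE dbo.YourTable SET YourUniqueIdColumn = NEWID(); (the table and column names are placeholders). The Python sketch below imitates that against SQLite by registering a stand-in newid() function backed by uuid.uuid4(); newid here is a hypothetical helper, not a SQLite built-in.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Stand-in for SQL Server's NEWID(): a scalar function evaluated once per row.
conn.create_function("newid", 0, lambda: str(uuid.uuid4()))

conn.executescript("""
CREATE TABLE my_table (name TEXT, unique_id TEXT);
INSERT INTO my_table (name) VALUES ('a'), ('b'), ('c');
""")
conn.execute("UPDATE my_table SET unique_id = newid()")

ids = [row[0] for row in conn.execute("SELECT unique_id FROM my_table")]
```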
Hi, this is easy with OLAP tools, but I need to do it with MS SQL Server alone:
factTable
year  type  val
97    a     1
97    b     2
97    c     3
98    a     4
98    b     5
98    c     6
....
should become:
year  type_a  type_b  type_c
97    1       2       3
98    4       5       6
99    ...
The problem is the number of different types: not just 3 like a, b, c, but more than 100, so I don't want to do it manually like
select a.year, a.val, b.val, c.val
from (select year, val from factTable where type='a') a
full join (select year, val from factTable where type='b') b on a.year = b.year
full join (select year, val from factTable where type='c') c on a.year = c.year
Is it possible somehow with DTS or otherwise? I just need to present the data in a spreadsheet in a more readable form, but I cannot find any way to export the result from MS SQL Server OLAP Services to Excel...
Martin
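Conditional aggregation, one SUM(CASE ...) per type grouped by year, replaces the chain of full joins, and because the list of types is data, the statement can be generated from SELECT DISTINCT type. In SQL Server the generated text would typically be run as dynamic SQL (EXEC or sp_executesql). The Python sketch below builds and runs such a statement against SQLite using the year/type/val sample from the question; the escaping is deliberately simple and assumes ordinary type names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE factTable (year INTEGER, type TEXT, val INTEGER)")
conn.executemany("INSERT INTO factTable VALUES (?, ?, ?)", [
    (97, "a", 1), (97, "b", 2), (97, "c", 3),
    (98, "a", 4), (98, "b", 5), (98, "c", 6),
])


def pivot_sql(conn):
    """Generate one SUM(CASE ...) column per distinct type."""
    types = [t for (t,) in conn.execute("SELECT DISTINCT type FROM factTable ORDER BY type")]
    columns = []
    for t in types:
        literal = t.replace("'", "''")            # escape for a string literal
        alias = ("type_" + t).replace('"', '""')  # escape for a quoted identifier
        columns.append(f"SUM(CASE WHEN type = '{literal}' THEN val END) AS \"{alias}\"")
    return ("SELECT year, " + ", ".join(columns) +
            " FROM factTable GROUP BY year ORDER BY year")


pivoted = conn.execute(pivot_sql(conn)).fetchall()
```

The result set is already in spreadsheet shape (one row per year, one column per type), so it can be exported with any of the usual import/export tools.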
I have a fact table DEVICE with the following structure:
DEVICE_NAME    VARCHAR(50)
DEVICE_DATE    DATETIME
DEVICE_NUMBER  INT
DEVICE_NAME and DEVICE_DATE form the PRIMARY KEY.
I would like to import a text file with the same information into this table.
My problem is that the text file contains records that violate my primary key constraint. In that case, I would insert only the record with DEVICE_NUMBER not equal to zero, and discard and log the others.
If the records that violate the primary key constraint all have DEVICE_NUMBER not equal to zero, discard both and log them.
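The rules can be applied by grouping the file's rows by the primary key columns before inserting, for example after loading the file into a staging table or inside a Script Component. The Python sketch below implements them on plain tuples; the case where every duplicate has DEVICE_NUMBER = 0 is not covered by the question, and discarding those too is an assumption.

```python
from collections import defaultdict


def split_rows(rows):
    """rows: (device_name, device_date, device_number) tuples from the text file.
    Returns (rows_to_insert, rows_to_log)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[0], row[1])].append(row)  # group by the primary key

    to_insert, to_log = [], []
    for group in groups.values():
        if len(group) == 1:
            to_insert.extend(group)  # no key violation
            continue
        nonzero = [r for r in group if r[2] != 0]
        if len(nonzero) == 1:
            to_insert.append(nonzero[0])                  # keep the single non-zero row
            to_log.extend(r for r in group if r[2] == 0)  # log the zero rows
        else:
            # Several non-zero rows: discard all. Assumption: all-zero groups are discarded too.
            to_log.extend(group)
    return to_insert, to_log


sample = [
    ("D1", "2008-01-01", 5), ("D1", "2008-01-01", 0),
    ("D2", "2008-01-01", 3), ("D2", "2008-01-01", 4),
    ("D3", "2008-01-01", 0),
]
inserted, logged = split_rows(sample)
```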
I am relatively new to SSIS/SSAS. I have searched the forums but cannot find an answer to my question.
I created a cube in SSAS and have deployed it. Now I am trying to use SSIS to populate the cube. I have setup a DS that points to the SSAS instance - it uses OLEDB Provider for Analysis services 9.0.
When I try to use a data flow task OLE DB source to truncate the dimension/cubes I do not see the DS in the list to select?
I am finding it hard to get into the SSIS way of organizing the processing.
I am new to database design, and a lot of things about relationships never made sense to me. I have been working on a very large design that started out well enough, but as tables were added, a lot of organization fell by the wayside. Now that I am getting closer to the end, I am finding many places where there should be foreign keys, maybe some triggers, etc. (I have the same data item in 5 different places; when it is deleted in one place, it must be deleted from all of them). Assuming the data types and sizes are identical for the duplicated data, can I create FK-PK relationships now that there is a lot of data in the database, or do I have to start from scratch and rebuild the whole thing?
The other question is much more simple:
How do I make multiple rows "unique"? I have a primary key and an identity column, but I can't add a second primary key, and Enterprise Manager only lets me make 'int' columns identity columns. I have tried "add constraints", but it asks for an expression and I have no idea what the syntax should be.
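Constraints can be added to tables that already hold data; what blocks them is existing rows that violate them, so it helps to find those first. In SQL Server the statements would look roughly like ALTER TABLE dbo.Child ADD CONSTRAINT FK_Child_Parent FOREIGN KEY (ParentId) REFERENCES dbo.Parent (ParentId) ON DELETE CASCADE; and, if the goal is to make a combination of columns unique, ALTER TABLE dbo.Child ADD CONSTRAINT UQ_Child_AB UNIQUE (ColA, ColB); (all names are placeholders). ON DELETE CASCADE is the declarative alternative to triggers for "delete it everywhere". The SQLite sketch below shows the orphan check and a two-column unique index; SQLite cannot add constraints with ALTER TABLE, so it uses CREATE UNIQUE INDEX, which SQL Server also supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (parent_id INTEGER PRIMARY KEY);
CREATE TABLE child (child_id INTEGER PRIMARY KEY, parent_id INTEGER, col_a TEXT, col_b TEXT);
INSERT INTO parent VALUES (1), (2);
INSERT INTO child VALUES (10, 1, 'x', 'y'), (11, 3, 'x', 'z');  -- parent 3 does not exist
""")

# Step 1: rows that would prevent a new foreign key from being created.
orphans = conn.execute("""
SELECT c.child_id
FROM child AS c
LEFT JOIN parent AS p ON p.parent_id = c.parent_id
WHERE c.parent_id IS NOT NULL AND p.parent_id IS NULL
""").fetchall()

# Step 2: make the combination (col_a, col_b) unique across rows.
conn.execute("CREATE UNIQUE INDEX ux_child_a_b ON child (col_a, col_b)")


def try_insert(row):
    """Return False when the row breaks the two-column uniqueness rule."""
    try:
        conn.execute("INSERT INTO child VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

Once the orphans are fixed or deleted, the foreign key can be added without rebuilding the database.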
I have a CSV file of quotes that I need to import into my SQL Server database. I created the table with an autonumber identity column, but the import fails because I can't insert NULL into the identity column. There is no option to autonumber or anything similar when I click the Transform button in the Import wizard. So I decided to allow NULLs on the ID column and add the numbering later. I'd rather not number it manually; is there a way to do it with a query? I tried setting it to not allow NULLs and to auto-increment, but that didn't work...
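With an IDENTITY column, the usual approach is to leave the ID column out of the insert entirely (in the wizard's column mappings, the ID column gets no source column), so the server generates the numbers. If the data is already imported without IDs, adding the identity column afterwards, for example ALTER TABLE dbo.Quotes ADD QuoteId INT IDENTITY(1,1) NOT NULL; (placeholder names), numbers the existing rows. The SQLite sketch below shows the first approach, with INTEGER PRIMARY KEY playing the role of the identity column and made-up CSV content.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content: a header row plus one quote per line.
csv_text = 'quote\n"Stay hungry"\n"Less is more"\n'

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY is SQLite's auto-numbered key, standing in for IDENTITY.
conn.execute("CREATE TABLE quotes (id INTEGER PRIMARY KEY, quote TEXT NOT NULL)")

reader = csv.DictReader(io.StringIO(csv_text))
# List only the non-identity column; the database assigns id itself.
conn.executemany("INSERT INTO quotes (quote) VALUES (?)", ((r["quote"],) for r in reader))

rows = conn.execute("SELECT id, quote FROM quotes ORDER BY id").fetchall()
```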
Hello, I am trying to build OLAP cubes in MS SQL Server 2000, but all the tutorials/docs talk about fact tables and dimensions. Can anyone point me to good tutorials on how to create fact tables for building an OLAP cube? Also, which OLE DB provider should be used for MS SQL Server when creating the OLAP data source?
Thanks in advance, and wishing you a prosperous new year too.
I am building a health care application that marries transaction-level data (health care services provided) with person-level characteristics that have a time-dimension. The person-level characteristics are diseases that the person has (these disease all have a start and some have an end date). The diseases are stored in a table in which the foreign keys are a person-identifier, a time identifier (month/year) and a surrogate for the disease. Persons can have more than one disease at a time (the diseases are NOT mutually-exclusive). There are no measures in this table. The transaction table has a foreign key for person and time (month/day/year), a procedure code (the type of service rendered) and money (the cost of the services).
How do I answer the following questions:
What is the total cost of care (the sum of all service costs) last year for persons with "disease A"?
What is the total cost of care last year for persons with "disease A" AND "disease B"?
What is the total cost of care last year for persons with "disease A" OR "disease B"?
I've tried a factless fact table but can't get it to work. If anyone has the right solution and can communicate to me before I slit my wrists, I would be greatly appreciative!!!
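One way to approach these questions is to use the disease table as a filter on persons rather than joining it into the cost rows: first find the qualifying persons, then sum their costs. "A AND B" becomes persons having both diseases (GROUP BY person with HAVING COUNT(DISTINCT disease) = 2), and "A OR B" becomes persons having either, each person counted once so their costs are not double-counted. The SQLite sketch below uses hypothetical table names and sample rows, and assumes "having the disease last year" means any disease row in that calendar year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_disease (person_id INTEGER, month TEXT, disease TEXT);  -- month = 'YYYY-MM'
CREATE TABLE services (person_id INTEGER, service_date TEXT, procedure_code TEXT, cost INTEGER);

INSERT INTO person_disease VALUES
  (1, '2007-03', 'A'), (1, '2007-04', 'A'), (1, '2006-05', 'B'),
  (2, '2007-01', 'A'), (2, '2007-06', 'B'),
  (3, '2007-02', 'B');
INSERT INTO services VALUES
  (1, '2007-05-10', 'P1', 100), (2, '2007-07-01', 'P2', 200),
  (3, '2007-08-15', 'P3', 300), (4, '2007-09-01', 'P4', 400),
  (1, '2006-02-01', 'P1', 1000);
""")


def cost_for(conn, diseases, mode, year="2007"):
    """Total cost in `year` for persons with all ('all') or any ('any') of `diseases`."""
    if mode not in ("all", "any"):
        raise ValueError("mode must be 'all' or 'any'")
    placeholders = ",".join("?" for _ in diseases)
    having = f"HAVING COUNT(DISTINCT disease) = {len(diseases)}" if mode == "all" else ""
    sql = f"""
    WITH people AS (   -- one row per qualifying person
        SELECT person_id FROM person_disease
        WHERE disease IN ({placeholders}) AND substr(month, 1, 4) = ?
        GROUP BY person_id
        {having}
    )
    SELECT COALESCE(SUM(s.cost), 0) FROM services AS s
    WHERE s.person_id IN (SELECT person_id FROM people)
      AND substr(s.service_date, 1, 4) = ?
    """
    return conn.execute(sql, (*diseases, year, year)).fetchone()[0]


cost_a = cost_for(conn, ["A"], "any")
cost_a_and_b = cost_for(conn, ["A", "B"], "all")
cost_a_or_b = cost_for(conn, ["A", "B"], "any")
```

In a cube, the same idea is usually modeled with the disease table as a many-to-many bridge between persons and diseases rather than as a measure group of its own.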