I have a strange problem with SQL Server 2005 data mining models. I selected the input columns for my mining model (which are different from the input columns of its mining structure, since I ignored some of the columns for this model). But the mining model still used all the input columns from the mining structure rather than those I chose for the mining model.
Would anyone here please give me some guidance and advice on this? I really need help with it.
I am wondering where I can store my mining results in the data mining engine. For example, I have mining results like the accuracy chart, decision trees, and other formats of results based on the different mining algorithms I used, so where can I actually store these results for Reporting Services to use later? Is it possible to do that in SQL Server 2005?
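For example, could a Reporting Services dataset simply run a DMX content query against Analysis Services to pull the patterns out of a model? A minimal sketch, assuming a decision trees model named [MyDecisionTree] (a hypothetical name):

-- Content query against a hypothetical model; returns the learned
-- tree nodes so a report could render rules and support counts.
SELECT NODE_CAPTION, NODE_TYPE, NODE_SUPPORT, NODE_DESCRIPTION
FROM [MyDecisionTree].CONTENT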
Thanks a lot for any help and guidance in advance.
Hi, I have just run a simple data set through a model to predict a simple true or false value (i.e. binary output). The Lift Chart/Mining Legend in Analysis Services shows three results: Score, Population Correct (%), and Predict Probability (%).
Population Correct, I believe, is the percentage of predictions it got right out of the total number of predictions it tried to make. Is this correct?
However, I can't work out how the other two are derived, in particular the Score. To give a live example, the scores were as follows:
Model            Score   Pop Correct   Pred Probability
Decision Trees   0.83    76.59%        54.28%
Neural Network   0.75    67.63%        50.05%
Ideal Model              100.00%
Can anyone help with this and give a detailed explanation?
I just found that I am not able to view the accuracy chart for my mining model. The error message is: "No mining models are selected for comparison." This is quite strange.
While recently working with several mining models, I came across something that struck me as pretty odd - and I'm hoping to find an explanation for the behavior.
Consider the following setup:
- A single table in the relational database represents the only case table.
- A single, continuous column is the predictable.
- A mining structure has been created.
- The mining structure contains a single model, based on the MS Decision Trees algorithm.
- Input columns were selected for the model via the BI Studio wizard (i.e., those provided via the "Suggest" button).
- The structure has been fully processed.
Now, the interesting parts:
1. I view the scatter plot for the mining model, under the Mining Accuracy Chart tab.
2. Back on the Mining Structure tab, I delete one of the input columns.
3. I add the same column back into the structure.
4. The structure is fully processed again.
5. When I view the scatter plot for the mining model, under the Mining Accuracy Chart tab, a different set of data points is presented for the model predictions.
6. A different set of decision trees under the Mining Model Viewer tab confirms this.
How could different patterns have been found this second time around, even though all of the input columns were the same (as well as the training cases)?
(Note: I encountered this situation while creating a new mining model that was identical to an existing one. Even though the models received the exact same inputs and training cases, they yielded different results. I was able to reproduce the behavior by using steps 1-6 above, though.)
Can someone provide some insight on this behavior, or some kind of explanation of what may be happening?
I've created models with the Decision Tree and Neural Network algorithms that predict a continuous target, but I don't know how to interpret the scores that appear under the scatter accuracy plot. How should I interpret those scores? How can I estimate the accuracy of a model created with Time Series, and how can I compare its accuracy with models created with the Decision Trees and Neural Network algorithms?
I just want to make things work perfectly and make the most of our fantastic SQL Server 2005 Data Mining engine. Can any of you give me some advice on the validation of mining models? As we always see, the three measures of a mining model are Score, Population Correct, and Predict Probability. So the question is: how can we combine these three measures to best judge the mining models, so we can tell which model is the best one? And to what extent can we really trust these mining models?
This is very important before we can actually put the models to work and convince other people who have no idea what is going on inside them. We want to convince them with the results of these models, make the most of them, and help them get the most from their business operations.
By the way, could you please explain each of these measures in a bit more detail? Thanks again.
I am looking forward to hearing from you shortly, and thanks a bunch for your help.
I have a question about automating data mining model management. As we know, in many businesses patterns change very frequently, so the mining models will need to be recreated later as new rules appear in the data. But can we automate this whole process, e.g. automatically assessing mining model accuracy and automatically recreating the mining models based on predefined specifications? Would anyone here please give me some ideas about that?
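For the reprocessing half, the only route I can think of is scheduling an XMLA Process command (e.g. via SQL Server Agent or an SSIS Analysis Services Execute DDL task); a minimal sketch with hypothetical database and structure IDs:

<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Type>ProcessFull</Type>
  <Object>
    <DatabaseID>MyDMDatabase</DatabaseID>
    <MiningStructureID>MyStructure</MiningStructureID>
  </Object>
</Process>

But I don't know how the accuracy-assessment part could be automated, which is why I am asking.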
Again, I am confused about the extent to which we can be convinced by the mining models. We can validate the models with the accuracy chart, but then to what extent can we trust that? (You never get 100% correct mining models.) If we don't trust the results of the models, then the patterns the models discover are meaningless.
I just need some advice from you experts here to help me convince people of what I got from my mining models.
I am looking forward to hearing from you shortly, and thank you very much.
I am in the process of creating an Integration Services package to automate the process of training mining models and getting predictions. Until recently, I have been processing the models directly from Business Intelligence Studio without any problems. However, when I try to use the exact same training set as an input to the Data Mining Model Training destination, I get several errors. Here is the output:
[Mining Models [1]] Error: Parser: An error occurred during pipeline processing.
[Mining Models [1]] Error: Errors in the OLAP storage engine: The process operation ended because the number of errors encountered during processing reached the defined limit of allowable errors for the operation.
[Mining Models [1]] Error: Errors in the OLAP storage engine: An error occurred while the 'CPT MODIFIER' attribute of the 'BCCA DMS ~MC-CLAIM LIN~5' dimension from the 'BCCA LRG DMS TEST' database was being processed.
[Mining Models [1]] Error: File system error: The record ID is incorrect. Physical file: . Logical file: .
[Mining Models [1]] Error: Errors in the OLAP storage engine: The process operation ended because the number of errors encountered during processing reached the defined limit of allowable errors for the operation.
[Mining Models [1]] Error: Errors in the OLAP storage engine: An error occurred while the 'BILL TYPE' attribute of the 'BCCA DMS ~MC-CLAIM LIN~5' dimension from the 'BCCA LRG DMS TEST' database was being processed.
[Mining Models [1]] Error: File system error: The record ID is incorrect. Physical file: . Logical file: .
[DTS.Pipeline] Error: The ProcessInput method on component "Mining Models" (1) failed with error code 0x80004005. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.
I have not been able to find an answer as to why this is happening. I found a post regarding a similar problem with processing an OLAP cube in SSIS, but it seems that the author of that post never found an answer. Has anyone else here seen similar errors when processing mining models from Integration Services?
Also, if I process the mining models manually then try to run only predictions in SSIS, I get many of the same errors. I'll keep looking into the problem myself, but I would be very grateful if someone in this forum could shed some light on this issue.
Hi! If you make a view, you can script it in SSMS and get the DDL. How can I do something similar with mining models/structures and get DMX? I see you can easily get XMLA, but I would prefer the "DMX DDL"...
Still new to DM and SSIS... any and all help is greatly appreciated!
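To illustrate what I mean by "DMX DDL": as far as I can tell it would have to be written by hand, something like the sketch below (all names hypothetical), since SSMS does not appear to generate it the way it scripts a view:

-- Hand-written DMX DDL for a mining model.
CREATE MINING MODEL [CustomerChurn]
(
    CustomerKey LONG KEY,
    Age         LONG CONTINUOUS,
    Gender      TEXT DISCRETE,
    Churned     TEXT DISCRETE PREDICT
)
USING Microsoft_Decision_Trees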
In SSIS, they say you can use the Analysis Services Processing Task to process a mining model/mining structure; however, I do not see where you can give it a relational table to work off of. I know that I can use a data flow to do this, but I wanted to go a different route if I could, since what I am trying to do is pretty simple and I don't really need the data flow.
That brings me to a more general question: what is the best method for training your models using SSIS? I am building a new model every time the package runs using some variables and the DDL task, running a query on it, and destroying it at the end of the package, but I am having logistical problems training it outside of the data flow. I tried using the DM Query task, but it requires that you output a result set, and I am not sure if I can use it to create and process models.
I would think that they would just give you a DMX task similar to the SQL task, but that does not seem to be the case. Also, when I browse the AS objects via the processing task, I can only see the mining structures and not the mining models.
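For what it's worth, the DMX I would want such a task to run is a plain training statement; whether the DM Query task will accept a statement that returns no rows is exactly what I am unsure about. A sketch, with hypothetical data source, model, and column names:

-- Train an existing model from a relational source via DMX.
INSERT INTO MINING MODEL [CustomerChurn]
    (CustomerKey, Age, Gender, Churned)
OPENQUERY([MyDataSource],
    'SELECT CustomerKey, Age, Gender, Churned FROM dbo.TrainingSet')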
I'll try to explain what I'd like to do. On my SQL Server 2005, SSAS contains a mining model (in fact a clustering one). I'd like to show a detailed diagram built from this model on a web site. This diagram (and this is why I need automation) would depend on the user who's consulting it. For example, firm A will see the number of its customers in each cluster, and this information will be different for another firm B.
So, I thought about several steps to perform:
1) Feed the model with the data for each firm
2) Build a Visio diagram from the previous data using the DM add-ins for Visio 2007
3) Generate HTML using the Visio export wizard
4) Publish the HTML
And (important) this should be done automatically.
I made some experiments: step 1 is easy to perform with SSIS; step 3 is also easy to do with the Visio SDK (using, among others, the exportAsHtml control); and step 4 is trivial.
I failed to perform step 2, even with the SDK, since creating a diagram from a DM template requires the user to fill in a wizard. From code, I'm able to create a document from a DM template and drop a DM stencil, but when the wizard appears I'm unable to get a handle on it. And even if I were, I don't know if it would be a "clean" way of doing it.
So my question is, first of all: isn't there an easier way to generate HTML from mining models automatically? And if my approach to this problem is the best (or the "least bad"), how do I generate data mining Visio diagrams from code automatically?
Another tricky point of confusion for me: many algorithm settings for the native algorithms in SQL Server 2005 Data Mining do not seem to significantly improve the results of the mining models when changed. (Apart from the clustering algorithm's cluster-number setting: by setting the number of clusters to 0, the system will automatically decide how many clusters to build, which I assume is the best way of mining the model with this method.)
Any good advice on this will be much appreciated.
I am looking forward to hearing from you shortly about this confusion, and thanks a lot in advance.
I am using BI Dev Studio for SS2005 in a research (as opposed to a production) environment. Often I want to compare the results of multiple models using the same attributes. If I switch to a different model, the Design view completely resets. Is there any way to retain the same field names with different models in the Design view?
My current workaround is to give my models similar names with AR, DT, CL, LOG, NN suffixes and make global changes in the DMX.
I have consulted the following without finding an answer: http://msdn2.microsoft.com/en-us/library/ms178445.aspx http://msdn2.microsoft.com/en-us/library/ms175642.aspx http://msdn2.microsoft.com/en-us/library/ms175678.aspx http://msdn2.microsoft.com/en-us/library/ms175637.aspx
Actor train nested table:
ID   MovieID   Gender
1    1         F
2    1         M
3    1         F
4    1         F
5    2         M
6    2         M
7    2         F
8    3         F
9    3         F
10   4         M
11   4         M
12   4         F
13   4         F
14   5         F
15   5         M
We want to build a classifier model in order to predict the Class of a movie based on the Gender of the movie's actors. To deal with the nested table, Analysis Services maps each record of the nested table to an attribute of the case table. These attributes are named Actor(n).Gender with n = 1..15, and so they are dependent on the nested table record numbers. Both the Microsoft Decision Trees and Microsoft Naive Bayes algorithms use these attributes without any modification.
We are implementing a Relational Naive Bayes algorithm, and we are planning to aggregate such attributes in order to make them independent of the nested table record numbers.
As a next step we tried to predict some unseen cases, and here we face a huge problem.
Let's take two more tables of unseen cases:
Movie test table:
ID   Class
6    +
7    NULL
8    NULL
Actor test nested table:
ID   MovieID   Gender
1    6         F
2    6         M
3    6         F
4    6         F
16   7         F
17   7         M
18   7         F
19   7         F
20   7         F
21   8         M
22   8         M
23   8         F
Predicting the Class of movie 6 is not a problem, since the movie's actors were included in the training dataset, and when the records are mapped to attributes they already exist in the model. But when you try to predict the movies with unseen actors (7 and 8), all the new attributes are simply ignored in the Algorithm::Predict call (in_ulCaseValues is zero!) because they do not exist in the model!
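For reference, the prediction query we issue is shaped roughly like the sketch below (data source and model names hypothetical), so the nested Actor rows do reach the engine; they just map to no existing attributes:

-- Nested prediction join over the two test tables.
SELECT t.[ID], Predict([MovieModel].[Class])
FROM [MovieModel]
PREDICTION JOIN
    SHAPE { OPENQUERY([MyDataSource],
        'SELECT ID FROM MovieTest ORDER BY ID') }
    APPEND ({ OPENQUERY([MyDataSource],
        'SELECT MovieID, Gender FROM ActorTest ORDER BY MovieID') }
        RELATE ID TO MovieID) AS Actor AS t
ON  [MovieModel].[Actor].[Gender] = t.[Actor].[Gender]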
I have a very simple time series model which processes fine without any problem. However, when I run the following query
SELECT
[TimeSeries].[PriceChange],
[TimeSeries].[Symbol],
PredictTimeSeries(PriceChange, -3, 2)
From
[TimeSeries]
WHERE
[TimeSeries].[Symbol] = 'x'
I get the following error:
TITLE: Microsoft SQL Server 2005 Analysis Services
Error (Data mining): A time series prediction was requested with a start time further in the past than the internal models of the mining model, TimeSeries, specified in the HISTORIC_MODEL_GAP and HISTORIC_MODEL_COUNT parameters can process.
The following is the excerpt of the mining model script related to the two parameters:
<AlgorithmParameters>
<AlgorithmParameter>
<Name>MISSING_VALUE_SUBSTITUTION</Name>
<Value xsi:type="xsdtring">Previous</Value>
</AlgorithmParameter>
<AlgorithmParameter>
<Name>HISTORIC_MODEL_GAP</Name>
<Value xsi:type="xsd:int">1</Value>
</AlgorithmParameter>
<AlgorithmParameter>
<Name>HISTORIC_MODEL_COUNT</Name>
<Value xsi:type="xsd:int">10</Value>
</AlgorithmParameter>
</AlgorithmParameters>
With HISTORIC_MODEL_GAP set to 1 and HISTORIC_MODEL_COUNT set to 10, the model should accommodate PredictTimeSeries(PriceChange, -3, 2). Could anyone shed some light on this?
I have:
- a data mining structure with about 80 columns
- a data mining model using Microsoft_Decision_Trees with 2 prediction columns.
I thought I would then explore the possibility of having more than 2 prediction columns, in this case 20.
I get an error message, and I can't work out:
a) if this is because there's a limit to the maximum number of prediction columns, and where that maximum is stated;
b) if something else has become corrupted;
c) if there's a known bug, and whether the error message is meaningful or not.
Either way, I'm unable to complete the data mining wizard.
The error message is: Errors in the metadata manager. Either the mining structure with the ID of '[my model Structure]' does not exist in the database with the ID of 'DMAddinsDB', or the user does not have permissions to access the object.
I am wondering if it is possible to use SSIS to sample a data set into a training set and a test set and feed them directly to my data mining models, without saving them somewhere, since they occupy too much space. I really need guidance on that.
Can someone please assist? I have no problem using the provided algorithms (Naive Bayes, Decision Trees, etc.) from SQL Server 2005 Data Mining. For example: if I want to predict whether customers want to buy a bike from the following data, then I use Age, Salary, and Gender as the input/attribute/feature selection and the BuyBike column as the "Predict" column.
Table:
Age   Salary   Gender   BuyBike
-------------------------------
However, say that I have 10,000 types of bikes to predict. How do I do that?
Age   Salary   Gender   BuyBike1   BuyBike2   BuyBike3   ...   BuyBike10000
---------------------------------------------------------------------------
Are there any online resources discussing this issue? I am desperately trying to solve this problem. Please assist!
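One direction I am considering (all names hypothetical): pivot the 10,000 BuyBikeN columns into a nested purchases table, so that each bike becomes an attribute of a single nested column instead of a separate case-level column. A sketch:

-- Nested-table alternative to 10,000 flat BuyBikeN columns; each
-- distinct BikeModel value becomes its own predictable attribute.
CREATE MINING MODEL [BikeBuyers]
(
    CustomerKey LONG KEY,
    Age         LONG CONTINUOUS,
    Salary      LONG CONTINUOUS,
    Gender      TEXT DISCRETE,
    Purchases   TABLE PREDICT
    (
        BikeModel TEXT KEY
    )
)
USING Microsoft_Decision_Trees

Is this the right way to approach it, or is there a better one?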
I just found that we are not able to define multiple key columns for a mining structure in the SQL Server 2005 Data Mining engine. Is there another way to define multiple key columns for a mining structure? In many cases the table we are mining has a composite key consisting of different foreign keys, e.g. a fact table with transaction information and other foreign keys. If I am not able to define a composite key for this fact table, I will have to create a named calculation in the data source view to produce a single key column based on the original composite keys. Is this the best way to solve this problem, or are there other alternatives?
I hope my question is clear. I am looking forward to hearing from you shortly for your kind advice and help, and thanks a lot in advance.
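In case it clarifies the named-calculation route: the expression I have in mind is just the concatenated key parts, along the lines of the sketch below (column names hypothetical):

-- Named calculation on the fact table in the data source view:
-- one surrogate key built from the composite key parts.
CAST(TransactionID AS varchar(20)) + '|' + CAST(ProductKey AS varchar(20))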
We are trying to use the Import/export wizard to load a text file to a SQL Server 2005 database. The input file has a variable number of columns per row. For example, the first row has 3 columns, the second has 7, the third has 3, etc. The number of columns varies from 2 to 9 in the input file. The columns are separated by an uptick (`) and the rows are terminated by {CR}{LF}. We are using code page 1252. On processing, the wizard reads the first row (with 3 columns) ok, but then assumes all the other rows have 3 columns and parses the rows accordingly, ignoring the field and row terminators.
The process worked fine with SQL Server 2000. Is there some setting that we are missing, or some configuration on the database that we should be checking?
I'm using a Row Count task to count the number of records passing through a particular path in my data flow. I created a package variable and referenced this variable in the Component Properties tab of the Row Count task. I believe this is the minimum I need to do to get the row count.
However, as I explore the other tabs in the editor, I see there is something called the Input Columns tab. What is this for? I didn't select anything there, and things are working fine. At first I thought that I had to choose the columns that I wanted available for further processing after the Row Count task, but this isn't the case. I am able to see all my columns coming out of the Row Count task even though I didn't do anything in the Input Columns tab.
I am trying to get a table from production created and populated in development with the data from production. When I create a package and set up the data flow, I do not see any columns under the available input columns in the mappings. The source has been defined as the production table.
The request is to merge (or join, or CASE statement, or UNION, or...) up to four unique columns, all in separate tables, into a new combined table (matrix) of the results.
I have an SSIS package. I am using a script component to loop through the input columns and their values. I am not able to do null checking. The code is below; in place of the dashes, I want to do null checking but am not able to. I tried vbNull, IsNull, TypeOf, and System.DBNull, but nothing is working. I guess I am missing something here. Can anyone help me with this?
For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
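In case it helps to show what I mean, here is the shape of what I am attempting, with the null test done via the generated <ColumnName>_IsNull buffer properties through reflection. The Replace() call assumes SSIS only strips spaces when it generates the property names, which I am not sure holds for every column name:

Code Block

Imports System.Reflection
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    For Each column As IDTSInputColumn90 In _
        Me.ComponentMetaData.InputCollection(0).InputColumnCollection
        ' Each input column gets a generated <Name>_IsNull property
        ' on the buffer; look it up by (sanitized) column name.
        Dim isNullProp As PropertyInfo = _
            Row.GetType().GetProperty(column.Name.Replace(" ", "") & "_IsNull")
        If isNullProp IsNot Nothing AndAlso CBool(isNullProp.GetValue(Row, Nothing)) Then
            ' The column's value is NULL for this row; handle it here.
        End If
    Next
End Sub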
Is there, by chance, a cunning way to make the input columns automatically populate the output of an asynchronous script transformation?
My transformation writes several rows for each input row read. I'm creating some new columns along the way, but I'd like all of the input columns to be output each time as well. However, I can't see any obvious way to achieve this, short of manually defining each column on the output and populating it in the script.
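The manual pattern I am trying to avoid looks like this (KeyCol and Amount are hypothetical input columns that I have also defined on Output0, together with the new RowIndex column):

Code Block

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    ' Asynchronous outputs are not connected to the input, so every
    ' pass-through column has to be copied by hand for each row written.
    For i As Integer = 0 To 1              ' e.g. two output rows per input row
        Output0Buffer.AddRow()
        Output0Buffer.KeyCol = Row.KeyCol  ' manual pass-through
        Output0Buffer.Amount = Row.Amount
        Output0Buffer.RowIndex = i         ' newly derived column
    Next
End Sub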
I have set up a script task in one of my packages that modifies another package right before running it. That package is nothing more than a data flow task that transfers rows via an SQL command from one table into another. The strange thing is I have gotten it to work with some tables but not with others.
The script bombs out in the loop where I map all of the columns (shown below), where I use MapInputColumn, with the error HRESULT: 0xC0010009 on Microsoft.SqlServer.Dts.Pipeline.Wrapper.IDTSExternalMetadataColumnCollection90.get_Item(Object Index).
The thing is, this happens after looping roughly 55 times, but there are still about 100 columns that it needs to loop through.
Code Block
Dim input As IDTSInput90 = data_destination.InputCollection(0)
Dim virtual_input As IDTSVirtualInput90 = input.GetVirtualInput
Dim input_column As IDTSInputColumn90
Dim virtual_column As IDTSVirtualInputColumn90

' Iterate through the virtual input column collection and map field names
For Each virtual_column In virtual_input.VirtualInputColumnCollection
    input_column = inst_data_destination.SetUsageType(input.ID, virtual_input, virtual_column.LineageID, DTSUsageType.UT_READONLY)
    inst_data_destination.MapInputColumn(input.ID, input_column.ID, input.ExternalMetadataColumnCollection.FindObjectByID(virtual_column.Name).ID)
Next
Just for kicks, I removed the mapping portion of the code and left in the SetUsageType call to see if it would update the available input columns in the destination. The script then finishes successfully, but still only the 55 or so fields out of 155 are available in the input. So I then stepped through the script with the mapping portion still disabled, and after it loops successfully, I call ReinitializeMetaData and it produces an error on the input_column variable: HRESULT: 0xC0047041.
I find it odd that this still reports that the script finished successfully, and I also find it odd that this works fine on two other tables I've tested, but not this one. Any insight would be greatly appreciated.