Predict Probability In Decision Trees

Dec 13, 2006

Hello,

I installed the bike buyer example and I am learning the DMX language. I wrote the following query (using MS Decision Trees):

SELECT
T.[Last Name],
[Bike Buyer],
PredictProbability(Predict([Bike Buyer])) AS [Probability]
From
[v Target Mail]
PREDICTION JOIN
OPENQUERY
(....... And so on..)

The result is surprising to me. In the result table all the probabilities are equal.

Bike Buyer Probability
1 0.99994590500919611
0 0.99994590500919611
0 0.99994590500919611
0 0.99994590500919611
0 0.99994590500919611
1 0.99994590500919611

and so on.

Now I am wondering what PredictProbability means. I thought that PredictProbability meant the probability that the prediction is correct. But all the probabilities are the same even though the input is different. Can somebody tell me what PredictProbability means, or am I using it wrong?

Thanx in advance,

Joris Valkonet

View 6 Replies

Decision Trees, How Is Prediction Probability Calculated?

Jul 26, 2007

How is the value of Prediction Probability calculated in the context of decision trees?
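
As a rough illustration only: in a textbook decision tree, the probability reported for a prediction is the share of training cases with that class in the leaf the case falls into. The sketch below assumes that simple definition and ignores any smoothing or prior adjustment a particular product may apply; the leaf contents are hypothetical.

```python
from collections import Counter


def leaf_probabilities(labels):
    """Return {class: relative frequency} for the training cases in one leaf."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical leaf holding 8 training cases: 6 buyers (1) and 2 non-buyers (0).
leaf = [1, 1, 1, 1, 1, 1, 0, 0]
probs = leaf_probabilities(leaf)

# The predicted class is the most frequent one; its frequency is the probability.
predicted = max(probs, key=probs.get)
print(predicted, probs[predicted])
```

Under this definition every case routed to the same leaf receives the same probability, whatever its other input values are.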

View 7 Replies View Related

Decision Trees

Nov 6, 2006

I am studying the behavior of 200.000 clients. Using decision trees, I would like to know whether my clients will abandon our service or not. I use a training set of 21.822 clients and a predictable variable "aband", which is discrete and can be 0 or 1. In my training set I have 21.597 cases in which aband is 0 and 225 cases in which aband is 1. Looking at the classification matrix obtained with a testing set (unselected data) as the input table, I can see that my decision tree doesn't recognize the cases in which aband is 1. Here is the Classification Matrix:
Counts for Dati Training on [Aband]
Predicted    0 (Actual)    1 (Actual)
0            21597         225
1            0             0

What should I do?
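
One way to quantify the problem is to compute accuracy and recall from the matrix above. This is a minimal sketch: the function name and the mapping of the matrix cells to true/false negatives and positives (treating aband = 1 as the positive class) are assumptions made for illustration.

```python
def binary_metrics(tn, fp, fn, tp):
    """Compute accuracy, recall and precision for the positive class."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    # Guard against division by zero when a class is never present or predicted.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision}


# Cells read from the posted matrix: every case is predicted as 0.
metrics = binary_metrics(tn=21597, fp=0, fn=225, tp=0)
print(metrics)
```

The accuracy is close to 99% while the recall for aband = 1 is zero, which is what a model that always predicts the majority class produces on data this unbalanced.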

Chiara

View 3 Replies View Related

Decision Trees, DMX And CONTAINS (T-SQL)

May 18, 2006

I would appreciate answers to the following doubts I have regarding Decision trees, CONTAINS and using CONTAINS in a DMX query:

1. Does MS decision tree work only off equality/inequality conditions for the nodes? Is it possible to use a predicate as the branch criteria for a node?

2. Can the T-SQL predicate CONTAINS(...) be used in a DMX query? I need to check if a column-value is a substring of another column and create an intermediate column that will enable me to construct a decision tree with the phrase-present/absent branch.

3. Can CONTAINS(...) be used in a select clause? Like -

SELECT CONTAINS(JAT.column1, '"Good day"')

FROM JustAnotherTable;

4. Does CONTAINS(...) support both arguments to be column references? Or, is it mandatory that the pattern (argument #2) has to be a literal string or a variable? E.g.: I need to know the validity of the following expression -

SELECT * FROM JustAnotherTable JAT

WHERE CONTAINS(JAT.column1, JAT.column3);

View 1 Replies View Related

MS Decision Trees Question

Aug 3, 2007

Hi,

I'm new to data mining, and have created an MS decision trees model. The model has the columns age, call outcome, call reason, country name, employee name and gender - all as inputs.

In the mining model viewer, I only get nodes for the age, despite having data for all the other columns.

Can anyone help?

Thanks
Jeremy

View 12 Replies View Related

Decision Trees Parameters

Dec 8, 2007

Hi,

I'm interested in understanding how the parameters work in the MS Decision Trees algorithm.

As far as I can tell, the MINIMUM_SUPPORT and COMPLEXITY_PENALTY parameters both control the number of splits and hence the depth of the tree.

Unfortunately the BOL descriptions are very brief - so can anyone tell me the difference between these 2 parameters?

Thanks
Jeremy

View 1 Replies View Related

Calculating R-Squared From Decision Trees

Mar 19, 2007

Hello.

I am trying to build a decision tree to predict prices. I have created the tree and looked at the lift charts, but I have not seen any of the traditional statistics I am used to from other programs (R-Squared, F statistics, etc.).

Does anyone have an example of how they calculated R-Squared for a decision tree on a continuous variable?
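
For reference, R-squared can be computed outside the tool from pairs of actual and predicted values (for example, exported from a prediction query). The sketch below uses the standard definition, 1 minus the ratio of the residual sum of squares to the total sum of squares; the price figures are made up for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant; R-squared is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical prices and the tree's predictions for the same cases.
actual_prices = [100.0, 150.0, 200.0, 250.0]
predicted_prices = [110.0, 140.0, 210.0, 240.0]
print(r_squared(actual_prices, predicted_prices))
```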

Thanks,
Brian

View 9 Replies View Related

Forcing A Branch With Decision Trees

Sep 12, 2007

In a decision tree algorithm, is there a known way to force a branch at the top level? For example, I have 30 known decision patterns that are going to be completely different and I don't want them to intermingle. I wanted to force a branch at the top node on one of the 30 patterns so I wouldn't have to create 30 mining models per client.

Brian

View 4 Replies View Related

Question About Decision Trees And Neural Networks

May 9, 2006

I have some accounting data, with some transaction attributes and amounts.
I'm using Decision Trees to try and predict the next month's amount for certain combinations of attributes.

I've tried two different structures for the model:

A: one with 9 discrete text input attributes.
B: And another with the same 9 attributes + an average Amount for all combinations of the nine attributes for every transaction.

When I've processed them and look in the dependency network, it says that the strongest link for structure A is attribute "1".
And for the second it's the average-Amount attribute.
Okay, that seems fine, but the second strongest link in structure B is attribute "2".

Shouldn't it be attribute 1 like in structure A?

Second question: if I run the same data in a Neural Network model, the prediction becomes much worse than with the decision tree.
I get many predictions that are negative values even though all training data contains positive values.
The StDev also becomes the same for every row.
What am I doing wrong with that one? I have a lot of transactions and I read somewhere that a Neural Network should work better than a decision tree in a case similar to mine.
The score in the "Lift chart" for the Neural Network model becomes 0,00 and for Decision Trees with the same data I get around 110.

View 1 Replies View Related

'Decision Trees Found No Appropriate Regressors For Model' Question

Dec 7, 2006

Hi,

I am using the MS Decision Trees algorithm and for a specific model I get the above warning. As a result I don't get any splits in my tree. Is there anything I can do to avoid this?

Thank you for reading

View 1 Replies View Related

Error: Decision Trees Found No Splits For Model

Jan 5, 2007

Hi,

I am trying to run one of the mining models from the book "Delivering BI using SQL Server 2005" but I am running into "Decision Trees found no splits for model". The mining structure has 4 columns, the fourth one being marked as "Predict Only". My cube slice for the model has sufficient data in the cube. I am lost.. Help!!

Regards

View 4 Replies View Related

Data Mining In 2005 - Same Input Columns Resulting In Different Decision Trees

Dec 12, 2007

While recently working with several mining models, I came across something that struck me as pretty odd - and I'm hoping to find an explanation for the behavior.

Consider the following setup:

A single table in the relational database represents the only case table
A single, continuous column is the predictable
A mining structure has been created

The mining structure contains a single model, based on the MS Decision Trees algorithm
Input columns were selected for the model via the BI Studio wizard (i.e., those provided via the "Suggest" button)
The structure has been fully processed
Now, the interesting parts:

1. I view the scatterplot for the mining model, under the Mining Accuracy Chart tab
2. Back on the Mining Structure tab, I delete one of the input columns
3. I add the same column back into the structure
4. The structure is fully processed again
5. When I view the scatterplot for the mining model, under the Mining Accuracy Chart tab, a different set of data points is presented for the model predictions
6. A different set of decision trees under the Mining Model Viewer tab confirms this

How could different patterns have been found this second time around, even though all of the input columns were the same (as well as the training cases)?

(Note: I encountered this situation while creating a new mining model that was identical to an existing one. Even though the models received the exact same inputs and training cases, they yielded different results. I was able to reproduce the behavior by using steps 1-6 above, though.)

Can someone provide some insight on this behavior, or some kind of explanation of what may be happening?

Thanks,
Joe Miller

View 3 Replies View Related

Data Mining :: Informational (Data Mining) - Decision Trees Found No Splits For Model

Sep 29, 2015

I followed the tutorial posted at [URL] ...

Everything was ok until the last step where I had to process the mining structure which resulted in a warning

"Informational (Data mining): Decision Trees found no splits for model, Tbl Decision Tree Example."

What does this error mean? How do I resolve it? Also, I only see the first level in the Mining Model Viewer, I don't see the levels 2 and 3.

View 2 Replies View Related

How To Predict?

Mar 30, 2007

I have a question. The question is to predict what to do next.

I have a table with three columns.

CustomerID ServiceContent ServicesID

140105100001 Service1 1

140105100001 Service2 1

140105100001 Service3 1

140105100001 Service1 2

140105100001 Service5 2

... ... ...

I want to use SQL Server 2005 Data Mining to predict which services a customer will accept next.

Thanks

View 4 Replies View Related

Predict Product With Sex And Age Of Customer.

Sep 17, 2006

I have these tables:

customer(customerid, age,sex....)

orderdata(orderid, customerid,day)

orderdetails(orderid, productid, quantity)

products(productid, productname,...)

Now I want to show some products to a customer when I know his age and sex.

E.g. if the customer is a man aged 20, I show products such as ball, pull, sport clothes...; if the customer is a woman, I show lips, babara, t_shirt, skirt....

If the customer is a child, I will show joy, story for children....

How do I create my mining model, and how do I query for the results in DTS?

View 1 Replies View Related

The Mean Of Using Association With Importance And Probability

Apr 12, 2007

hi,
I have an exercise using association data mining.
My database has 350 records.
I use 90 records for data mining and it produces some rules, from which I choose the ones with the top MSOLAP_NODE_SCORE.
But when I use a select statement to check my result, I get 1 record that matches the rule and 5 records that do not;
for example:
rules A=a,B=b-> C=c
select * from <my_table> where A='a' and B='b' and C='c'; ==>1 record return
select * from <my_table> where A='a' and B='b' and C<>'c'; ==>5 records return
C with 3 values c1,c2,c
with the second statement C includes 2 c1 and 3 c2
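
To make the two SELECT counts concrete, the sketch below computes the support count and the confidence (often called the rule's probability) of A=a, B=b -> C=c over six rows that mirror the counts in the post. The function name and row layout are assumptions for illustration, and the figures a mining tool reports for a rule trained on a 90-record subset may be computed differently.

```python
def rule_stats(rows, antecedent, consequent):
    """Return (support_count, confidence) for the rule antecedent -> consequent."""
    lhs = [r for r in rows if all(r.get(k) == v for k, v in antecedent.items())]
    both = [r for r in lhs if all(r.get(k) == v for k, v in consequent.items())]
    confidence = len(both) / len(lhs) if lhs else 0.0
    return len(both), confidence


# Six rows with A='a' and B='b': one with C='c', two with 'c1', three with 'c2'.
rows = (
    [{"A": "a", "B": "b", "C": "c"}]
    + [{"A": "a", "B": "b", "C": "c1"}] * 2
    + [{"A": "a", "B": "b", "C": "c2"}] * 3
)
support, confidence = rule_stats(rows, {"A": "a", "B": "b"}, {"C": "c"})
print(support, confidence)
```

On these rows the rule holds in 1 of the 6 cases where the left-hand side matches, so its confidence over this data is 1/6.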

I don't understand how they work.
I want to choose the best rules to represent my database.
How can I choose importance and probability to get the best rules?
For a database with 90 records and a database with 350 records, which values should I use for minimum_probability, Minimum_Support, Minimum_importance...?
When I choose rules, should I choose on importance or probability?

thanks for your help

View 4 Replies View Related

DMX Where Clause: Filtering According To The Adjusted Probability

Mar 14, 2008

I have a DMX query like this:

Code Snippet

select * from (
select flattened(*) from (
select att1, topcount(predict([Trans Predictor Unified], INCLUDE_STATISTICS), $Adjustedprobability, 7) as predictedstuff
from [Trans Predictor Model]
prediction join
SHAPE {openquery(DMSCS, 'select distinct CAST(att2 as nvarchar(100)) att1 from DMSCS.dbo.CartProducts order by att1 ')}
append
({openquery(DMSCS, 'select CAST(att2 as nvarchar(100)) att1 , att4, att5 as att3
from DMSCS.dbo.CartProducts order by att1 ')
}
relate [att1] to [att1]) as [Trans Predictor Unified]
as SHAPEQ
on [Trans Predictor Model].[Trans Predictor Unified].att3 = SHAPEQ.[Trans Predictor Unified].att3
) as s
) as t where [predictedstuff.$AdjustedProbability] > 0.5

It's working well. I would like to modify one thing: I would like to change the constant in the where condition so that it is configurable. That is, I would like to store the constant somewhere (SSAS or relational SQL). I was reading the DMX reference, but it doesn't provide many details about the where clause's "condition expression". I also looked at a document called "OLE DB for Data Mining Specification version 1.0" of July 2000, which has the SELECT grammar in Appendix B. There it has

<expression> -> <value>
[...]
| ( SELECT <expression_list> FROM <expression> <where_clause>
[...]

<where_clause> -> WHERE <expression>

If I change the end to

where [predictedstuff.$AdjustedProbability] > (select 0.5 from [Trans Predictor Model] )

, however, just to force some form of query there I get a message saying "The specified column was not found in the context".

I'm running SQL Server 2005.

thanks,
Gustavo

View 1 Replies View Related

Data Mining Problem: Is That Possible To Predict Many Many Columns?

Mar 22, 2007

Hello,

Can someone please assist?
I have no problem using the provided Algorithms (NaiveBayes, Decision Tree, etc) from SQL Server 2005 Data Mining. For example: If I want to predict whether the customers want to buy bike from the following data, then I use Age, Salary, Gender as input/attribute/feature selection and BuyBike column as "Predict" column.

Table
Age Salary Gender BuyBike
------------------------------------

However, say that I have 10,000 types of bikes to predict. How to do that?
Age Salary Gender BuyBike1 BuyBike2 BuyBike3 ...... BuyBike10000
------------------------------------------------------------------------------------

Are there any online resources discussing this issue? I am desperately trying to solve this problem. Please assist!

Mary

View 5 Replies View Related

PredictCaseLikelihood Returns Low Probability For Sequences That Are Very Frequent

Jun 28, 2007

I am working on a text mining application wherein I need to detect unusual/anomalous sentences in text. Certain sentences, that I know occur very frequently, are given a likelihood of 0.2 by PredictCaseLikelihood. Other sentences that are just as frequent get a much higher likelihood (>0.9). I am using the NORMALIZED option. The only significant difference between these sentences is their length. The one with the lower likelihood has only 2 words in it, whereas the one with the higher likelihood has more than 10 words. The problem is that the shorter sentences end up being interpreted as anomalous, when in fact they aren't. Any suggestions?

View 2 Replies View Related

Predict Products ( Data Mining 2000)

Oct 6, 2006

I want to make a web page, and when somebody comes in I want to show him which products everyone often buys at that time (month or summer).

How do I use data mining to predict those products?

Also: I want to know what percentage of buyers like each product,

or I want to show products in descending order of the percentage of people who like them.

View 4 Replies View Related

Question On Value Of Probability Of Value 1 Or 2 In Neural Network Viewer

Jul 16, 2007

Hi, all,

I am confused about the value of Probability of Value 1 or 2 (for a particular attribute value) in the Neural Network viewer. E.g. the Probability of Value 1 is actually very low (the same as the Probability of Value 2), so why is the bar showing the strength of these two probabilities still so strong, even stronger than the bars for other attribute values that have a much higher Probability of Value 1 or 2?

And, by the way, how does the algorithm calculate the probability of an attribute value in a neural network?

Hope my question is clear.

I am looking forward to hearing from you shortly and thanks a lot in advance.

With best regards,

Yours sincerely,

View 3 Replies View Related

Data Mining In DMX: How To Get All NodeID With Probability +ve Result > X

Aug 19, 2007

Dear All,

In a data mining model with the decision tree algorithm, for example, I have the following training case table:

StudentID, IQ, EQ, IsPass.

I put all the data in the table into the Microsoft Decision Trees data mining model.
StudentID is the key for the data mining model.
IsPass is predict-only.
IQ and EQ are the inputs.

1. How can I make a DMX selection to find out all NODE_UNIQUE_NAME with probability of IsPass >0.7.
2. How can I make a DMX selection to find out all the StudentID which belongs to the criteria defined by the Node?

Thanks and regards

Tony Chun Tung Siu

View 1 Replies View Related

Why The Node_distribution.PROBABILITY Greater Than 1 In Clustering Algorithm?

Nov 24, 2006

Hi, all experts here,

Thank you very much for your kind attention.

I have a question about node_distribution.PROBABILITY. Some attribute values have only a small support for a specific node, so why do they have a large node_distribution.PROBABILITY, even greater than 1? How can node_distribution.PROBABILITY be greater than 1? How does the SQL Server 2005 data mining engine calculate node_distribution.PROBABILITY for its Clustering algorithm? I am really confused and need guidance on that.

Thank you very much for your help.

With best regards,

Yours sincerely,

View 7 Replies View Related

How To: Set Up & Interpret A Model To Predict Even Dollar Transactions

Feb 12, 2007

Generally, what would be a good start to model and make predictions based on the following.

Even Dollar Transactions: including transactions that end in $0 or $50 with a Total Transaction Amount between $400 and $5000.

I have a table called Transaction with a BillingAmount column.

I've gone through the SQL Server 2005 Data Mining Tutorial (which uses a discrete prediction BikeBuyer yes or no) and read a lot of Data Mining with SQL Server 2005. Neither of these give good, complete, methodical examples of selecting inputs and targets and making predictions; especially for continuous columns.

I think I'm generally struggling with the concepts of selecting (continuous?) predictable attributes and calculating predictions based on the results. Are there any examples of a scenario similar to the above that I can reference? Or just some advice.

View 6 Replies View Related

T-SQL (SS2K8) :: Query To Predict Future Job Schedules From MSDB Tables

May 8, 2014

I want to query my msdb job and jobschedule related tables to generate a list of runtimes for each of these jobs for the next day or any future date. This query should output JobID, Run_Date(YYYYMMDD), and Run_Time(HHMMSS).

If I have 3 jobs with...

Job#1 scheduled to run once every 4 hours between 6 AM and 10 PM
Job# 2 scheduled to run every 15 minutes between 11 AM and 1 PM
Job# 3 scheduled to run every minute between 4 PM and 4:15 PM

my output should look as below ....
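
As a rough illustration of the expansion being asked for (this is not the poster's elided output), the sketch below enumerates run times for the three example jobs on one day. It assumes the end time is inclusive and takes the interval settings as plain arguments; in msdb they would come from schedule columns such as freq_subday_interval, active_start_time and active_end_time. The function name and the run date are hypothetical.

```python
from datetime import datetime, timedelta


def expand_schedule(job_id, run_date, start, end, every_minutes):
    """Yield (job_id, YYYYMMDD, HHMMSS) for each run from start to end inclusive."""
    current = datetime.strptime(f"{run_date} {start}", "%Y%m%d %H:%M")
    last = datetime.strptime(f"{run_date} {end}", "%Y%m%d %H:%M")
    step = timedelta(minutes=every_minutes)
    while current <= last:
        yield job_id, current.strftime("%Y%m%d"), current.strftime("%H%M%S")
        current += step


run_date = "20140509"  # illustrative "next day"
jobs = [
    (1, "06:00", "22:00", 240),  # every 4 hours between 6 AM and 10 PM
    (2, "11:00", "13:00", 15),   # every 15 minutes between 11 AM and 1 PM
    (3, "16:00", "16:15", 1),    # every minute between 4 PM and 4:15 PM
]
rows = [
    row
    for job_id, start, end, minutes in jobs
    for row in expand_schedule(job_id, run_date, start, end, minutes)
]
for row in rows[:5]:
    print(row)
```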

View 1 Replies View Related

Strange Problem With Mining Model Node_distribution.Attribute_value And Its Probability

Nov 24, 2006

Hi, all here,

Thank you very much for your kind attention.

I don't understand another problem within my mining model. When I query the mining model content, I find that the same attribute_value has different support and probability for the same node within my clustering model. Why is that? I am really confused and need help with this.

Thank you very much in advance for your help.

With best regards,

Yours sincerely,

View 9 Replies View Related

CTE For Trees

May 28, 2008

Hi,

My database looks like:

CategoryID ParentID Title Sort

1 -1 Cars 1
2 1 Honda 1
3 -1 Bikes 2
4 1 Ford 2
5 1 Toyota 3
6 3 Kawasaki 1

How can I retrieve the values in the following order:
1, 2, 4, 5, 3, 6

I have:

WITH MYCTE(categoryID, parentID, Title, Sort)
(
SELECT TOP 1 categoryID, parentID, Title, Sort
FROM Categories
WHERE parentID = -1
ORDER BY Sort ASC

UNION ALL

SELECT c.categoryID, c.parentID, c.title, c.sort
FROM Categories c
INNER JOIN MYCTE cte ON (cte.categoryID = c.parentID)

)

SELECT *
FROM MYCTE

It doesn't seem to work though? Help! hehe
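
One common way to get the requested order is to build a sortable path in the recursive part of the CTE and order by it. The sketch below runs that idea in SQLite through Python's sqlite3 module, so the syntax (WITH RECURSIVE, printf, ||) is SQLite's and not T-SQL's; the path column and the 4-digit padding are assumptions for illustration. Note also that the posted query has no AS after the CTE column list and that TOP 1 would keep only one root.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Categories (categoryID INTEGER, parentID INTEGER, Title TEXT, Sort INTEGER)"
)
conn.executemany(
    "INSERT INTO Categories VALUES (?, ?, ?, ?)",
    [
        (1, -1, "Cars", 1),
        (2, 1, "Honda", 1),
        (3, -1, "Bikes", 2),
        (4, 1, "Ford", 2),
        (5, 1, "Toyota", 3),
        (6, 3, "Kawasaki", 1),
    ],
)

# Each row's path is its parent's path plus its own zero-padded Sort value,
# so ordering by path lists every parent directly before its children.
query = """
WITH RECURSIVE tree(categoryID, Title, path) AS (
    SELECT categoryID, Title, printf('%04d', Sort)
    FROM Categories
    WHERE parentID = -1
    UNION ALL
    SELECT c.categoryID, c.Title, t.path || '.' || printf('%04d', c.Sort)
    FROM Categories c
    JOIN tree t ON c.parentID = t.categoryID
)
SELECT categoryID FROM tree ORDER BY path
"""
order = [row[0] for row in conn.execute(query)]
print(order)  # [1, 2, 4, 5, 3, 6]
```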

View 4 Replies View Related

Working With Trees And XML...

Oct 26, 2005

Hi, for a new project i'm trying to build a tree structure in SQL using one table with 'Node' & 'ParentNode' fields along with 'title', etc.

Table = Tree
Node : ParentNode : Title : Show_Record
1 0 Root 1
2 1 Child 1

Then i'm trying to get SQL to return that in XML to my Tree Control 'oBout ASP TreeView'.

Now the tree control can accept XML fine as long as it's in a set format, which shouldn't be difficult and should cut my code from 200 lines to one.

However getting SQL to return the table records in XML is proving to be a total nightmare.

I've hunted the web but not getting very far, I've even got a couple of O'Reilly guides but still no luck, so any help would be excellent with this.

I wrote a sql query (basic 'select * from tree for xml raw') which returns the results in RAW XML, but when I run this in Query Analyser it returns the results as one long string broken up with '<' & '>' but gets to the third record and cuts off halfway.

<row node="1" parentnode="0" title="Root" type_image="book.gif" type_expanded="True"/><row node="2" parentnode="1" title="Service Delivery" type_image="page.gif" type_expanded="False"/><row node="3" parentnode="1" title="Business Support" type_image="page.

Anyone know why Query Analyser does that?

Any help in this much appreciated, as you can imagine i'm at my wits end.


View 4 Replies View Related

Predicting In Trees

Jun 30, 2006

Hi! I have created a DMM using Trees. But when I go to the Mining Model Prediction tab and select a Predict function, I get this in the criteria column: <Scalar column reference>[, EXCLUDE_NULL|INCLUDE_NULL][, INCLUDE_NODE_ID]. When I select Result, I get this error: "An incorrect number of arguments are used in the function at line 3, column 3." I'm predicting a continuous variable.

But when I delete everything except <Scalar column reference> I get this error: "Parser: The syntax for '<' is incorrect."

When I delete everything in the criteria column, I get this: "Query execution failed."

If I change the criteria to "<Scalar column reference>,INCLUDE_NULL, INCLUDE_NODE_ID" I get the error again that the query execution failed.

I'm working from a data set I created. I had no problems with predictions using clustering, but can't seem to get Trees to work.

View 3 Replies View Related

Autoregression Trees

Mar 6, 2007

hi,

I am using the Time Series algorithm. I just want to know about the autoregression tree. I have data like

Studid Date Perf

001 01/01/2007 90

001 02/01/2007 95

001 03/01/2007 89

002 01/01/2007 79

002 02/01/2007 90

002 03/01/2007 95

Like that. When I use my Model Viewer --> Decision Tree --> it shows

Perf = 90.0084 + 1.02 * Perf(-2) + 0.25 * Perf(-2).

What is this value and how is it calculated?
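
As a hedged reading of the notation: in an autoregressive formula, a term such as Perf(-2) usually denotes the value of Perf two time steps before the one being predicted. The sketch below simply evaluates the posted formula on student 001's three values under that assumption. The coefficients are copied verbatim (including the repeated Perf(-2) term), so the result is plain arithmetic and not a meaningful forecast.

```python
def evaluate_ar(history, intercept, terms):
    """Evaluate intercept + sum(coef * value at lag).

    history is ordered oldest to newest; a lag of -1 is the most recent value
    and -2 the one before it, which matches Python's negative indexing.
    """
    return intercept + sum(coef * history[lag] for coef, lag in terms)


history_001 = [90, 95, 89]           # Perf for student 001, oldest first
terms = [(1.02, -2), (0.25, -2)]     # both terms as they appear in the post
value = evaluate_ar(history_001, 90.0084, terms)
print(round(value, 4))
```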

View 1 Replies View Related

Predict Query Gives 'DMPluginWrapper; Object Reference Not Set To An Instance Of An Object' Error

Mar 17, 2008

Hi,

I am trying to develop a custom algorithm. I have implemented and tested training methods, however I fail at prediction phase. When I try to run a prediction query against a model created with my algorithm I get:

Executing the query ...
Obtained object of type: Microsoft.AnalysisServices.AdomdClient.AdomdDataReader
COM error: COM error: DMPluginWrapper; Object reference not set to an instance of an object..
Execution complete

I know this is not very descriptive, but I have seen that the algorithm doesn't even execute my Predict(..) function (I can test this by logging to a text file).
So the problem is this: when I run a prediction query, DMPluginWrapper throws an exception, I think, even before calling my custom method.

As I said it is not a very descriptive message but I hope I have hit a general issue.

Thanks...

View 3 Replies View Related

Design Decision

Dec 10, 2007

Hello everyone

I'm designing a new database for a project. In this database I have a calendar table with the following columns:

id, dateValue, year, quarter, week, month, englishMonthName, day, englishDayName, dayOfTheYear, isWeekendDay

This database also has a session monitor that logs every access to the database, with a relation to the calendar row id. This way, I can make database access reports, without replicating the date value.

My question is: for a membership table should I follow the same principle and relate member row to the session monitor, which in turn, relates to the calendar or should I put the date just there?

Some of the tables of this database will have to handle some heavy load, both for updating and selecting. This said, my question is should I make a link or put the date just there to extinguish the need to make 2 joins just to know when something was registered / updated? If I only place the relation to know the date I'll have to do something like:

SELECT
DATEADD(ss, x.timeOffsetInSeconds, c.dateValue) AS date
FROM
<somewhere> x
JOIN sessionMonitor sm ON sm.id = x.sessionMonitorId
JOIN calendar c ON c.id = sm.calendarId

Instead of just doing a select x.lastUpdateDate

How would you gurus usually deal with these situations?

Best regards

View 4 Replies View Related

Index Decision

Jan 17, 2008

I've decided to put the clustered index on the edit date column on an audit table. As the edit date for a new record is always going to be higher (more recent) than the previous record, the value would go onto the end of the index. So is there still a value in (1) providing a fill specification of less than 100% and (2) padding the index?

View 13 Replies View Related