I have some accounting data, with some transaction attributes and amounts.
I'm using Decision Trees to try and predict the next month's amount for certain combinations of attributes.
I've tried two different structures for the model:
A: one with 9 discrete text input attributes.
B: another with the same 9 attributes plus an average Amount, computed for each combination of the nine attributes and attached to every transaction.
When I process them and look at the dependency network, it says that the strongest link for structure A is attribute "1".
For structure B it is the average-Amount attribute.
Okay, that seems fine, but the second strongest link in structure B is attribute "2".
Shouldn't it be attribute 1 like in structure A?
Second question: if I run the same data through a Neural Network model, the predictions become much worse than those of the decision tree.
I get many predictions that are negative even though all the training data contains positive values.
The StDev also becomes the same for every row.
What am I doing wrong with that one? I have a lot of transactions, and I read somewhere that a Neural Network should work better than a decision tree in a case similar to mine.
The score in the "Lift chart" for the Neural Network model becomes 0,00 and for Decision Trees with the same data I get around 110.
I created a test table (named "Nset") with the columns id (int), n1 (float), n2 (float), n3 (float) and c1 (varchar), then filled it with the following data:
id  n1   n2   n3   c1
1   0.1  0.1  0.6  one
2   0.2  0.1  0.5  one
3   0.7  0.5  0.1  two
4   0.4  0.9  0.3  two
5   0.5  0.1  0.5  three
I then created a neural network model with the default parameter settings. The "id" field is the key, n1, n2 and n3 are inputs, and c1 is the predictable column.
Then I tried a prediction query like this:
SELECT
PREDICT([Nset].[c1])
FROM
[Nset]
NATURAL PREDICTION JOIN
(SELECT 0.5 AS [n1], 0.1 AS [n2], 0.5 AS [n3]) AS t
The result is "three", which is correct, and some other tests also came out correct.
But when I filled column c1 with numerical values (one = 1, two = 2, three = 3) and changed its type to int, the prediction query stopped working correctly.
The previous query now returns 4.
Other tests showed that the returned value was too large by one.
Is the standardization of the inputs done automatically when running the Microsoft Neural Network algorithm, or should I transform the variables before running the algorithm?
2) Predicted Probabilities
How do I create a table with the actual predicted probabilities of the model for each observation? In the Mining Model Prediction tab the output is either 0 or 1. How can I obtain the actual value of the estimated probability?
I am getting negative predictions (continuous) from a neural network model that has been trained on data that only contains positive values or zeros (no nulls).
Is there a setting that can limit the lower end of the output range to zero?
I am trying to get familiar with Microsoft neural networks to predict property prices. The results are better but I wanted to amend the default parameters passed to the neural network.
On the Mining Models tab, when I right-click and go into Set Algorithm Parameters, I can't see any parameters there. If I try to enter a parameter, for example MAXIMUM_STATES, and process the model, I get the following error message:
"The 'maximum_states' data mining parameter is not valid for the 'My Model' model"
I also added a decision tree model to the same structure, and when I go into the Set Algorithm Parameters pop-up menu for it, it comes with many pre-populated parameters with default values.
My question is: why am I unable to add parameters to the Neural Network model, and why does it not come with pre-populated parameters like Decision Trees?
I am studying the behavior of 200,000 clients. Using decision trees, I would like to know whether my clients will abandon our service or not. I use a training set of 21,822 clients and a predictable variable "aband", which is discrete and can be 0 or 1. In my training set I have 21,597 cases in which aband is 0 and 225 cases in which aband is 1. Looking at the classification matrix obtained with a testing set (unselected data) as the input table, I can see that my decision tree doesn't recognize the cases in which aband is 1. Here is the classification matrix:

Counts for Dati Training on [Aband]
Predicted   0 (Actual)   1 (Actual)
0           21597        225
1           0            0
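To make the problem in the matrix above concrete, here is a minimal Python sketch that computes overall accuracy and the recall for aband = 1 from the four counts quoted in the post. The dictionary layout and function names are illustrative assumptions, not part of any SQL Server tooling.

```python
# Counts copied from the classification matrix quoted in the post.
# Keys are (predicted, actual) pairs.
matrix = {
    (0, 0): 21597,  # predicted 0, actual 0
    (0, 1): 225,    # predicted 0, actual 1 (missed abandoners)
    (1, 0): 0,      # predicted 1, actual 0
    (1, 1): 0,      # predicted 1, actual 1
}


def accuracy(m):
    """Share of all cases where the predicted class equals the actual class."""
    total = sum(m.values())
    return (m[(0, 0)] + m[(1, 1)]) / total


def recall_positive(m):
    """Share of actual aband = 1 cases that the model predicted as 1."""
    actual_positive = m[(0, 1)] + m[(1, 1)]
    if actual_positive == 0:
        return float("nan")
    return m[(1, 1)] / actual_positive


acc = accuracy(matrix)
rec = recall_positive(matrix)
print(f"accuracy = {acc:.4f}, recall for aband=1 = {rec:.4f}")
```

The accuracy looks high only because almost every case is aband = 0. The recall for aband = 1 is zero, which is the behavior described in the question.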
I would appreciate answers to the following doubts I have regarding Decision trees, CONTAINS and using CONTAINS in a DMX query:
1. Does MS decision tree work only off equality/inequality conditions for the nodes? Is it possible to use a predicate as the branch criteria for a node?
2. Can the T-SQL predicate CONTAINS(...) be used in a DMX query? I need to check if a column-value is a substring of another column and create an intermediate column that will enable me to construct a decision tree with the phrase-present/absent branch.
3. Can CONTAINS(...) be used in a select clause? Like -
SELECT CONTAINS(JAT.column1, '"Good day"')
FROM JustAnotherTable;
4. Does CONTAINS(...) support both arguments to be column references? Or, is it mandatory that the pattern (argument #2) has to be a literal string or a variable? E.g.: I need to know the validity of the following expression -
I'm new to data mining, and have created an MS decision trees model. The model has the columns age, call outcome, call reason, country name, employee name and gender - all as inputs.
In the mining model viewer, I only get nodes for the age, despite having data for all the other columns.
I am trying to build a decision tree to predict prices. I have created the tree and looked at the lift charts, but I have not seen any of the traditional statistics I am used to from other programs (R-Squared, F statistics, etc.).
Does anyone have an example of how they calculated R-Squared for a decision tree on a continuous variable?
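For reference, here is a minimal Python sketch of the standard R-squared formula, 1 - SS_res / SS_tot, applied to pairs of actual and predicted values. It assumes the pairs have already been exported from the model, for example from a prediction query. The sample numbers are made up purely for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant, R-squared is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical prices, not taken from any real model.
actual_prices = [100.0, 150.0, 200.0, 250.0]
predicted_prices = [110.0, 140.0, 210.0, 240.0]
print(r_squared(actual_prices, predicted_prices))
```

Computed on held-out cases rather than training cases, the same function gives an out-of-sample figure, which is usually the more informative one for a tree.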
I installed the bike buyer example and I am learning the DMX language. I wrote the following query (using MS Decision Trees):
SELECT T.[Last Name], [Bike Buyer], PredictProbability(Predict([Bike Buyer])) AS [Probability]
FROM [v Target Mail]
PREDICTION JOIN OPENQUERY (....... And so on..)
The result is surprising to me. In the result table all the probabilities are equal.
Bike Buyer  Probability
1           0.99994590500919611
0           0.99994590500919611
0           0.99994590500919611
0           0.99994590500919611
0           0.99994590500919611
1           0.99994590500919611
and so on.
Now I am wondering what PredictProbability means. I thought that PredictProbability meant the probability that the prediction is correct. But all the probabilities are the same even though the inputs differ. Can somebody tell me what PredictProbability means, or am I using it wrong?
In a decision tree algorithm, is there a known way to force a branch at the top level? For example, I have 30 known decision patterns that are going to be completely different and I don't want them to intermingle. I want to force a branch at the top node on one of the 30 patterns so I don't have to create 30 mining models per client.
I am using the MS Decision Trees algorithm and for a specific model I get the above warning. As a result, I don't get any splits in my tree. Is there anything I can do to avoid this?
I am trying to run one of the mining models from the book "Delivering BI using SQL Server 2005" but I am running into "Decision Trees found no splits for model". The mining structure has 4 columns, the fourth one being marked as "Predict Only". My cube slice for the model has sufficient data in the cube. I am lost.. Help!!
While recently working with several mining models, I came across something that struck me as pretty odd - and I'm hoping to find an explanation for the behavior.
Consider the following setup:
- A single table in the relational database represents the only case table
- A single, continuous column is the predictable
- A mining structure has been created
- The mining structure contains a single model, based on the MS Decision Trees algorithm
- Input columns were selected for the model via the BI Studio wizard (i.e., those provided via the "Suggest" button)
- The structure has been fully processed

Now, the interesting parts:
1. I view the scatterplot for the mining model, under the Mining Accuracy Chart tab
2. Back on the Mining Structure tab, I delete one of the input columns
3. I add the same column back into the structure
4. The structure is fully processed again
5. When I view the scatterplot for the mining model, under the Mining Accuracy Chart tab, a different set of data points is presented for the model predictions
6. A different set of decision trees under the Mining Model Viewer tab confirms this

How could different patterns have been found this second time around, even though all of the input columns were the same (as well as the training cases)?
(Note: I encountered this situation while creating a new mining model that was identical to an existing one. Even though the models received the exact same inputs and training cases, they yielded different results. I was able to reproduce the behavior by using steps 1-6 above, though.)
Can someone provide some insight on this behavior, or some kind of explanation of what may be happening?
I am wondering the best way to go about a task I have been assigned. We have two similar websites but each is located on a different network. One network is secure so it cannot be accessed on the normal WWW. The secure network will contain the master database. I need to write a program or do something with SQL server to retrieve all records from the WWW site and get them onto the secure database. I also in the future will need to update records from the WWW site if they have been updated. What is the easiest way to move data from one network to the other when I cannot connect to both databases simultaneously? Thanks, Matt
How can I copy a database from one server to another when they are not on the same network? I have tried copying the backup file across from one and attempting a restore, but I keep getting an error message (Abnormal execution). Is there a way to do this? HELP!
Does anyone know if SQL7 will work with Storage Area Networks (SANs)? I've read that SQL2000 implements something called a Virtual Interface System Area Network (VI SAN) that allows communication with devices connected via a SAN.
My site is installing a SAN and I need to know whether SQL7 can utilize those resources (storage, etc.) and, if so, how reliably. Randy
SQL Server Books Online states: SQL Server Authentication is provided for backward compatibility only. When possible, use Windows Authentication.
How can Windows Authentication be implemented in a small network of two or more computers running Windows XP?
Consider, for example, an application written in VB.NET 2005 that connects to a central database on one computer, where the cost and effort of a domain controller running Windows Server are not justified.
If Windows Authentication can't be implemented, or can't be implemented easily, in such a scenario, and SQL Server Authentication is no longer supported, then SQL Server can't be used as the database server for this scenario, where the focus is on simplicity and low cost.
Hello there, I'm currently building a SQL 2005 Active/Standby cluster in a DMZ. I have three NICs in each server. Each NIC is connected to a different network:
- 192.168.100.1 is the public NIC
- 10.0.0.1 is the NIC used for communication between the cluster nodes (heartbeat)
- 192.168.200.1 is the admin NIC
I have installed my cluster using the 192.168.100.0 network for public access. This means that my SQL virtual IP is 192.168.100.10. Each server can be administered over the 192.168.200.0 network (admin), and the cluster/SQL Server IP is available from the 192.168.100.0 (public) network. Now for my question: how can I assign an IP address from my admin network (e.g. 192.168.200.10) to the existing SQL Server cluster to make it available from my admin network while keeping the public IP? Thanks in advance!
Hi, for a new project I'm trying to build a tree structure in SQL using one table with 'Node' & 'ParentNode' fields along with 'title', etc.
Table = Tree
Node  ParentNode  Title  Show_Record
1     0           Root   1
2     1           Child  1
Then i'm trying to get SQL to return that in XML to my Tree Control 'oBout ASP TreeView'.
Now the tree control can accept XML fine as long as it's in a set format, which shouldn't be difficult and should cut my code from 200 lines to one.
However getting SQL to return the table records in XML is proving to be a total nightmare.
I've hunted the web but not getting very far, I've even got a couple of O'Reilly guides but still no luck, so any help would be excellent with this.
I wrote a sql query (basic 'select * from tree for xml raw') which returns the results in RAW XML, but when I run this in Query Analyser it returns the results as one long string broken up with '<' & '>' but gets to the third record and cuts off halfway.
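As an illustration of the nesting that the flat Node/ParentNode rows need, here is a minimal Python sketch that builds hierarchical XML on the client side from the two sample rows in the post. The element name "node", the wrapper element "tree" and the attribute names are assumptions made for the example. The format the tree control actually expects is not given in the post.

```python
import xml.etree.ElementTree as ET

# (Node, ParentNode, Title, Show_Record) rows copied from the sample table.
rows = [
    (1, 0, "Root", 1),
    (2, 1, "Child", 1),
]


def build_tree_xml(rows, root_parent=0):
    """Nest each visible row under its parent; parent 0 means top level."""
    top = ET.Element("tree")  # hypothetical wrapper element
    elements = {}
    # First pass: create one element per row so parents can be looked up by id.
    for node_id, _parent_id, title, _show in rows:
        elements[node_id] = ET.Element("node", id=str(node_id), title=title)
    # Second pass: attach every visible row to its parent element.
    for node_id, parent_id, _title, show in rows:
        if not show:
            continue  # rows with Show_Record = 0 are left out of the output
        parent = top if parent_id == root_parent else elements.get(parent_id)
        if parent is not None:
            parent.append(elements[node_id])
    return top


xml_text = ET.tostring(build_tree_xml(rows), encoding="unicode")
print(xml_text)
```

The two-pass approach means the rows do not have to arrive sorted with parents before children. Children of a hidden or missing parent are simply dropped in this sketch.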
Hi! I have created a DMM using Trees. But when I go to the Mining Model Prediction tab and select a Predict function, I get this in the criteria column: <Scalar column reference>[, EXCLUDE_NULL|INCLUDE_NULL][, INCLUDE_NODE_ID]. When I select Result, I get this error: "An incorrect number of arguments are used in the function at line 3, column 3." I'm predicting a continuous variable.
But when I delete everything except <Scalar column reference> I get this error: "Parser: The syntax for '<' is incorrect."
When I delete everything in the criteria column, I get this: "Query execution failed."
If I change the criteria to "<Scalar column reference>,INCLUDE_NULL, INCLUDE_NODE_ID" I get the error again that the query execution failed.
I'm working from a data set I created. I had no problems with predictions using clustering, but can't seem to get Trees to work.
I am having trouble setting up my Pull Subscription and I am new to replication.
I have several servers hosting a database-driven website that will be the same on each, except for user input and traffic. Quite simply, I need to copy most tables, SPs and data from network to network. I can't use FTP/Web synch ... as I mentioned, the networks do not touch each other or the internet.
On server Web1, it was easy to create a Publication called Pub via the wizard for my database: TheDB. Then on Web1, again, I added a Subscription to the Publication, indicating my second server, Web2, and the same database name: TheDB (I have already backed up and restored TheDB to all my servers). Here's one of the sp's I ran on Web1:
I copied the snapshot folder, i.e. 20070709134423, onto CD and moved it into Web2's default replication folder, but I always receive: cannot connect to Distributor. I've tried using an alias as well, but I don't understand exactly how I should point that either. I checked the publication's PAL, and my Web2 user has rights and is an owner of the Web2 TheDB database.
I have two problems while trying to train a neural network. My network has 10 continuous inputs and 1 discrete output (3 states).
The parameters I chose are:
- Hidden node ratio: 10
- Holdout percentage: 10
The others are default.
First, when I train it with BI Dev Studio, the training is very fast (less than 5 seconds) and the results compared with the training set are bad (at least 30% errors). Is there a way to improve the training (I don't care about the time required to train, as long as it works)?
Second, I decided to train the network using SQL Server Management Studio and I get this error, which I can't understand: "Les connexions ad hoc telles que spécifiées dans des clauses OPENROWSET ne peuvent pas être utilisées sur ce serveur". Translated, it is something like "Ad hoc connections such as those specified in OPENROWSET clauses cannot be used on this server".
My query is :
INSERT INTO MINING MODEL [Associations Learn2]
([From Requete1], [From Requete2], [Keywords1], [Keywords2], [Nb Apparition1], [Nb Apparition2], [Nombre Requete Distincte], [Probabilite], [Titre1], [Titre2], [Type], [Uid])
OPENROWSET('SQLNCLI.1',
    'Data Source=STAG-XP-EDITION;user=sa;password=***;Initial Catalog=OpenFind_StockagePreNeurone',
    'SELECT [From Requete1], [From Requete2], [Keywords1], [Keywords2], [Nb Apparition1], [Nb Apparition2], [Nombre Requete Distincte], [Probabilite], [Titre1], [Titre2], [Type], [Uid] FROM associationsLearn2')
General data mining books talk about NN taking inputs which are between -1 and 1. Even Jamie's book says that's what it generally receives. I don't think this is a requirement for the Microsoft algorithm, but I wanted to ask if it was a best practice. If you're feeding it something like home values where 99% of homes are under $1 million you can use some normalization trick so that mansions don't skew the data. But if your data doesn't need such normalization, is there any need to normalize it to the -1 to 1 range?
Also, is the Microsoft algorithm sensitive to the relative size of different inputs? For instance, if InputA is home size (500-50,000 square feet) and InputB is months unoccupied (0-24 months), does that cause the Microsoft NN to weigh home size more heavily?
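To show what "normalizing to the -1 to 1 range" means in practice, here is a minimal Python sketch of two common rescaling schemes, min-max scaling and z-score standardization, applied to the two example inputs from the question. It only demonstrates preprocessing done outside the model. It makes no claim about what the Microsoft Neural Network algorithm does internally, and the sample values are made up.

```python
import math


def min_max_scale(values, low=-1.0, high=1.0):
    """Linearly map values so the minimum becomes low and the maximum becomes high."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant column carries no information; place it mid-range.
        return [(low + high) / 2 for _ in values]
    scale = (high - low) / (v_max - v_min)
    return [low + (v - v_min) * scale for v in values]


def z_score(values):
    """Center values on their mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical samples spanning the ranges mentioned in the question.
home_size_sqft = [500.0, 25250.0, 50000.0]
months_unoccupied = [0.0, 12.0, 24.0]

print(min_max_scale(home_size_sqft))
print(min_max_scale(months_unoccupied))
print(z_score(months_unoccupied))
```

After either transformation the two inputs are on comparable scales, so differences in raw units (square feet versus months) no longer dominate. Min-max scaling is sensitive to outliers such as the mansions mentioned above, while z-scores are somewhat less so.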
Hello, using MS Visual Studio 2005, I deployed a SQL table with the NN algorithm, and it deployed successfully. But when I switched to the "Mining Model Viewer" tab, it gave me the following error:
The following system error occurred: Invalid procedure call or argument. Execution of the managed stored procedure GetAttributeScores failed with the following error: Exception has been thrown by the target of an invocation.Microsoft::AnalysisServices::AdomdServer::AdomdException.
I'm working with Analysis Services 2005 Developer Edition. Looking through the documentation, it becomes apparent that the NN algorithm takes 255 input attributes by default. This can be changed to any integer value, OK....
My problem is that I want to feed the network with 40000 input variables. In order to do so, I will have to do a select:
SELECT fld1, fld2, ...... fld39999, fld40000
FROM tblSometable
However, this is not possible: as Books Online describes, a select statement can return at most 4096 columns.
Question: How do I populate a NN in AS2005 with more than 4096 inputs?!