LogRegHelper - A Scorecard For Logistic Regression Models Does Not Match Logistic Regression Favors Score

Jun 24, 2007

Hello,



This question is regarding the LogRegHelper - "A scorecard for Logistic Regression models" example in sqlserverdatamining Tips and Tricks page. I launched TestLogReg (Analysis Services Database associated with the project) and ran Logistic Regression over that. While the LogReg shows the highest score for IQ (107 - 121), a score of 558, the Logistic Regression shows that Parent Encouragement has the highest score for the case College Plans = 'Plans to Attend'. Can someone verify this and clarify?



I have a few other questions with LR



- In SQL Server 2005 LR Mining Model Viewer "favors" chart, what algorithm is used for generating Scores?



- Can I use this score as a feature selector? Higher score => stronger predictor (input)



- Is the coefficient weight algorithm used in LogReg wrong ?



Thanks



MA


Retrieve Score In Logistic Regression (Microsoft Neural Network Viewer - SQL Server 2005)

Feb 19, 2008

Hi!

I bought the book "Data Mining with SQL Server 2005", but I can't find the solution to a problem I have.

I want to retrieve from C# the logistic regression Attribute Value (AV) Scores for the Logistic Regression Algorithm. I can see the Scores from the Microsoft Logistic Regression Viewer (the same of Neural Network Viewer), but I cannot retrieve them via DMX, OLEDB or similar.

Otherwise, is there a formula that I can use to compute that score from the coefficient, support, or probability values of the Attribute Value pair (I can read these values from DMX)?
I can access them via DMX:

NODE_DISTRIBUTION -> SUPPORT and PROBABILITY ATTRIBUTE_VALUE...

with a query like

SELECT FLATTENED (SELECT ATTRIBUTE_NAME, ATTRIBUTE_VALUE FROM NODE_DISTRIBUTION WHERE VALUETYPE = ... ) FROM [MyModel].CONTENT WHERE NODE_TYPE ....

Thanks in advance

Regards,
Marco


Logistic Regression Question

Feb 14, 2008

Hi All,

We're currently preparing for a project for a bank client of ours where we would be using SQL Server 2008's data mining capabilities.


Does anyone know if logistic regression supports the following types:


Binomial (standard)

Multinomial (standard)

Conditional

Ordered

Rank-ordered

Nested

Stereotype
Regards,
Joseph


Power Regression

May 30, 2006

I need to write some SQL to do a power regression for a trendline. I have 2 columns of data which represent my X, Y data, and all I'm after is the a and the b for the function y=ax^b. Has anyone run into this before? I know SSAS has a linear regression function, but my data really only fits the power model.
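
One common way to get a and b is to take logarithms: ln y = ln a + b ln x is a straight line, so an ordinary least-squares fit on (ln x, ln y) gives b as the slope and ln a as the intercept. The sketch below shows that idea in Python rather than SQL; the function name and the sample data are hypothetical, and the fit minimizes error in log space, which is not identical to minimizing error in the original units.

```python
import math


def fit_power_curve(xs, ys):
    """Fit y = a * x**b by least squares on (ln x, ln y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    # Logarithms require strictly positive x and y values.
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                      # slope in log space is the exponent
    a = math.exp(mean_y - b * mean_x)  # intercept in log space is ln a
    return a, b


# Hypothetical sample generated from y = 2 * x**1.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x ** 1.5 for x in xs]
a, b = fit_power_curve(xs, ys)
print(round(a, 6), round(b, 6))
```

The same sums can be written in T-SQL with SUM, COUNT, LOG and EXP over the two columns, since the whole fit reduces to a handful of aggregates.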


R*R - Excel Like Regression With SSAS?

Dec 13, 2007

Hi!
I am trying to build a linear regression in multiple dimensions
with SSAS (y = a + a1*x1 + ... + an*xn).

I got the equation, but I also want to see R squared and adjusted R squared in the same manner as in Excel.
How can I achieve that?
Greetings
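
If the fitted equation is available, both statistics can be computed outside SSAS from the actual and predicted values. The sketch below uses the standard textbook definitions for a model with an intercept; the function name and the sample numbers are hypothetical, and it is not a description of what SSAS exposes.

```python
def r_squared(actual, predicted, num_predictors):
    """Return (R squared, adjusted R squared) for a model with an intercept."""
    n = len(actual)
    if n != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if n - num_predictors - 1 <= 0:
        raise ValueError("not enough rows for the adjusted statistic")
    mean_y = sum(actual) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in actual)                # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - num_predictors - 1)
    return r2, adjusted


# Hypothetical values: five observations, one regressor
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.9, 4.9]
r2, adj_r2 = r_squared(actual, predicted, num_predictors=1)
print(r2, adj_r2)
```

The predicted values could come from a prediction query against the model, with the arithmetic done in whatever client reads the results.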


DMX Query For Regression Coefficients

Oct 3, 2006

How do I write a DMX query to return the coefficients of the independent variables in my regression equation?

Thanks,

Carrie


Regression Line In Chart

Feb 8, 2008

I would like to know whether it is possible to add a regression line to a scatter chart.


Trendlines/Regression With RS Charts?

Apr 15, 2008

[using: Reporting Services 2005, SQL Server 2005, Analysis Services 2005]


Has anyone ever implemented dynamic trendlines with RS charts?

I have a requirement to create a web-based chart based on an existing Excel chart that the client is already using. This chart uses a trendline to forecast performance for 3 months out. I know in Excel it's as easy as right-click->add trendline.

Is there a similarly simple way to do this in Reporting Services?
Also, the data source for this is OLAP, so if any of you are MDX gurus, is there some regression function to plot all the parallel axis points?


thanks for any insight.
-michael


Linear Regression For Column Values

Jul 24, 2006

This is a real challenge. I hope someone is smart enough to know how to do this.

I have a table, TABLE1, with these columns:

[Column 1 - 2001] [Column 2 - 2002] [Column 3 - 2003] [Column 4 - 2004] [Column 5 - 2005] [Column 6 - 2006] [Column 7 - Slope]

[2001] [2002] [2003] [2004] [2005] [2006] [Slope]
[1] [2] [3] [4] [5] [6] [1]
[1.2] [.9] [4] [5] [5.4] [6.2] [?]

Slope is defined as "m" in the equation y=mx+b. I need a way of finding the linear equation that best fits the points so I can have SQL calculate the slope.

Are there any smart people around who would know how to do this?

Thanks
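
The best-fit slope is the usual least-squares ratio, sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2), with the years as x and the row's values as y. A minimal sketch in Python, using the two rows from the post (the function name is hypothetical):

```python
def least_squares_slope(xs, ys):
    """Slope m of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


years = [2001, 2002, 2003, 2004, 2005, 2006]
row1 = [1, 2, 3, 4, 5, 6]
row2 = [1.2, 0.9, 4, 5, 5.4, 6.2]

slope1 = least_squares_slope(years, row1)  # 1.0, as in the post
slope2 = least_squares_slope(years, row2)  # 19.75 / 17.5, about 1.1286
print(slope1, slope2)
```

Because the six x values are fixed, the slope for any row collapses to a weighted sum of its columns: (-2.5*c1 - 1.5*c2 - 0.5*c3 + 0.5*c4 + 1.5*c5 + 2.5*c6) / 17.5, where c1..c6 are the 2001..2006 columns. That expression can be written directly in a SELECT or a computed column.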


How Does Linear Regression Choose Its Regressors?

Apr 22, 2007

I would like to understand the algorithm that the linear regression method uses to choose the regressors in the model from a list of possible regressors.

I think that it is different from the common methods used in statistics like stepwise, forward or backward.



Laura Lerner


Questions About Regression Tree Model

Oct 21, 2007

I have two questions about the regression tree of Microsoft Decision Trees algorithm.

1. The mining legend window has a column named Histogram showing a bar for each coefficient. What does this bar mean?
2. Since each node of a regression tree corresponds to a linear regression, how can I find the regression coefficient of each node? I mean the coefficient that tells how good the regression is.

Any tip will be greatly appreciated.


Probit Regression Plug-In Algorithm

Feb 6, 2008

Hello,

I need to develop a Probit Regression Plug-In Algorithm.
Does anyone know if the plug-in framework will reasonably handle a Probit Regression?
Is anyone aware of any code or materials, specific to a Probit Regression Plug-in, that would help me to do this?
I am also interested in applying the dprobit methodology found in Stata for infinitesimal changes in independent variables.
Has anyone been successful using Stata to implement an SSAS plug-in algorithm?

thank you,
Bill Littlewood


Multivariate Regression Model Like In Excel

Oct 11, 2007

Hi there,
We need to determine the prediction formula coefficients using the multivariate regression formula as is available in the Excel Analysis ToolPak [something like Y = Ax + Bz + C and find A, B, C]. It would be a very "simple" type of analysis that would run on a single table. There does not seem to be an easy built-in SQL function to perform this. However, reading on the web, Analysis Services might be used to do this task? Is there a good sample for a multivariate regression?

Actually, is this a proper approach given the relative simplicity of the calculation? Do we really need to go through the trouble of setting up an Analysis Service solution just for this task?

Thanks in advance
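
For a single table and two predictors, the coefficients can also be computed directly by ordinary least squares, without any mining model. The sketch below solves the 3x3 normal equations (X^T X) beta = X^T y in plain Python; the function names and the sample data are hypothetical, and this is an illustration of the calculation rather than of what Analysis Services does internally.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[row][k] -= factor * aug[col][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_two_predictors(xs, zs, ys):
    """Return (A, B, C) for Y = A*x + B*z + C by ordinary least squares."""
    rows = [(x, z, 1.0) for x, z in zip(xs, zs)]  # trailing 1.0 carries the intercept C
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    a, b, c = solve_linear_system(xtx, xty)
    return a, b, c


# Hypothetical data generated from Y = 3x - 2z + 5
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
zs = [2.0, 1.0, 4.0, 3.0, 6.0]
ys = [3.0 * x - 2.0 * z + 5.0 for x, z in zip(xs, zs)]
A, B, C = fit_two_predictors(xs, zs, ys)
print(A, B, C)
```

Every entry of X^T X and X^T y is a plain SUM over the table, so the aggregates could be produced by one SQL query and only the small 3x3 solve done in client code.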


Linear Regression With Nested Explanation Variable

Jan 22, 2007

We are trying to create a linear regression model with a nested table. We used the CREATE MINING MODEL syntax as follows:

create mining model rate_plan3002_nested2

( CUST_cycle LONG KEY,

VOICE_CHARGES double CONTINUOUS predict,

DUR_PARTNER_GRP_1 double regressor CONTINUOUS ,

nested_taarif_time_3002 table

( CUST_cycle long CONTINUOUS,

TARIFF_TIME text key,

TARIFF_VOICE_DUR_ALL double regressor CONTINUOUS

)

) using microsoft_linear_regression

INSERT INTO MINING STRUCTURE [rate_plan3002_nested2_Structure]

(CUST_cycle ,

VOICE_CHARGES ,

DUR_PARTNER_GRP_1 ,

[nested_taarif_time_3002](SKIP,TARIFF_TIME ,TARIFF_VOICE_DUR_ALL)

)

SHAPE {

OPENQUERY([Cell],

'SELECT CUST_cycle ,

VOICE_CHARGES ,

DUR_PARTNER_GRP_1

FROM dbo.panel_anality_3002

order by CUST_cycle ')}

APPEND

({OPENQUERY([Cell],

'select CUST_cycle,

TARIFF_TIME,

CYCLE_DATE

from dbo.nested_taarif_time_3002

order by CUST_cycle,TARIFF_TIME')

}

relate CUST_cycle to CUST_cycle

) as nested_taarif_time_3002



The result we got is a model with an intercept only. If we don't use the nested variable (the red line), we get a right model (we had more variables ...).

Is there a way to do this regression correctly?

Thanks,

Dror


Linear Regression And Interpreting The Coefficient For Each Variable?

Sep 2, 2007

When using linear regression in the SQL Server 2005 Business Intelligence Studio, I interpret the information below as follows: X has a standard deviation of +- 37.046. Is it possible to obtain the standard deviation of each coefficient in the regression expression?


Multivariate Regression Differences Between Excel And SSAS

Oct 18, 2007

That solved the application problem

However, now we face a different challenge. Running the same data through the SSAS Linear Regression model and the Excel Regression [Data Analysis] tool we get different answers:

Excel:
Intercept -3.57537
x 0.242462
z 0.353668

SSAS:
Intercept -2.95188545928199
x 0.201587406861264
z 0.371940525462092

In Excel we set up the Regression analysis using the 95% confidence interval. Is there a concept of a confidence interval for linear regression in SSAS? Since we are doing this for a company that has been using Excel for years, I do not think such a difference in results will be accepted...

Is there anything else we can do to ensure the answers are close? Or must we work around this by calling these calculations from Excel?


Specifying Bound For Coefficients In Linear Regression Model

Jan 18, 2008

Hi,

I am trying to create a model using the Microsoft Linear Regression algorithm, but I want to constrain the coefficients of the parameters to non-negative values. There is a concept of bounds in SAS, where we can specify the range of a coefficient. Do any of the SSAS mining algorithms support restricting the coefficient values?

Thanks,
DMN


How Does Linear Regression Handle Missing Values For Prediction And For Training?

Sep 18, 2006

Q1. Model Prediction -- Suppose we already have a trained Microsoft Linear Regression Mining Model, say, target y regressed on two variables:

x1 and x2, where y, x1, x2 are of datatype Float. We try to perform Model Prediction with an Input Table in which some records consist of NULL x2 values. How are the resulting predicted y values calculated?

My guess:

The resulting linear regression formula is in the form:

y = constant + coeff1 * (x1 - avg_x1) + coeff2 * (x2 - avg_x2)

where avg_x1 is the average of x1 in the training set, and avg_x2 is the average of x2 in the training set (Correct?).

I guess that when some variable is NULL in the Input Table, Microsoft Linear Regression just treats it as the average of that variable in the training set.

So for x2 being NULL, the whole term coeff2 * (x2 - avg_x2) just disappears, as it is zero if we substitute x2 with its average value.

Is this correct?
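
The guess above can be written out as a small sketch. It only illustrates the hypothesis in this post (a NULL input replaced by the training mean, so its centered term drops out); it does not establish what Microsoft Linear Regression actually does. The function name, coefficients and means are hypothetical.

```python
def predict_centered(constant, coefficients, training_means, inputs):
    """y = constant + sum(coeff_i * (x_i - avg_x_i)), with None treated as the mean."""
    y = constant
    for name, coeff in coefficients.items():
        value = inputs.get(name)
        if value is None:
            # Hypothesis from the post: substitute the training-set average,
            # which makes this term exactly zero.
            value = training_means[name]
        y += coeff * (value - training_means[name])
    return y


coefficients = {"x1": 2.0, "x2": -0.5}
training_means = {"x1": 10.0, "x2": 4.0}

full = predict_centered(7.0, coefficients, training_means, {"x1": 12.0, "x2": 6.0})
missing = predict_centered(7.0, coefficients, training_means, {"x1": 12.0, "x2": None})
print(full, missing)  # 10.0 11.0
```

One way to check the hypothesis against a real model would be to compare a prediction for a row with NULL x2 to a prediction for the same row with x2 set to its training average.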



Q2. Model Training -- Using the above example that y regressed on x1 and x2, if we have a training set that, say, consists of 100 records in which

y: no NULL value

x1: no NULL value

x2: 70 records out of 100 records are NULL

Can someone help explain the mathematical procedure or algorithm that produces coeff1 and coeff2?

In particular, how is the information in the "partial records" used in the regression to contribute to coeff1 and the constant, etc ?


JDBC Driver 1.2 CTP Regression Bug - Fails To Call SPs Which Use Temp Tables

Aug 2, 2007

We are seeing a regression bug with the Microsoft JDBC driver 1.2 CTP.

Using this driver, we don't seem to be able to call stored procedures which return a result set, if those stored procedures use temporary tables internally.

The 1.2 CTP driver fails to access such stored procedures in both SQL Server 2000 and SQL Server 2005 databases.
The previous 1.1 driver succeeds in both cases.

Here is a test case which demonstrates the problem (with IP addresses and logins omitted). The prDummy stored procedure being called is quite simple, and I've copied it below:




Code Snippet

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

import junit.framework.TestCase;

public class MicrosoftJDBCDriverCallingStoredProceduresTest extends TestCase {

    // CREATE PROCEDURE [dbo].[prDummy]
    // AS
    //
    // CREATE TABLE #MyTempTable (
    //     someid BIGINT NOT NULL PRIMARY KEY,
    //     userid BIGINT,
    // )
    //
    // SELECT 1 as TEST2, 2 as TEST2
    // GO

    public void testStoredProcedureViaDirectJDBC() {
        Connection conn = null;
        String driverInfo = "<unknown>";
        String dbInfo = "<unknown>";
        try {
            // Set up driver & DB login...
            Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
            String connectionUrl = "jdbc:sqlserver://xxx.xxx.xxx.xxx:1433";
            Properties dbProps = new Properties();
            dbProps.put("databaseName", "xxxxxx");
            dbProps.put("user", "xxxxxx");
            dbProps.put("password", "xxxxxx");
            // Get a connection...
            conn = DriverManager.getConnection(connectionUrl, dbProps);
            driverInfo = conn.getMetaData().getDriverName() + " v" + conn.getMetaData().getDriverVersion();
            dbInfo = conn.getMetaData().getDatabaseProductName() + " v" + conn.getMetaData().getDatabaseProductVersion();
            // Perform the test...
            CallableStatement cs = conn.prepareCall("{CALL prDummy()}");
            cs.executeQuery();
            // If the previous line executes okay, the test is passed...
            System.out.println("Accessing \"" + dbInfo + "\" with driver \"" + driverInfo + "\" calls the stored procedure successfully.");
        }
        catch (Exception e) {
            // Fail the unit test...
            fail("Accessing \"" + dbInfo + "\" with driver \"" + driverInfo + "\" fails to call the stored procedure: " + e.getMessage());
        }
        finally {
            // Close the connection...
            try { if (conn != null) conn.close(); } catch (Exception ignore) { }
        }
    }
}

The output of this test under both drivers and accessing both databases is as follows:




Code Snippet

Accessing "Microsoft SQL Server v8.00.2039" with driver "Microsoft SQL Server 2005 JDBC Driver v1.1.1501.101" calls the stored procedure successfully.

Accessing "Microsoft SQL Server v9.00.3042" with driver "Microsoft SQL Server 2005 JDBC Driver v1.1.1501.101" calls the stored procedure successfully.


Accessing "Microsoft SQL Server v8.00.2039" with driver "Microsoft SQL Server 2005 JDBC Driver v1.2.2323.101" fails to call the stored procedure: The statement did not return a result set.

Accessing "Microsoft SQL Server v9.00.3042" with driver "Microsoft SQL Server 2005 JDBC Driver v1.2.2323.101" fails to call the stored procedure: The statement did not return a result set.


Regression Testing A Stored Procedure That Produces Multiple Rowsets

Nov 1, 2006

How do I write a regression test for a stored proc that produces multiple rowsets via multiple select queries? E.g.
CREATE PROCEDURE myProc AS
SELECT 'Some stuff', GETDATE()
SELECT 'Some more stuff'

For single-select procs, I can create a temp table and INSERT #temp EXEC myProc, then evaluate the contents of the table to verify correct behavior, but that doesn't work in this case.


Mining Content Viewer For Linear Regression: Node Distribution Output

Dec 19, 2006

With the number of threads it is difficult to know if this has been posted. If I use the Mining Content Viewer for Linear Regression, under Node Distribution, there are values given for Attribute Name, Attribute Value, Support, Probability, Variance, and Value Type. The output is similar to what Joris supplied in his thread about Predict Probability in Decision Trees. My questions:

1. How should these fields be interpreted?

2. With Linear Regression, is it possible to get the coefficient values and tests of significance (t-tests?), if they are not part of the output I have pointed to?

Thanks for your help with this?

Sam


Transact SQL :: Ensure Code Non Regression By Keeping Consistent Signature For Procedure / Views And Function

Jul 28, 2015

In the 70-461 objectives it says: "Ensure code non regression by keeping consistent signature for procedure, views and function (interfaces); security implications". I think I understand what this means in general: they want us to be able to create a view that will still be able to call the original data even if the table is modified. In other words, the view shouldn't easily be broken, i.e., write code that does NOT ensure non-regression, then change the code so that it does ensure non-regression.


How Is The 'Score' Value Derived In The Lift Chart/Mining Legend For Data Mining Models?

Sep 26, 2006

Hi,
I have just run a simple data set through a model to predict a simple true or false value (i.e. binary output)
The Lift Chart/Mining Legend in Analysis Services shows three results: Score, Population Correct (%), and Predict Probability (%)

Population Correct, I believe, is the percentage of predictions it got right out of the total number of predictions it tried to make. Is this correct?

However, I can't work out how the other two are derived, in particular the 'SCORE'. To give a live example, the scores were as follows:

Model Score Pop Correct Pred Probability
Decision Trees 0.83 76.59% 54.28%
Neural Network 0.75 67.63% 50.05%
Ideal Model 100.00%


Can anyone help with this and give a detailed explanation?

Many thanks,
S Rajput


The Balanced Scorecard And Data Mining

Mar 12, 2007

Do you think that data mining has a role within the balanced scorecard? If yes, what kind of models can you think of?


Business Scorecard Integration With SQL Server 2000/2005

Nov 15, 2005

I was trying to install Business Scorecard Manager Server. For that I installed SQL Server 2000, Analysis Services and Service Pack 4, and chose mixed-mode authentication. Then I installed SharePoint Portal Server 2003 and extended a site.


Multiple Counts And A Score

Feb 17, 2014

I have a table called enablers , with the following data

title Raiser Assignedto
book Fred John
Apple Peter Peter
Orange Bill Roger
Cup John Fred

Each time a user's name appears in the Raiser column they get 1 point, and each time it appears in the Assignedto column they get 1 point, but if their name appears in both Raiser and Assignedto for a particular row they get only 1 point, not 2. I then need the count of Raiser points plus the count of Assignedto points to give a total points score (raised plus assigned to). I am looking for how to get output like the below:

Name Total Points
Fred 2
Peter 1
Bill 1
John 2
Roger 1
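
One way to express this rule is to stack the Raiser and Assignedto columns with UNION (not UNION ALL), keyed by a row identifier, so that a name appearing twice in the same row collapses to one entry before counting. The sketch below runs that query against an in-memory SQLite table from Python; the `id` column is an assumption added for illustration, since the post does not show a key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enablers ("
    "id INTEGER PRIMARY KEY, title TEXT, Raiser TEXT, Assignedto TEXT)"
)
conn.executemany(
    "INSERT INTO enablers (title, Raiser, Assignedto) VALUES (?, ?, ?)",
    [
        ("book", "Fred", "John"),
        ("Apple", "Peter", "Peter"),
        ("Orange", "Bill", "Roger"),
        ("Cup", "John", "Fred"),
    ],
)

# UNION removes the duplicate (id, name) pair when Raiser = Assignedto,
# so each person scores at most one point per row.
query = """
SELECT name, COUNT(*) AS total_points
FROM (
    SELECT id, Raiser AS name FROM enablers
    UNION
    SELECT id, Assignedto AS name FROM enablers
) AS per_row
GROUP BY name
ORDER BY name
"""
totals = dict(conn.execute(query).fetchall())
print(totals)
```

The query itself uses only standard SQL, so the same shape should carry over to T-SQL with whatever key column the real table has.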


GROUP BY Highest Score Per User

Mar 4, 2005

Hi there!

I've got a SPROC that generates a recordset of user vote tallies (they're calculated in a separate SPROC). The user submissions are grouped by a GUID value so as to remain unique for a user's submission (each user can have multiple submissions).

The problem is that the recordset returned displays ALL the users, and I'd like to only select the highest score for each user. So, if I have 500 submissions from 3 users (User1 and User2 submit once each and User3 submits 497 times), the total recordset will have 3 rows - being the highest score per user, discounting the others.

Here's my base query:

SELECT a.UserID,a.Name AS [Name],SUM(b.TotalTally) AS [TotalPoints]
FROM Users a
INNER JOIN Ballots b ON a.UserID = b.UserID
GROUP BY a.UserID, a.Name,b.SubmissionGUID
ORDER BY [TotalPoints] DESC,[Name] ASC

...and I've been able to get the highest vote per user, discounting duplicate entries, by using this:

SELECT a.UserID,MAX(b.TotalTally) AS [TotalPoints]
FROM Users a
INNER JOIN Ballots b ON a.UserID = b.UserID
GROUP BY a.UserID

How can I combine the two in a nested subquery to display only the top score per user?
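
One possible combination is to total each submission in a derived table first, then take the MAX of those totals per user in the outer query. The sketch below tries that shape on an in-memory SQLite database from Python; the table and column names follow the post, while the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Ballots (UserID INTEGER, SubmissionGUID TEXT, TotalTally INTEGER);
INSERT INTO Users VALUES (1, 'User1'), (2, 'User2'), (3, 'User3');
INSERT INTO Ballots VALUES
    (1, 'g1', 5), (1, 'g1', 4),
    (2, 'g2', 3),
    (3, 'g3', 2), (3, 'g3', 2),
    (3, 'g4', 6), (3, 'g4', 7),
    (3, 'g5', 1);
""")

# Inner query: one total per submission. Outer query: best total per user.
query = """
SELECT u.UserID, u.Name, MAX(s.SubmissionTotal) AS TotalPoints
FROM Users u
INNER JOIN (
    SELECT UserID, SubmissionGUID, SUM(TotalTally) AS SubmissionTotal
    FROM Ballots
    GROUP BY UserID, SubmissionGUID
) s ON u.UserID = s.UserID
GROUP BY u.UserID, u.Name
ORDER BY TotalPoints DESC, u.Name ASC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With the sample rows, User3 has three submissions totalling 4, 13 and 1, and only the 13 survives, which is the "one row per user" result described above.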


T-SQL (SS2K8) :: Give Point And Set The Score

Aug 12, 2014

I've table and data as following,

CREATE TABLE [dbo].[x_SCORE](
[idx] [int] IDENTITY(-2147483648,1) NOT NULL,
[CVID] [int] NOT NULL,
[myGender] [char](1) NULL,
[whatGender] [char](1) NULL,
[point_Gender] [tinyint] NULL,

[Code] ....

So, my table and data must be as following,

This is the calculation for CVID=1449

1- myGender=M
2- whatGender=M
3- point_Gender=4
4- So, score_Gender=4

5- myBMI=23.53
6- min_BMI=20.22
7- max_BMI=30.00
8- point_BMI=3
9- myBMI is between 20.22 and 30.00
10- So, score_BMI=3

This is the calculation for CVID=1925

1- myGender=F
2- whatGender=M
3- point_Gender=4
4- So, score_Gender=0

5- myBMI=35.43
6- min_BMI=20.22
7- max_BMI=30.00
8- point_BMI=3
9- myBMI IS NOT between 20.22 and 30.00
10- So, score_BMI=0

After the calculation, my data should be as follows:

The variant for each row is the same. See as follows:
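
The two worked calculations above follow one rule per attribute: award the points when the condition holds, otherwise award 0. A minimal sketch of that rule in Python, using the values listed for CVID 1449 and 1925; the function name is hypothetical, and treating the BMI bounds as inclusive is an assumption.

```python
def score_row(row):
    """Apply the gender-match and BMI-range rules described in the post."""
    gender_score = row["point_Gender"] if row["myGender"] == row["whatGender"] else 0
    # Assumption: "between" includes both bounds.
    in_range = row["min_BMI"] <= row["myBMI"] <= row["max_BMI"]
    bmi_score = row["point_BMI"] if in_range else 0
    return {"score_Gender": gender_score, "score_BMI": bmi_score}


cv_1449 = {"myGender": "M", "whatGender": "M", "point_Gender": 4,
           "myBMI": 23.53, "min_BMI": 20.22, "max_BMI": 30.00, "point_BMI": 3}
cv_1925 = {"myGender": "F", "whatGender": "M", "point_Gender": 4,
           "myBMI": 35.43, "min_BMI": 20.22, "max_BMI": 30.00, "point_BMI": 3}

result_1449 = score_row(cv_1449)
result_1925 = score_row(cv_1925)
print(result_1449, result_1925)
```

In T-SQL the same two rules map naturally onto CASE WHEN expressions inside an UPDATE or SELECT.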


String Comparison Using Score Method

Apr 4, 2006

Dear All:

I have run into a problem while trying to implement an idea. A user wants to search for someone's record, but may mistype or misspell the name. I wish to compare the string the user entered against the database column using a "Score Method". The "Score Method" has a variable, "grade", that accumulates the score. I want to convert the string to a char array and compare the characters one by one: the more accurate the string, the higher the grade. Finally, I choose the record with the highest score to show. How can I do this in T-SQL? Could you give me some tips or a guide to learn from? I would appreciate your kindness. Thanks.
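
The character-by-character idea can be sketched as follows, in Python rather than T-SQL. The function names and the sample names are hypothetical. The grade is simply the number of positions at which both strings hold the same character, ignoring case.

```python
def position_score(typed, candidate):
    """Count positions where both strings have the same character (case-insensitive)."""
    return sum(1 for a, b in zip(typed.lower(), candidate.lower()) if a == b)


def best_match(typed, names):
    """Return the stored name with the highest grade for the typed string."""
    return max(names, key=lambda name: position_score(typed, name))


names = ["Jonathan", "Joanna", "Nathan"]
best = best_match("Jonathon", names)
print(best, position_score("Jonathon", best))
```

A limitation of this scheme is that one missing or extra letter shifts every later position and wipes out the grade, which is why edit-distance measures are often preferred for misspellings. SQL Server also ships the SOUNDEX and DIFFERENCE functions, which may be worth trying before writing a custom comparison.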


Keep Duplicate With Highest Score Fuzzy Grouping

Jan 22, 2012

I recently decided to dedupe my data, but after running fuzzy grouping I am having a problem with the query that updates which duplicate to keep.

_key_in is unique and _key_out identifies the group of duplicates, so for example:

_key_in , _key_out , name , score , dedupe
1 , 1 , ron , 10 , purge
2 , 1 , ronn , 15 , keep
3 , 3 , john , 5 , keep
4 , 4 , matt , 15 , keep
5 , 4 , mat , 10 , purge
6 , 4 , matt , 15 , purge

I want to keep the row with the highest score within each _key_out by setting the field de_dupe to 'keep' and the remainder to 'purge'. The score can also be the same within a duplicate group; in that case I just need to keep one, and it doesn't matter which one. The query I have below nearly works, but it marks all duplicates with the same score as keep.

Code:
UPDATE b
SET b.dedupe_result = 'keep'
FROM
[BusinessListings].[dbo].[MongoOrganisationACTM1Destination] b
INNER JOIN

[Code] ....
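
The tie problem goes away if exactly one winner is picked per _key_out using a deterministic tie-break. The sketch below shows that selection in Python on the sample rows from the post; the function name is hypothetical, and breaking ties on the lowest _key_in is an arbitrary choice.

```python
def mark_duplicates(rows):
    """Return {_key_in: 'keep' or 'purge'}, keeping one best row per _key_out."""
    best = {}
    for row in rows:
        current = best.get(row["_key_out"])
        # Higher score wins; on equal scores the lower _key_in wins.
        candidate_rank = (row["score"], -row["_key_in"])
        if current is None or candidate_rank > (current["score"], -current["_key_in"]):
            best[row["_key_out"]] = row
    return {
        row["_key_in"]: "keep" if best[row["_key_out"]] is row else "purge"
        for row in rows
    }


rows = [
    {"_key_in": 1, "_key_out": 1, "name": "ron", "score": 10},
    {"_key_in": 2, "_key_out": 1, "name": "ronn", "score": 15},
    {"_key_in": 3, "_key_out": 3, "name": "john", "score": 5},
    {"_key_in": 4, "_key_out": 4, "name": "matt", "score": 15},
    {"_key_in": 5, "_key_out": 4, "name": "mat", "score": 10},
    {"_key_in": 6, "_key_out": 4, "name": "matt", "score": 15},
]
marks = mark_duplicates(rows)
print(marks)
```

In T-SQL the equivalent selection is usually written with ROW_NUMBER() OVER (PARTITION BY _key_out ORDER BY score DESC, _key_in), marking the row numbered 1 as 'keep' and the rest as 'purge'.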


How To Install And Register The Office Business Scorecard Custom Data Processing Extension With Report Server?

Apr 25, 2007

Hi , all Microsoft BI experts here,

Thanks for your kind attention.

I am having a question as stated in the subject title, yes, when we want to deploy scorecards to reporting services, as the prerequisite, how can we install and register the scorecard custom data processing extension with the Microsoft reporting services server?

I am looking forward to hearing from you shortly and thank you again.

With best regards,

Yours sincerely,


T-SQL (SS2K8) :: Table With Score Info For Groups - Ranking For Current And Previous Week

Jan 21, 2015

I have a table with score info for each group, and the table also contains historical data. I need to get the ranking for the current week and the previous week. Here is what I did, and the result is apparently wrong:

select CurRank = row_number() OVER (ORDER BY cr.CurScore desc) , cr.group_name,cr.CurScore
, lastWeek.PreRank, lastWeek.group_name,lastWeek.PreScore
from
(select group_name,
Avg(case when datediff(day, asAtDate, getdate()) <= 7 then sumscore else 0 end) as CurScore

[Code] ....

The query consists of two parts, for the current week and the previous week respectively. Each part returns the correct result, but the final merged result is wrong.
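
Since the full query is elided, the sketch below only illustrates the general shape in Python: average each week's scores per group, rank each week on its own, then merge the two rankings by group name. The function names, the sample rows and the 8 to 14 day window for the previous week are assumptions. Rows outside a window are left out of the average rather than counted as 0; in SQL, `else 0` inside AVG pulls the average down, whereas AVG ignores NULLs, which may or may not be related to the wrong result here.

```python
from datetime import date


def weekly_average(rows, today, min_age, max_age):
    """Average score per group for rows whose age in days is within [min_age, max_age]."""
    scores = {}
    for group, as_at, score in rows:
        age = (today - as_at).days
        if min_age <= age <= max_age:
            scores.setdefault(group, []).append(score)
    return {group: sum(values) / len(values) for group, values in scores.items()}


def rank(averages):
    """Rank groups by descending average; ties are broken by group name."""
    ordered = sorted(averages, key=lambda g: (-averages[g], g))
    return {group: position + 1 for position, group in enumerate(ordered)}


today = date(2015, 1, 21)
rows = [
    ("A", date(2015, 1, 20), 10), ("A", date(2015, 1, 18), 20),
    ("B", date(2015, 1, 19), 30),
    ("A", date(2015, 1, 10), 50), ("B", date(2015, 1, 12), 40),
]

cur_rank = rank(weekly_average(rows, today, 0, 7))
pre_rank = rank(weekly_average(rows, today, 8, 14))
# Merge on group name; a group missing from one week gets None for that rank.
merged = {g: (cur_rank.get(g), pre_rank.get(g)) for g in sorted(set(cur_rank) | set(pre_rank))}
print(merged)
```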








Copyrights 2005-15 www.BigResource.com, All rights reserved