[R] Consistency of Logistic Regression

Uwe Ligges ligges at statistik.tu-dortmund.de
Sat Nov 13 16:46:00 CET 2010



On 12.11.2010 20:11, Marc Schwartz wrote:
> You are not creating your data set properly.
>
> Your 'mat' is:
>
>> mat
>     column1 column2
> 1        1       0
> 2        1       0
> 3        0       1
> 4        0       0
> 5        1       1
> 6        1       0
> 7        1       0
> 8        0       1
> 9        0       0
> 10       1       1
>
>
> What you really want is:
>
> DF<- data.frame(y = c(1,0,1,0,0,1,0,0,1,1), x = c(5,4,1,6,3,6,5,3,7,9))
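
A note on why 'mat' came out that way, as far as can be told from the call: only nrow, ncol and dimnames are named in the original matrix() call, so the two unnamed vectors are matched positionally to 'data' and 'byrow'. The x values are never used as data; the ten y values are recycled to fill all 20 cells row by row. Below is a minimal sketch of that effect and of one way to build the intended matrix with cbind(). The names 'mat_bad', 'mat_ok' and 'fit' are illustrative and do not appear in the original posts.

```r
y <- c(1, 0, 1, 0, 0, 1, 0, 0, 1, 1)
x <- c(5, 4, 1, 6, 3, 6, 5, 3, 7, 9)

# What the original call effectively did: 'x' ended up as 'byrow',
# so only 'y' was used as data, recycled and filled row by row.
mat_bad <- matrix(y, nrow = 10, ncol = 2, byrow = TRUE)

# One way to get the intended layout: bind the two vectors as columns.
mat_ok <- cbind(column1 = y, column2 = x)

# Index the columns by name rather than by linear position.
fit <- glm(mat_ok[, "column1"] ~ mat_ok[, "column2"], family = binomial)
round(unname(coef(fit)), 4)
```

With the data laid out this way, the fit should agree with the estimates in the summary(MOD) output quoted below and with the SAS output.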


Actually, it is generally safer to use a factor y rather than a numeric y
for classification tasks.
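
To illustrate the point, here is a small sketch using the DF from Marc's message. The level labels "no"/"yes" and the names DF2 and MOD2 are assumptions made for this example only. For a factor response, glm() with family = binomial treats the first level as failure and all other levels as success, so the order of the levels states explicitly which class is being modelled.

```r
DF <- data.frame(y = c(1, 0, 1, 0, 0, 1, 0, 0, 1, 1),
                 x = c(5, 4, 1, 6, 3, 6, 5, 3, 7, 9))

# Make the response an explicit two-level factor: "no" is the first
# level (failure), "yes" the second level (success).
DF2 <- transform(DF, y = factor(y, levels = c(0, 1), labels = c("no", "yes")))

MOD  <- glm(y ~ x, data = DF,  family = binomial)
MOD2 <- glm(y ~ x, data = DF2, family = binomial)

# Both fits model the same event (y == 1 and y == "yes"),
# so the coefficients can be compared directly.
all.equal(coef(MOD), coef(MOD2))
```

Reversing the order of the levels would flip the signs of both coefficients, which is one reason to set the levels deliberately instead of relying on a 0/1 coding.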

Best,
Uwe


>> DF
>     y x
> 1  1 5
> 2  0 4
> 3  1 1
> 4  0 6
> 5  0 3
> 6  1 6
> 7  0 5
> 8  0 3
> 9  1 7
> 10 1 9
>
>
>
> MOD<- glm(y ~ x, data = DF, family = binomial)
>
>
>> summary(MOD)
>
> Call:
> glm(formula = y ~ x, family = binomial, data = DF)
>
> Deviance Residuals:
>      Min       1Q   Median       3Q      Max
> -1.3353  -1.0229  -0.1239   0.9956   1.7477
>
> Coefficients:
>              Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -1.6118     1.7833  -0.904    0.366
> x             0.3293     0.3383   0.973    0.330
>
> (Dispersion parameter for binomial family taken to be 1)
>
>      Null deviance: 13.863  on 9  degrees of freedom
> Residual deviance: 12.767  on 8  degrees of freedom
> AIC: 16.767
>
> Number of Fisher Scoring iterations: 4
>
>
> HTH,
>
> Marc Schwartz
>
>
> On Nov 12, 2010, at 12:56 PM, Benjamin Godlove wrote:
>
>> I think it is likely I am missing something.  Here is a very simple example:
>>
>> R code:
>>
>> mat<- matrix(nrow = 10, ncol = 2, c(1,0,1,0,0,1,0,0,1,1),
>> c(5,4,1,6,3,6,5,3,7,9), dimnames = list(c(1,2,3,4,5,6,7,8,9,10),
>> c("column1","column2")))
>>
>> g<- glm(mat[1:10] ~ mat[11:20], family = binomial (link = logit))
>>
>> g$converged
>>
>>
>> SAS code:
>>
>> data mat;
>> input col1 col2;
>> datalines;
>> 1 5
>> 0 4
>> 1 1
>> 0 6
>> 0 3
>> 1 6
>> 0 5
>> 0 3
>> 1 7
>> 1 9
>> ;
>>
>> proc logistic data=mat descending;
>> model col1 = col2 / link=logit;
>> run;
>>
>> SAS output (in case you don't have access to SAS):
>> Convergence criterion satisfied
>>
>>                   Estimate       SE
>> Intercept    -1.6118          1.7833
>> col2            0.3293          0.3383
>>
>>
>> Of course, with an example this small, it is not so surprising that the two
>> methods differ; and they hardly differ by a single SE.  But as the datasets
>> get larger, the difference is more pronounced.  Let me know if you would
>> like me to send you a large dataset.  I get the feeling I am doing something
>> wrong in R, so please let me know what you think.
>>
>> Thank you!
>>
>> Ben Godlove
>>
>> On Thu, Nov 11, 2010 at 1:59 PM, Albyn Jones <jones at reed.edu> wrote:
>>
>>> do you have factors (categorical variables) in the model?  it could be
>>> just a parameterization difference.
>>>
>>> albyn
>>>
>>> On Thu, Nov 11, 2010 at 12:41:03PM -0500, Benjamin Godlove wrote:
>>>> Dear R developers,
>>>>
>>>> I have noticed a discrepancy between the coefficients returned by R's
>>>> glm() for logistic regression and SAS's PROC LOGISTIC.  I am using
>>>> dist = binomial and link = logit for both R and SAS.  I believe R uses
>>>> IRLS whereas SAS uses Fisher's scoring, but the difference is something
>>>> like 100 SE on the intercept.  What accounts for such a huge difference?
>>>>
>>>> Thank you for your time.
>>>>
>>>> Ben Godlove
>>>>
>>>>       [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> --
>>> Albyn Jones
>>> Reed College
>>> jones at reed.edu
>>>
>>>
>>


