[R] Genmod in SAS vs. glm in R

Wed Sep 10 12:51:44 CEST 2008

Ajay ohri wrote:

> What's the R equivalent of PROC LOGISTIC in SAS?

glm with the appropriate family (binomial) and link, I guess.
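
A minimal sketch of what that looks like, on simulated data (the data frame and the names 'dose' and 'event' are made up for illustration):

```r
# Simulated data; 'dose' and 'event' are hypothetical names
set.seed(1)
d <- data.frame(dose = runif(200, 0, 10))
d$event <- rbinom(200, size = 1, prob = plogis(-2 + 0.4 * d$dose))

# Logistic regression: binomial family with the (default) logit link
fit <- glm(event ~ dose, family = binomial(link = "logit"), data = d)

summary(fit)      # coefficient table with Wald z tests
exp(coef(fit))    # coefficients on the odds-ratio scale
```

Note that glm models P(event = 1), whereas PROC LOGISTIC by default models the lower-ordered response level unless DESCENDING or EVENT= is given, so coefficient signs can flip between the two.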

There is also a forthcoming book, 'R for SAS and SPSS Users':

http://www.springer.com/statistics/computational/book/978-0-387-09417-5

> Is there a stepwise
> method there ?

See

library(MASS)
?stepAIC

for an example; the following might provide a useful read
on stepwise methods:

http://www.pitt.edu/~wpilib/statfaq/regrfaq.html
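
For orientation, a small sketch of stepAIC on a binomial glm. The data are simulated and the variable names (x1, x2, x3, y) are invented for the example; only x1 has a real effect:

```r
library(MASS)   # recommended package, normally installed with R

set.seed(2)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * d$x1))   # only x1 matters

# Start from the full model and let stepAIC add/drop terms by AIC
full <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)
sel  <- stepAIC(full, direction = "both", trace = FALSE)

formula(sel)    # the model that was selected
sel$anova       # record of the steps taken
```

The selection is by AIC rather than by p-value thresholds as in SAS's SELECTION=STEPWISE, so the two will not necessarily pick the same model.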

> How do I create scoring models in R for larger datasets (200 MB)? Is
> there a way to compress and use datasets (like options compress=yes;)?

Fit the model using glm and 'score' new data using the predict method.
200 MB isn't that large anymore, but see Thomas Lumley's biglm
package for a bounded-memory version if you're working on
limited hardware.
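
A sketch of the fit-then-score workflow, again on simulated data (the variables 'age', 'income' and 'bad' are hypothetical):

```r
# Simulated training data; all names are made up for the example
set.seed(3)
train <- data.frame(age = rnorm(500, 40, 10), income = rnorm(500, 50, 15))
lin <- -1 + 0.03 * (train$age - 40) - 0.02 * (train$income - 50)
train$bad <- rbinom(500, 1, plogis(lin))

fit <- glm(bad ~ age + income, family = binomial, data = train)

# 'Scoring': predicted probabilities for new cases
newdata <- data.frame(age = c(25, 40, 60), income = c(30, 50, 80))
newdata$score <- predict(fit, newdata = newdata, type = "response")
newdata
```

type = "response" gives probabilities; the default type = "link" gives the linear predictor (log-odds). The bigglm() function in biglm takes a similar formula and family, but it is not shown here.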

HTH,
Tobias

> On Wed, Sep 10, 2008 at 11:12 AM, Peter Dalgaard
> <p.dalgaard at biostat.ku.dk> wrote:
>> Rolf Turner wrote:
>>> For one thing your call to glm() is wrong --- didn't you notice the
>>> warning messages about ``non-integer #successes in a binomial glm!''?
>>>
>>> You need to do either:
>>>
>>> glm(r/k ~ x, family=binomial(link='cloglog'), data=bin_data,
>>> offset=log(y), weights=k)
>>>
>>> or:
>>>
>>> glm(cbind(r,k-r) ~ x, family=binomial(link='cloglog'), data=bin_data,
>>> offset=log(y))
>>>
>>> You get the same answer with either, but this answer still does not
>>> agree with your SAS results.  Perhaps you have an error in your SAS
>>> syntax as well.  I wouldn't know.
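
Rolf's two calls can be checked side by side. A small sketch, using the four data rows from the original post quoted further down (the object names fit_prop and fit_cbind are mine):

```r
# Data as given in the original post
bin_data <- data.frame(y = c(5, 5, 4, 4), r = c(1, 0, 0, 1),
                       k = c(3, 2, 2, 2), x = c(0.5, 0.5, 1.0, 1.0))

# Form 1: proportion as response, number of trials as weights
fit_prop <- glm(r/k ~ x, family = binomial(link = "cloglog"),
                data = bin_data, offset = log(y), weights = k)

# Form 2: two-column response (successes, failures)
fit_cbind <- glm(cbind(r, k - r) ~ x, family = binomial(link = "cloglog"),
                 data = bin_data, offset = log(y))

# Compare the coefficient estimates of the two fits
cbind(prop = coef(fit_prop), cbind = coef(fit_cbind))
```

Without weights = k in the first form, glm treats each proportion as a single Bernoulli-type observation, which is where the "non-integer #successes" warning comes from.
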
>> The data created in the data step are not those used in the analysis.
>> Changing to
>>
>> data nelson;
>> <etc>
>>
>> gives the same result as  R on the versions I have available:
>>
>>                     Analysis Of Parameter Estimates
>>
>>                              Standard        Wald 95%          Chi-
>> Parameter   DF   Estimate     Error    Confidence Limits    Square   Pr > ChiSq
>>
>> Intercept    1    -3.5866    2.2413    -7.9795    0.8064      2.56       0.1096
>> x            1     0.9544    2.8362    -4.6046    6.5133      0.11       0.7365
>> Scale        0     1.0000    0.0000     1.0000    1.0000
>>
>> and
>> Call:
>> glm(formula = r/k ~ x, family = binomial(link = "cloglog"), data = bin_data,
>>   weights = k, offset = log(y))
>>
>> Deviance Residuals:
>>       1        2        3        4
>>  0.5407  -0.9448  -1.0727   0.7585
>>
>> Coefficients:
>>           Estimate Std. Error z value Pr(>|z|)
>> (Intercept)  -3.5866     2.2413  -1.600    0.110
>> x             0.9544     2.8362   0.336    0.736
>>
>>
>>>    cheers,
>>>
>>>        Rolf Turner
>>>
>>>    On 10/09/2008, at 10:37 AM, sandsky wrote:
>>>
>>>> Hello,
>>>>
>>>> I get different results from these two software packages for a simple
>>>> binomial GLM problem.
>>>> From Genmod in SAS: LogLikelihood=-4.75, coeff(intercept)=-3.59,
>>>> coeff(x)=0.95
>>>> From glm in R: LogLikelihood=-0.94, coeff(intercept)=-3.99,
>>>> coeff(x)=1.36
>>>> Can anyone tell me what I did wrong?
>>>>
>>>> Here are the code and results,
>>>>
>>>> 1) SAS Genmod:
>>>>
>>>> % r: # of failure
>>>> % k: size of a risk set
>>>>
>>>> data bin_data;
>>>> input r k y x;
>>>> os=log(y);
>>>> cards;
>>>> 1    3    5    0.5
>>>> 0    2    5    0.5
>>>> 0    2    4    1.0
>>>> 1    2    4    1.0
>>>> ;
>>>> proc genmod data=nelson;
>>>>    model r/k = x /     dist = binomial     link =cloglog   offset = os ;
>>>>
>>>>     <Results from SAS>
>>>>
>>>>    Log Likelihood                       -4.7514
>>>>
>>>>    Parameter   DF   Estimate     Error          Limits          Square   Pr > ChiSq
>>>>
>>>>    Intercept    1    -3.6652    1.9875    -7.5605    0.2302      3.40       0.0652
>>>>    x            1     0.8926    2.4900    -3.9877    5.7728      0.13       0.7200
>>>>    Scale        0     1.0000    0.0000     1.0000    1.0000
>>>>
>>>>
>>>>
>>>> 2) glm in R
>>>>
>>>> bin_data <- data.frame(y=c(5,5,4,4), r=c(1,0,0,1), k=c(3,2,2,2),
>>>>                        x=c(0.5,0.5,1.0,1.0))
>>>> glm(r/k ~ x, family=binomial(link='cloglog'), data=bin_data,
>>>> offset=log(y))
>>>>
>>>>     <Results from R>
>>>>    Coefficients:
>>>>    (Intercept)            x
>>>>        -3.991        1.358
>>>>
>>>>    'log Lik.' -0.9400073 (df=2)
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>>  O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>>  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>> (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907