[R] Genmod in SAS vs. glm in R
Tobias Verbeke
tobias.verbeke at gmail.com
Wed Sep 10 12:51:44 CEST 2008
Ajay ohri wrote:
> What's the R equivalent of PROC LOGISTIC in SAS?
glm() with family = binomial and the appropriate link (logit is the
default for binomial), I guess.
There is a book, 'R for SAS and SPSS Users', forthcoming:
http://www.springer.com/statistics/computational/book/978-0-387-09417-5
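
A minimal sketch of what that looks like, on simulated data (the data
frame d and the variables y, x1 and x2 are made up for illustration,
they are not from the original question):

```r
# Simulated data; all names here are hypothetical.
set.seed(1)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- rbinom(200, size = 1, prob = plogis(-0.5 + 1.2 * d$x1))

# Binomial family with the logit link, i.e. a logistic regression
fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = d)
summary(fit)
```

Other links (probit, cloglog, ...) go in the same link argument.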
> Is there a stepwise method there?
See
library(MASS)
?stepAIC
for an example; the following might provide a useful read
on stepwise methods:
http://www.pitt.edu/~wpilib/statfaq/regrfaq.html
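
Roughly, and only as a sketch on simulated data (d, y and x1 to x3 are
hypothetical names), a stepwise search with stepAIC could look like:

```r
library(MASS)  # provides stepAIC()

# Simulated data; x3 is pure noise by construction.
set.seed(2)
d <- data.frame(x1 = rnorm(300), x2 = rnorm(300), x3 = rnorm(300))
d$y <- rbinom(300, size = 1, prob = plogis(0.3 + 1.0 * d$x1 - 0.8 * d$x2))

full <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)

# Start from the full model and allow terms to be dropped or re-added,
# using AIC as the selection criterion
sel <- stepAIC(full, direction = "both", trace = FALSE)
formula(sel)  # the model that was retained
sel$anova     # the sequence of steps taken
```

Note that the selection is by AIC, not by the p-value entry/stay
thresholds SAS users may be used to.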
> How do I create scoring models in R for larger datasets (200 MB)? Is
> there a way to compress and use datasets (like options compress=yes;)?
Fit the model using glm() and 'score' new data using its predict() method.
200 MB isn't that large anymore, but see Thomas Lumley's biglm
package for a bounded-memory version if you're working on
limited hardware.
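
As an illustration only (train, new_cases and the variable names are
made up), fitting and then scoring could be sketched as:

```r
# Simulated training data; all names here are hypothetical.
set.seed(3)
train <- data.frame(x = rnorm(500))
train$y <- rbinom(500, size = 1, prob = plogis(-1 + 2 * train$x))

fit <- glm(y ~ x, family = binomial, data = train)

# New cases to score; they need the predictors used in the model formula
new_cases <- data.frame(x = c(-1, 0, 1))

# type = "response" gives fitted probabilities instead of the linear predictor
new_cases$score <- predict(fit, newdata = new_cases, type = "response")
new_cases
```

If memory does become a problem, bigglm() in the biglm package is the
GLM fitting function to look at; check its help page for how it
handles chunked data.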
HTH,
Tobias
> On Wed, Sep 10, 2008 at 11:12 AM, Peter Dalgaard
> <p.dalgaard at biostat.ku.dk> wrote:
>> Rolf Turner wrote:
>>> For one thing your call to glm() is wrong --- didn't you notice the
>>> warning messages about ``non-integer #successes in a binomial glm!''?
>>>
>>> You need to do either:
>>>
>>> glm(r/k ~ x, family=binomial(link='cloglog'), data=bin_data,
>>> offset=log(y), weights=k)
>>>
>>> or:
>>>
>>> glm(cbind(r,k-r) ~ x, family=binomial(link='cloglog'), data=bin_data,
>>> offset=log(y))
>>>
>>> You get the same answer with either, but this answer still does not
>>> agree with your SAS results. Perhaps you have an error in your SAS
>>> syntax as well. I wouldn't know.
>> The data created in the data step are not those used in the analysis.
>> Changing to
>>
>> data nelson;
>> <etc>
>>
>> gives the same result as R on the versions I have available:
>>
>>                     Analysis Of Parameter Estimates
>>
>>                          Standard  Wald 95% Confidence
>> Parameter  DF  Estimate     Error        Limits        Chi-Square  Pr > ChiSq
>>
>> Intercept   1   -3.5866    2.2413   -7.9795    0.8064        2.56      0.1096
>> x           1    0.9544    2.8362   -4.6046    6.5133        0.11      0.7365
>> Scale       0    1.0000    0.0000    1.0000    1.0000
>>
>> and
>> Call:
>> glm(formula = r/k ~ x, family = binomial(link = "cloglog"), data = bin_data,
>> weights = k, offset = log(y))
>>
>> Deviance Residuals:
>>       1        2        3        4
>>  0.5407  -0.9448  -1.0727   0.7585
>>
>> Coefficients:
>>             Estimate Std. Error z value Pr(>|z|)
>> (Intercept)  -3.5866     2.2413  -1.600    0.110
>> x             0.9544     2.8362   0.336    0.736
>>
>>
>>> cheers,
>>>
>>> Rolf Turner
>>>
>>> On 10/09/2008, at 10:37 AM, sandsky wrote:
>>>
>>>> Hello,
>>>>
>>>> I get different results from these two software packages for a simple
>>>> binomial GLM problem.
>>>> From Genmod in SAS: LogLikelihood=-4.75, coeff(intercept)=-3.59,
>>>> coeff(x)=0.95
>>>> From glm in R: LogLikelihood=-0.94, coeff(intercept)=-3.99,
>>>> coeff(x)=1.36
>>>> Can anyone tell me what I did wrong?
>>>>
>>>> Here are the code and results,
>>>>
>>>> 1) SAS Genmod:
>>>>
>>>> % r: # of failure
>>>> % k: size of a risk set
>>>>
>>>> data bin_data;
>>>> input r k y x;
>>>> os=log(y);
>>>> cards;
>>>> 1 3 5 0.5
>>>> 0 2 5 0.5
>>>> 0 2 4 1.0
>>>> 1 2 4 1.0
>>>> ;
>>>> proc genmod data=nelson;
>>>> model r/k = x / dist = binomial link =cloglog offset = os ;
>>>>
>>>> <Results from SAS>
>>>>
>>>> Log Likelihood -4.7514
>>>>
>>>>                          Standard  Wald 95% Confidence
>>>> Parameter  DF  Estimate     Error        Limits        Chi-Square  Pr > ChiSq
>>>>
>>>> Intercept   1   -3.6652    1.9875   -7.5605    0.2302        3.40      0.0652
>>>> x           1    0.8926    2.4900   -3.9877    5.7728        0.13      0.7200
>>>> Scale       0    1.0000    0.0000    1.0000    1.0000
>>>>
>>>>
>>>>
>>>> 2) glm in R
>>>>
>>>> bin_data <-
>>>>   data.frame(cbind(y=c(5,5,4,4),r=c(1,0,0,1),k=c(3,2,2,2),x=c(0.5,0.5,1.0,1.0)))
>>>> glm(r/k ~ x, family=binomial(link='cloglog'), data=bin_data,
>>>> offset=log(y))
>>>>
>>>> <Results from R>
>>>> Coefficients:
>>>> (Intercept) x
>>>> -3.991 1.358
>>>>
>>>> 'log Lik.' -0.9400073 (df=2)
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
>> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
>> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
>>