[R] Logistic Regression in R (SAS -like output)
Frank Harrell
f.harrell at vanderbilt.edu
Mon Aug 9 23:34:19 CEST 2010
In the trivial case where all candidate predictors have one degree of
freedom (which is unlikely, as some effects will be nonlinear or some
variables will have > 2 categories), adding a variable if it improves
AIC is the same as adding it if its chi-square exceeds 2. (Here AIC is
on the chi-square scale, likelihood ratio chi-square minus 2 x d.f., so
larger is better; on R's usual scale the AIC decreases.) This
corresponds to an alpha level of 0.157 for a chi-square with 1 d.f. At
least AIC leads people to use a more realistic alpha (a small alpha in
stepwise regression leads to more bias in the retained regression
coefficients). But you still have serious multiplicity problems and
non-replicable models.
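The 0.157 figure can be checked directly. A minimal sketch using only
base R (the variable names are illustrative, not from the thread):

```r
# A 1 d.f. term improves AIC exactly when its likelihood ratio chi-square
# exceeds the AIC penalty of 2 per parameter.
threshold <- 2 * 1                        # penalty for one degree of freedom
alpha <- pchisq(threshold, df = 1, lower.tail = FALSE)
round(alpha, 3)                           # implied alpha level, about 0.157

# For comparison, the cutoff that a conventional alpha of 0.05 implies
cutoff_05 <- qchisq(0.05, df = 1, lower.tail = FALSE)
round(cutoff_05, 2)                       # about 3.84
```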
Things are different if you have a pre-defined group of variables you
are thinking of adding. Suppose that this group of 10 variables
requires 15 d.f. Adding the group if its chi-square exceeds 2 x 15 = 30
(that is, if AIC based on 15 d.f. improves) wouldn't be a bad strategy.
This avoids the multiplicities of single-variable "looks".
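One way such a group test might look in base R is sketched below. The
data are simulated and every name (d, x1, g, x2, base, full) is made up
for illustration; the group here has only 4 d.f. rather than 15, to keep
the example small.

```r
set.seed(1)
n <- 500
d <- data.frame(
  x1 = rnorm(n),
  g  = factor(sample(c("a", "b", "c", "d"), n, replace = TRUE)),  # 3 d.f.
  x2 = rnorm(n)                                                    # 1 d.f.
)
d$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * d$x1 + 0.5 * d$x2))

# Model without the group, and model with the whole pre-specified group
base <- glm(y ~ x1,          family = binomial, data = d)
full <- glm(y ~ x1 + g + x2, family = binomial, data = d)

# Likelihood ratio chi-square and d.f. for the group as a single chunk
lr_chisq <- base$deviance - full$deviance
lr_df    <- base$df.residual - full$df.residual

# Add the group only if its chi-square exceeds 2 x d.f.
add_group <- lr_chisq > 2 * lr_df

# Same decision from AIC(), which is on the scale where smaller is better
add_group_aic <- AIC(full) < AIC(base)

# Alpha level implied by the rule for a 15 d.f. group (a bit above 0.01)
alpha15 <- pchisq(2 * 15, df = 15, lower.tail = FALSE)
```

The decision is made once for the whole group, so there is a single
"look" rather than one per variable.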
Frank
Frank E Harrell Jr Professor and Chairman School of Medicine
Department of Biostatistics Vanderbilt University
On Mon, 9 Aug 2010, Kingsford Jones wrote:
> On Mon, Aug 9, 2010 at 10:27 AM, Frank Harrell <f.harrell at vanderbilt.edu> wrote:
>>
>> Note that stepwise variable selection based on AIC has all the problems of
>> stepwise variable selection based on P-values. AIC is just a restatement of
>> the P-value.
>
> I find the above statement very interesting, particularly because
> there are common misconceptions in the ecological community that AIC
> is a panacea for model selection problems and the theory behind
> P-values is deeply flawed. Can you direct me toward a reference for
> better understanding the relation?
>
> best,
>
> Kingsford Jones
>
>
>>
>> Frank
>>
>> Frank E Harrell Jr Professor and Chairman School of Medicine
>> Department of Biostatistics Vanderbilt University
>>
>> On Mon, 9 Aug 2010, Gabor Grothendieck wrote:
>>
>>> On Mon, Aug 9, 2010 at 6:43 AM, Harsh <singhalblr at gmail.com> wrote:
>>>>
>>>> Hello useRs,
>>>>
>>>> I have a problem at hand which I'd think is fairly common amongst
>>>> groups where R is being adopted for Analytics in place of SAS.
>>>> Users would like to obtain results for logistic regression in R that
>>>> they have become accustomed to in SAS.
>>>>
>>>> Towards this end, I was able to propose the Design package in R which
>>>> contains many functions to extract the various metrics that SAS
>>>> reports.
>>>>
>>>> If you have suggestions pertaining to other packages, or sample code
>>>> that replicates some of the SAS outputs for logistic regression, I
>>>> would be glad to hear of them.
>>>>
>>>> Some of the requirements are:
>>>> - Stepwise variable selection for logistic regression
>>>> - Choose base level for factor variables
>>>> - The Hosmer-Lemeshow statistic
>>>> - Concordant and discordant pairs
>>>> - Tau C statistic
>>>>
>>>
>>> For stepwise logistic regression using AIC see:
>>>
>>> library(MASS)
>>> ?stepAIC
>>>
>>> For specifying reference level:
>>>
>>> ?relevel
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>