[R] ols function in rms package

Mark Seeto mark.seeto at nal.gov.au
Tue Jun 8 21:25:41 CEST 2010


> On 06/08/2010 05:29 AM, Mark Seeto wrote:
>>
>>> On 06/06/2010 10:49 PM, Mark Seeto wrote:
>>>> Hello,
>>>>
>>>> I have a couple of questions about the ols function in Frank Harrell's rms
>>>> package.
>>>>
>>>> Is there any way to specify variables by their column number in the data
>>>> frame rather than by the variable name?
>>>>
>>>> For example,
>>>>
>>>> library(rms)
>>>> x1 <- rnorm(100, 0, 1)
>>>> x2 <- rnorm(100, 0, 1)
>>>> x3 <- rnorm(100, 0, 1)
>>>> y <- x2 + x3 + rnorm(100, 0, 5)
>>>> d <- data.frame(x1, x2, x3, y)
>>>> rm(x1, x2, x3, y)
>>>> lm(y ~ d[,2] + d[,3], data = d)  # This works
>>>> ols(y ~ d[,2] + d[,3], data = d) # Gives error
>>>> Error in if (!length(fname) || !any(fname == zname)) { :
>>>>     missing value where TRUE/FALSE needed
>>>>
>>>> However, this works:
>>>> ols(y ~ x2 + d[,3], data = d)
>>>>
>>>> The reason I want to do this is to program variable selection for
>>>> bootstrap model validation.
>>>>
>>>> A related question: does ols allow "y ~ ." notation?
>>>>
>>>> lm(y ~ ., data = d[, 2:4])  # This works
>>>> ols(y ~ ., data = d[, 2:4]) # Gives error
>>>> Error in terms.formula(formula) : '.' in formula and no 'data' argument
>>>>
>>>> Thanks for any help you can give.
>>>>
>>>> Regards,
>>>> Mark
>>>
>>> Hi Mark,
>>>
>>> It appears that you answered the questions yourself.  rms wants real
>>> variables or transformations of them.  It makes certain assumptions
>>> about names of terms.  The y ~ . should work though; sometime I'll have
>>> a look at that.
>>>
>>> But these are the small questions compared to what you really want.  Why
>>> do you need variable selection, i.e., what is wrong with having
>>> insignificant variables in a model?  If you indeed need variable
>>> selection see if backwards stepdown works for you.  It is built-in to
>>> rms bootstrap validation and calibration functions.
>>>
>>> Frank
>>>
>>
>> Thank you for your reply, Frank. I would have reached the conclusion
>> that rms only accepts real variables had this not worked:
>> ols(y ~ x2 + d[,3], data = d)
>
> Hi Mark - that probably worked by accident.
>
>>
>> The reason I want to program variable selection is so that I can use the
>> bootstrap to check the performance of a model-selection method. My
>> co-workers and I have used a variable selection method which combines
>> forward selection, backward elimination, and best subsets (the forward and
>> backward methods were run using different software).
>>
>> I want to do bootstrap validation to (1) check the over-optimism in R^2,
>> and (2) justify using a different approach, if R^2 turns out to be very
>> over-optimistic. The different approach would probably be data reduction
>> using variable clustering, as you describe in your book.
>

Again, thanks for your reply Frank.

> The validate.ols function which calls the predab.resample function may
> give you some code to start with.  Note however that the performance of
> the approach you are suggesting has already been shown to be poor in
> many cases.

I think I've worked out how to program forward selection, and backward
elimination should be no more difficult. Other reasons for doing this are
(1) while I believe your criticism of stepwise variable selection methods,
it would be good to verify their poor performance for myself, and (2) I've
done very little programming in R, so it's good practice.
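
For anyone following the thread, here is a minimal sketch of how the
column-number and "y ~ ." problems might be sidestepped when programming
the selection: build the formula from the real variable names, so that no
d[,2]-style terms reach the fitting function. It uses only base R (lm and
step) so that it runs without rms. The helper names formula_from_cols,
select_forward and r2 are hypothetical, step() selects by AIC and merely
stands in for whatever selection rule is actually used, and B = 50 is an
arbitrary choice. The loop is the usual bootstrap optimism estimate for
R^2, not the code inside validate.ols.

```r
# Sketch only: base R, simulated data as in the original post.
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
d$y <- d$x2 + d$x3 + rnorm(100, 0, 5)

# Hypothetical helper: column numbers -> formula using the real names.
formula_from_cols <- function(data, cols, response = "y") {
  reformulate(names(data)[cols], response = response)
}
f <- formula_from_cols(d, 2:3)                  # y ~ x2 + x3
fit <- lm(f, data = d)

# Expanding "y ~ ." by hand gives an explicit formula as well.
f_dot <- formula(terms(y ~ ., data = d[, 2:4]))

# Hypothetical helper: forward selection by AIC, starting from the
# intercept-only model. Assumes the response column is called "y".
select_forward <- function(data) {
  null <- lm(y ~ 1, data = data)
  upper <- reformulate(setdiff(names(data), "y"))
  step(null, scope = list(lower = ~1, upper = upper),
       direction = "forward", trace = 0)
}

# R^2 of a fitted model evaluated on an arbitrary data set.
r2 <- function(fit, data) {
  pred <- predict(fit, newdata = data)
  1 - sum((data$y - pred)^2) / sum((data$y - mean(data$y))^2)
}

# Optimism bootstrap: repeat the whole selection in each resample and
# compare R^2 on the resample with R^2 on the original data.
apparent <- r2(select_forward(d), d)
B <- 50
optimism <- replicate(B, {
  boot <- d[sample(nrow(d), replace = TRUE), ]
  fit_b <- select_forward(boot)
  r2(fit_b, boot) - r2(fit_b, d)
})
corrected <- apparent - mean(optimism)
print(c(apparent = apparent, corrected = corrected))
```

With rms attached, a formula object built this way should be usable in
ols(f, data = d) as well, since it contains only real variable names.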

> You might run the following in parallel: full model fits
> and penalized least squares using penalties selected by AIC (using
> special arguments to ols along with the pentrace function).
>
> Frank

Thanks for the suggestion; I'll try this.
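
A rough sketch of what that parallel comparison might look like with rms,
for reference. The simulated data mirror the example at the top of the
thread. The penalty grid seq(0, 50, by = 1) and B = 100 are arbitrary
assumptions, and taking pen$penalty as the selected penalty is based on
the pentrace documentation as I understand it, so it is worth checking
against ?pentrace before relying on it.

```r
# Sketch only; requires the rms package.
library(rms)
set.seed(2)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
d$y <- d$x2 + d$x3 + rnorm(100, 0, 5)

# Full model. x = TRUE, y = TRUE store the design matrix and response,
# which validate() and pentrace() need.
full <- ols(y ~ x1 + x2 + x3, data = d, x = TRUE, y = TRUE)

# Bootstrap validation of the full model, and of backward stepdown
# repeated within each resample (bw = TRUE).
val_full <- validate(full, B = 100)
val_bw <- validate(full, B = 100, bw = TRUE)
print(val_full)
print(val_bw)

# Penalized least squares: search a grid of penalties and pick by AIC.
pen <- pentrace(full, seq(0, 50, by = 1), which = "aic")
pen_fit <- update(full, penalty = pen$penalty)
print(pen_fit)
```

Comparing the optimism-corrected R^2 rows of the two validate() tables
gives a direct view of what the stepdown costs relative to the full model.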

Regards,
Mark
-- 
Mark Seeto
Statistician

National Acoustic Laboratories <http://www.nal.gov.au/>
A Division of Australian Hearing


