[R] extract minimal variables from model

Marc Schwartz marc_schwartz at me.com
Fri Jan 6 18:17:36 CET 2017


> On Jan 6, 2017, at 11:03 AM, Jacob Wegelin <jacobwegelin at fastmail.fm> wrote:
> 
> Given any regression model, created for instance by lm, lme, lmer, or rqs, such as
> 
> z1<-lm(weight~poly(Time,2), data=ChickWeight)
> 
> I would like a general way to obtain only those variables used for the model.  In the current example, this "minimal data frame" would consist of the "weight" and "Time" variables and none of the other columns of ChickWeight.
> 
> (Motivation: Sometimes the data frame contains thousands of variables which are not used in the current regression, and I do not want to keep copying and propagating them.)
> 
> The "model" component of the regression object doesn't serve this purpose:
> 
>> head(z1$model)
>  weight poly(Time, 2).1 poly(Time, 2).2
> 1     42    -0.066020938     0.072002235
> 2     51    -0.053701293     0.031099018
> 3     59    -0.041381647    -0.001334588
> 4     64    -0.029062001    -0.025298582
> 5     76    -0.016742356    -0.040792965
> 6     93    -0.004422710    -0.047817737
> 
> The following awkward workaround seems to do it when variable names contain only "word characters" as defined by regex:
> 
> minimalvariablesfrommodel20161120 <-function(object, originaldata){
> # stopifnot(!missing(originaldata))
> stopifnot(!missing(object))
> intersect(
> 	unique(unlist(strsplit(format(object$call$formula), split="\\W", perl=TRUE)))
> 	, names(originaldata)
> 	)
> }
> 
>> minimalvariablesfrommodel20161120(z1, ChickWeight)
> [1] "weight" "Time" 
>> 
> 
> But if a variable has a space in its name, my workaround fails:
> 
>> ChickWeight$"dog tail"<-ChickWeight$Time
>> z1<-lm(weight~poly(`dog tail`,2), data=ChickWeight)
>> head(z1$model)
>  weight poly(`dog tail`, 2).1 poly(`dog tail`, 2).2
> 1     42          -0.066020938           0.072002235
> 2     51          -0.053701293           0.031099018
> 3     59          -0.041381647          -0.001334588
> 4     64          -0.029062001          -0.025298582
> 5     76          -0.016742356          -0.040792965
> 6     93          -0.004422710          -0.047817737
>> minimalvariablesfrommodel20161120(z1, ChickWeight)
> [1] "weight"
>> 
> 
> Is there a more elegant, and hence more reliable, approach?
> 
> Thanks
> 
> Jacob A. Wegelin


Jacob,

In general, if you have a model object 'm', you can use the following syntax:

  all.vars(terms(m))

See ?terms and ?all.vars; the help page for the latter also documents all.names().
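
As a minimal sketch of how this applies to your ChickWeight example, including the column with a space in its name: the object names cw, z1, z2, vars and minimal below are only illustrative, and the intersect() step is an extra guard I am assuming you would want in case the formula refers to something that is not a column of the data. For mixed models it is worth checking whether terms() returns the grouping variables as well as the fixed effects.

```r
# Sketch only: cw, z1, z2, vars and minimal are illustrative names.
cw <- as.data.frame(ChickWeight)   # plain data frame copy of the built-in data
cw$"dog tail" <- cw$Time           # a non-syntactic column name

z1 <- lm(weight ~ poly(Time, 2), data = cw)
z2 <- lm(weight ~ poly(`dog tail`, 2), data = cw)

all.vars(terms(z1))   # "weight" "Time"
all.vars(terms(z2))   # "weight" "dog tail"

# Keep only the columns the model uses; intersect() guards against
# formula symbols that are not columns of the data.
vars <- intersect(all.vars(terms(z2)), names(cw))
minimal <- cw[, vars, drop = FALSE]
str(minimal)
```

Because all.vars() works on the parsed formula rather than on its printed text, it does not depend on how the variable names are spelled.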

Regards,

Marc Schwartz


