[R] formula and model.frame

Doran, Harold HDoran at air.org
Wed Oct 21 17:55:31 CEST 2009

Suppose I have the following function

myFun <- function(formula, data){
	f <- formula(formula)
	dat <- model.frame(f, data)
	dat
}

Applying it with this sample data yields a new dataframe:

qqq <- data.frame(grade = c(3, NA, 3,4,5,5,4,3), score = rnorm(8), idVar = c(1:8))

dat <- myFun(score ~ grade, qqq)

However, what I would like is for the resulting dataframe (dat) to include idVar as a column. Naively, I could do

dat <- myFun(score ~ grade + idVar, qqq)

This gives the data I'm after, but it doesn't make sense for the model I am working on: idVar is not one of the conditioning variables; it serves a different purpose altogether. So I was thinking that another way is to attach it afterwards. Something like:

myFun <- function(formula, data, id, na.action){
	f <- formula(formula)
	idVar <- data[, id]
	dat <- model.frame(f, data, na.action = na.action)
	dat[, id] <- idVar
	dat
}

myFun(score ~ grade, qqq, id = 'idVar', na.action = NULL)

I intentionally use na.action = NULL here, because with na.omit the following occurs:

> myFun(score ~ grade, qqq, id = 'idVar', na.action = na.omit)
Error in `[<-.data.frame`(`*tmp*`, , id, value = 1:8) : 
  replacement has 8 rows, data has 7
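
The mismatch can be seen directly: na.omit drops the row with the missing grade, so the model frame has 7 rows while idVar still has 8 values. A small sketch with the same sample data (set.seed is not in the original and is added only so that score is reproducible):

```r
set.seed(1)  # assumption: added only for reproducibility
qqq <- data.frame(grade = c(3, NA, 3, 4, 5, 5, 4, 3), score = rnorm(8), idVar = 1:8)

mf <- model.frame(score ~ grade, qqq, na.action = na.omit)

nrow(qqq)               # rows in the original data
nrow(mf)                # rows left after the NA in grade is dropped
attr(mf, "na.action")   # records which row was omitted
rownames(mf)            # row names of the rows that were kept
```

Note that the rows that survive keep their original row names, and the omitted row is recorded in the "na.action" attribute.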

I see a few workarounds, but I am certain there is a cleaner way to accomplish this. 
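
One possible workaround, offered only as a sketch: since model.frame keeps the row names of the rows it retains, the id values can be looked up by row name after the NA handling has happened. The name myFun2 and the default na.action = na.omit are assumptions made for illustration, and the sketch assumes data is a plain data.frame.

```r
# Hypothetical variant of myFun: attach the id column after model.frame,
# matching on row names so the values line up with the rows that were kept.
myFun2 <- function(formula, data, id, na.action = na.omit) {
  f <- formula(formula)
  dat <- model.frame(f, data, na.action = na.action)
  # Index the original data by the surviving row names
  dat[[id]] <- data[rownames(dat), id]
  dat
}

qqq <- data.frame(grade = c(3, NA, 3, 4, 5, 5, 4, 3), score = rnorm(8), idVar = 1:8)
res <- myFun2(score ~ grade, qqq, id = "idVar")
res
```

This keeps idVar out of the formula, so it plays no part in the model itself, and missing values in idVar would not cause rows to be dropped.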


> sessionInfo()
R version 2.9.0 (2009-04-17) 

LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lme4_0.999375-28   Matrix_0.999375-25 lattice_0.17-22    xtable_1.5-5       adapt_1.0-4        MiscPsycho_1.4    
[7] statmod_1.3.8     

loaded via a namespace (and not attached):
[1] grid_2.9.0  tools_2.9.0
