[R] wilcox.test loop through variable names

Jacob Wegelin jacobwegelin at fastmail.fm
Sun Nov 15 20:33:09 CET 2009


Often I perform the same task on a series of variables in a dataframe,
by looping through a character vector that holds the names and using
paste(), eval(), and parse() inside the loop.

For instance:

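library(lattice)       # supplies the environmental data frame
data(environmental)    # start from a fresh copy of the data
thesevars <- names(environmental)
environmental$ToyReal <- rnorm(nrow(environmental))
environmental$ToyDichot <- environmental$ToyReal < 0.53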

tableOfResults<-data.frame(var=thesevars)

tableOfResults$p_wilcox <- NA

tableOfResults$Beta_lm <- NA

rownames(tableOfResults)<-thesevars

for (thisvar in thesevars) {
  thiscommand <- paste("thiswilcox <- wilcox.test(", thisvar, "~ ToyDichot, data=environmental)")
  eval(parse(text=thiscommand))
  tableOfResults[thisvar, "p_wilcox"] <- thiswilcox$p.value
  thislm <- lm(environmental[c("ToyReal", thisvar)])
  tableOfResults[thisvar, "Beta_lm"] <- coef(thislm)[thisvar]
}

print(tableOfResults)

Of course, the loop above is a toy example. In real life I might first determine whether each variable is
continuous, dichotomous, or categorical with several levels, and then perform an operation that depends on
its type.
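
A minimal sketch of that kind of type-dependent dispatch follows. The helper classify_var(), the cutoff of 5 distinct values, and the data frame toy are all hypothetical and chosen only for illustration; real data would likely need a more careful rule.

```r
# classify_var() is a hypothetical helper, not part of any package.
# The max_levels cutoff is an arbitrary assumption.
classify_var <- function(x, max_levels = 5) {
  n_distinct <- length(unique(x[!is.na(x)]))
  if (n_distinct == 2) return("dichotomous")
  if (is.factor(x) || is.character(x) || n_distinct <= max_levels) {
    return("categorical")
  }
  "continuous"
}

# Toy data with one column of each kind (illustrative only).
set.seed(1)
toy <- data.frame(
  real  = rnorm(30),
  dich  = rep(c(TRUE, FALSE), 15),
  categ = factor(rep(c("a", "b", "c"), 10))
)

# Classify every column, then branch on the type inside the loop.
vartypes <- sapply(toy, classify_var)
for (v in names(toy)) {
  switch(vartypes[[v]],
    continuous  = print(summary(toy[[v]])),
    dichotomous = print(table(toy[[v]])),
    categorical = print(table(toy[[v]]))
  )
}
```

The switch() branches here only print summaries; in practice each branch would call whatever test suits that variable type.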

The use of paste(), eval(), and parse() seems awkward.  As Gabor Grothendieck showed
(http://tolstoy.newcastle.edu.au/R/e8/help/09/11/4520.html), if we
are calling a regression function such as lm() we can avoid paste()
by passing a data frame whose first column is the response, as in
the lm() call above.

But is there a way to avoid paste() and eval() when one uses t.test()
or wilcox.test()?
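
One candidate, sketched here under assumptions rather than offered as the settled idiom: reformulate() from the stats package builds a formula object from character strings, and the formula method of wilcox.test() accepts that object directly, so neither paste() nor eval(parse()) is needed. The names dat, vars, and p_wilcox are illustrative, the environmental data are assumed to come from the lattice package, and exact = FALSE is there only to avoid warnings about ties.

```r
library(lattice)   # assumed source of the environmental data frame

set.seed(42)
dat <- environmental
dat$ToyReal   <- rnorm(nrow(dat))
dat$ToyDichot <- dat$ToyReal < 0.53

vars <- c("ozone", "radiation", "temperature", "wind")

# For each variable name, build the formula `<var> ~ ToyDichot`
# as an object and hand it straight to wilcox.test().
p_wilcox <- sapply(vars, function(v) {
  f <- reformulate("ToyDichot", response = v)
  wilcox.test(f, data = dat, exact = FALSE)$p.value
})

print(p_wilcox)
```

The same pattern should carry over to t.test(), since it also has a formula method.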

Thanks

Jacob A. Wegelin
Department of Biostatistics
Virginia Commonwealth University
Richmond VA 23298-0032
U.S.A.



