Variations on the t test
Peter Dalgaard BSA
p.dalgaard@biostat.ku.dk
26 Aug 1998 00:01:45 +0200
One of the things that have been annoying me with both Splus and R is
that the "simple tests" are inconsistent with the lm/glm/dataframe
conventions, and that they become quite awkward to use when data have
to be extracted from a dataframe:
t.test(data$bp[data$sex=="F" & data$age>25], data$bp[data$sex=="M" &
data$age>25])
OK, so it's better to do
eval(t.test(bp[sex=="F"],bp[sex=="M"]),subset(data,age>25))
but try explaining that to a class of MD's!
Below is a first sketch of a set of functions that extend t.test to
allow specification in terms of a list of vectors or a vector and a
grouping variable or a model formula, allowing data= and subset=
arguments as well. The paired test can also be specified using a
Pairs(x,y) construction, which should help prevent people from using
the wrong test. It's supposed to be downward compatible with the Splus
syntax.
I've been pushing Kurt to use a similar interface for all the ctest
functions, so I thought you'd all like to see them. If nothing else,
they should be interesting to study (they were definitely fun to
write), since they use some *really* dirty tricks relating to R's
evaluation model.
Anyway, here goes (remember, I said *first* sketch - there is at least
one bug in it, can you spot it?):
----
###
### The first couple of lines just guard against multiple loading
###
if(exists("t.test",envir=.GlobalEnv))rm(t.test)
if(exists("t.test.default",envir=.GlobalEnv))rm(t.test.default)
### Utility function to extract the environment the call is evaluated
### in. If you do eval(x,list), list is found as the variable "envir"
### in the stack frame immediately below the local frame
.ParentEnv<-function()
{
parent<-sys.parent()
grandparent <- sys.parent(2)
if ( parent - grandparent == 1 )
sys.frame(grandparent)
else
get("envir",envir=sys.frame(parent-1))
}
.t.test<-t.test
t.test.default<-function(x,...,group)
{
e<-.ParentEnv()
call <- if ( is.list(x) )
substitute(t.test.list(x,...))
else if (missing(group))
substitute(.t.test(x,...))
else
substitute(t.test.list(split(x, group),...))
eval(call,e)
}
t.test<-function(...,data=sys.frame(sys.parent()),subset)
{
dname.add<-
if (!missing(data))
paste(", data frame:", deparse(substitute(data)))
else
""
if (!missing(subset)){
dname.add<-paste(dname.add, ", subset:", deparse(substitute(subset)))
subset <- eval(substitute(subset),data)
data <- data[subset,] # had better be a data frame...
}
res<-eval(substitute(t.test.generic(...)), data)
res$data.name <- paste(res$data.name, dname.add)
res
}
t.test.generic<-function(x,...)
{
UseMethod("t.test")
}
t.test.list<-function(l,...)
{
if (length(l) != 2)
stop("need exactly 2 groups")
res<-.t.test(l[[1]],l[[2]],...)
res$data.name<-deparse(substitute(l))
nn<-names(l)
if (length(nn) != 2)
nn<-c("group 1", "group 2")
names(res$estimate)<-nn
res
}
t.test.formula<-function(f,...)
{
e<-.ParentEnv()
f[[1]]<-as.name("split")
call<-as.call(list(as.name("t.test.list"), f, ...))
eval(call,e)
}
Pairs<-function(x,y)
structure(match.call(),class="paired")
t.test.paired<-function(l,...)
{
e<-.ParentEnv()
call<-c(as.list(l),list(...),list(paired=T))
call[[1]]<-as.name(".t.test")
call<-as.call(call)
eval(call,e)
}
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._