[R] multiple t-tests across similar variable names
arun
smartpink111 at yahoo.com
Thu Oct 11 20:55:36 CEST 2012
Hi Shantanu,
I saw your reply to Rui on Nabble regarding multiple underscores:
(Actually, I see now that part of the problem is that many of the
names have multiple underscores such as "red_apple_pre" or
"post_banana_organic". I think this is causing a problem for this line
in your code:)
I wasn't aware of that problem. In that case, try this:
set.seed(432)
dat2 <- data.frame(red_apple_pre      = sample(10:20, 5, replace = TRUE),
                   orange_post        = sample(18:28, 5, replace = TRUE),
                   pre_banana_organic = sample(25:35, 5, replace = TRUE),
                   post_apple         = sample(20:30, 5, replace = TRUE),
                   banana_post        = sample(40:50, 5, replace = TRUE),
                   orange_pre         = sample(5:10, 5, replace = TRUE))
nam1 <- c("apple", "orange", "banana")
nam2 <- c("pre", "post")
# keep only the fruit and pre/post parts of each name
colnames(dat2) <- unlist(lapply(
    lapply(strsplit(colnames(dat2), "_"), function(x) x[x %in% nam1 | x %in% nam2]),
    function(x) paste(x[1], x[2], sep = "_")))
# move a leading "pre"/"post" to the end of the name
colnames(dat2) <- gsub("^pre\\_(.*)", "\\1_pre",
                       gsub("^post\\_(.*)", "\\1_post", colnames(dat2)))
dat3 <- t(dat2[order(colnames(dat2))])
dat3 <- data.frame(varName = gsub("(.*)\\_.*", "\\1", row.names(dat3)), dat3)
list3 <- lapply(split(dat3, dat3$varName), function(x) t(x[-1]))
res3 <- do.call(rbind, lapply(
    lapply(list3, function(x) t.test(x[, 1], x[, 2], paired = TRUE)),
    function(x) data.frame(meandifference = x$estimate,
                           CIlow = unlist(x$conf.int)[1],
                           CIhigh = unlist(x$conf.int)[2],
                           p.value = x$p.value)))
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560
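
The renaming step above can also be written as a small helper, which may be easier to adapt to 50 variables. This is only a sketch: normalize_name is a made-up name, not part of the code above, and it assumes every column name contains exactly one known item and exactly one of "pre"/"post".

```r
# Hypothetical helper: reduce a column name to "<item>_<pre|post>",
# ignoring extra tokens and the order in which the tokens appear.
normalize_name <- function(nm, items, phases = c("pre", "post")) {
  tokens <- strsplit(nm, "_", fixed = TRUE)[[1]]
  item  <- tokens[tokens %in% items]
  phase <- tokens[tokens %in% phases]
  # fail loudly if a name does not hold exactly one item and one phase
  stopifnot(length(item) == 1, length(phase) == 1)
  paste(item, phase, sep = "_")
}

old <- c("red_apple_pre", "orange_post", "pre_banana_organic",
         "post_apple", "banana_post", "orange_pre")
new <- vapply(old, normalize_name, character(1),
              items = c("apple", "orange", "banana"), USE.NAMES = FALSE)
new
# "apple_pre" "orange_post" "banana_pre" "apple_post" "banana_post" "orange_pre"
```

The stopifnot() call is a design choice: a name that matches no item (or two items) stops the run instead of silently producing a column called "NA_pre".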
I hope this works.
A.K.
----- Original Message -----
From: "Nundy, Shantanu" <snundy at chicagobooth.edu>
To: arun <smartpink111 at yahoo.com>
Cc:
Sent: Thursday, October 11, 2012 10:22 AM
Subject: RE: [R] multiple t-tests across similar variable names
Hi Arun,
This is very helpful, thanks.
I'm running into a couple of issues:
1. Since some of the variables are named like "pre_apple" and others like "apple_post", sorting the variables doesn't put all the pre/post pairs next to each other.
2. I have about 50 variables, so typing this line is a bit cumbersome:
> list3<-list(dat3[,1:2],dat3[,3:4],dat3[,5:6])
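
For the second issue, one possible way to avoid typing the column ranges by hand is to group the column names by their stem. This is a sketch on made-up values, and it assumes the names already end in "_pre" or "_post" (so it does not by itself solve the first issue):

```r
# Small deterministic stand-in for the sorted data frame (dat3 in the thread);
# the numbers are invented for illustration.
dat3 <- data.frame(apple_post  = c(21, 25, 30, 22, 27),
                   apple_pre   = c(11, 14, 19, 10, 16),
                   banana_post = c(41, 44, 50, 47, 45),
                   banana_pre  = c(26, 30, 35, 28, 31))

# Stem = everything before the final "_pre"/"_post"
stem  <- sub("_(pre|post)$", "", colnames(dat3))

# One data frame per stem, holding that stem's post and pre columns
list3 <- lapply(split(colnames(dat3), stem), function(cols) dat3[, cols])

names(list3)         # "apple" "banana"
sapply(list3, ncol)  # 2 columns per element
```

Because split() names the list elements by stem, the row names of the final results table come for free.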
Thanks,
Shantanu
________________________________________
From: arun [smartpink111 at yahoo.com]
Sent: Thursday, October 11, 2012 9:14 AM
To: Rui Barradas
Cc: Nundy, Shantanu; R help
Subject: Re: [R] multiple t-tests across similar variable names
Hi Rui,
Running your code, I got these results:
result
#       MeanDiff   CIlower    CIupper      p.value
#apple     -12.6 -16.68052  -8.519476 0.0010166626
#banana    -15.0 -17.91196 -12.088040 0.0001388506
#orange    -18.2 -22.79583 -13.604166 0.0003888560
From my code:
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560
There is a difference in signs.
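
A likely explanation, offered as a sketch rather than a confirmed diagnosis: a paired t.test(x, y) estimates the mean of x - y, so passing the post column first gives the opposite sign from passing the pre column first. After sorting the column names alphabetically, "apple_post" comes before "apple_pre", which would make one code compute post minus pre and the other pre minus post. The values below are made up for illustration:

```r
x_pre  <- c(11, 14, 19, 10, 16)   # invented "pre" measurements
x_post <- c(21, 25, 30, 22, 27)   # invented "post" measurements

a <- t.test(x_post, x_pre, paired = TRUE)  # estimate is mean(x_post - x_pre)
b <- t.test(x_pre, x_post, paired = TRUE)  # estimate is mean(x_pre - x_post)

unname(a$estimate)   # 11
unname(b$estimate)   # -11

# Same p-value; the confidence interval is mirrored around zero
all.equal(a$p.value, b$p.value)
all.equal(as.numeric(a$conf.int), -rev(as.numeric(b$conf.int)))
```

So the two result tables would agree once both codes use the same argument order.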
A.K.
----- Original Message -----
From: Rui Barradas <ruipbarradas at sapo.pt>
To: arun <smartpink111 at yahoo.com>; "Nundy, Shantanu" <snundy at chicagobooth.edu>
Cc: R help <r-help at r-project.org>
Sent: Thursday, October 11, 2012 9:25 AM
Subject: Re: [R] multiple t-tests across similar variable names
Hello,
I have a problem: with your data example, my results are different. I have changed the names of two of the variables to allow for 'pre' and 'post' to come first in the names.
# auxiliary functions
ifswap <- function(x)
    if(x[1] %in% c("pre", "post")) x[2:1] else x
getpair <- function(i, post)
    post[ which(vmat[post, 1] == vmat[i, 1]) ]
makeLine <- function(h)
    c(MeanDiff = unname(h$estimate),
      CIlower = h$conf.int[1],
      CIupper = h$conf.int[2],
      p.value = h$p.value)
doTests <- function(DF, Pairs){
    t.list <- lapply( seq_len(nrow(Pairs)), function(i)
        t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
    do.call(rbind, lapply(t.list, makeLine))
}
# dataset
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
                   orange_post = sample(18:28,5,replace=TRUE),
                   pre_banana = sample(25:35,5,replace=TRUE), # here
                   apple_post = sample(20:30,5,replace=TRUE),
                   post_banana = sample(40:50,5,replace=TRUE), # and here
                   orange_pre = sample(5:10,5,replace=TRUE))
#--------------------------------
# start processing the data.frame
# Make pairs of pre/post columns
vars <- names(dat2)
vmat <- do.call(rbind, strsplit(vars, "_"))
vmat <- t(apply(vmat, 1, ifswap))
pre <- which(vmat[, 2] == "pre")
post <- which(vmat[, 2] == "post")
post <- sapply(pre, getpair, post)
pairs <- matrix(c(pre, post), ncol = 2)
# now the tests
result <- doTests(dat2, pairs)
rownames(result) <- vmat[pre, 1]
result
In your results I believe that the values for meandifference are the means of x[, 1], at least that's what I've got.
Anyway, I'll go through both pieces of code again to try to see what's going on.
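
One quick way to check what the estimate of a paired test represents is to compare it against both candidates directly. A minimal sketch on made-up numbers (not the data from this thread):

```r
x <- c(21, 25, 30, 22, 27)   # invented first sample
y <- c(11, 14, 19, 10, 16)   # invented second sample

est <- unname(t.test(x, y, paired = TRUE)$estimate)

isTRUE(all.equal(est, mean(x - y)))  # TRUE: mean of the pairwise differences
isTRUE(all.equal(est, mean(x)))      # FALSE: not the mean of the first sample
```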
Hope this helps,
Rui Barradas
On 11-10-2012 05:31, arun wrote:
> Hi,
>
> If you have a lot of variables in no particular order, it would be better to order the data by column names.
> For example:
> set.seed(432)
> dat2 <- data.frame(apple_pre   = sample(10:20, 5, replace = TRUE),
>                    orange_post = sample(18:28, 5, replace = TRUE),
>                    banana_pre  = sample(25:35, 5, replace = TRUE),
>                    apple_post  = sample(20:30, 5, replace = TRUE),
>                    banana_post = sample(40:50, 5, replace = TRUE),
>                    orange_pre  = sample(5:10, 5, replace = TRUE))
> dat3 <- dat2[order(colnames(dat2))] #order the columns
> list3 <- list(dat3[,1:2], dat3[,3:4], dat3[,5:6])
> res3 <- do.call(rbind, lapply(
>     lapply(list3, function(x) t.test(x[, 1], x[, 2], paired = TRUE)),
>     function(x) data.frame(meandifference = x$estimate,
>                            CIlow = unlist(x$conf.int)[1],
>                            CIhigh = unlist(x$conf.int)[2],
>                            p.value = x$p.value)))
> row.names(res3) <- unlist(unique(lapply(strsplit(colnames(dat3), "_"), `[`, 1)))
> res3
> #       meandifference     CIlow   CIhigh      p.value
> #apple            12.6  8.519476 16.68052 0.0010166626
> #banana           15.0 12.088040 17.91196 0.0001388506
> #orange           18.2 13.604166 22.79583 0.0003888560
>
> A.K.
>
>
>
> ----- Original Message -----
> From: "Nundy, Shantanu" <snundy at chicagobooth.edu>
> To: "r-help at r-project.org" <r-help at r-project.org>
> Cc:
> Sent: Wednesday, October 10, 2012 7:09 PM
> Subject: Re: [R] multiple t-tests across similar variable names
>
> Hi everyone-
>
> I have a dataset with multiple "pre" and "post" variables I want to compare. The variables are named "apple_pre" or "pre_banana" with the corresponding post variables named "apple_post" or "post_banana". The variables are in no particular order.
>
> apple_pre orange_pre orange_post pre_banana apple_post post_banana
> person_1
> person_2
> person_3
> ...
> person_x
>
>
> How do I:
> 1. Run a series of paired t-tests for the apple_pre variables and pre_banana variables? Would be great to do something like ttest(*.*pre*.*,*.*post*.*).
> 2. Print the results from these t-tests in a table with col 1=mean difference, col 2= 95% conf interval, col 3=p-value.
>
> Thank you kindly,
> -Shantanu
>
> Shantanu Nundy, M.D.
> University of Chicago
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>