[R] multiple t-tests across similar variable names

arun smartpink111 at yahoo.com
Thu Oct 11 19:06:28 CEST 2012


Hi Shantanu,

I think the code below should solve both issues:

set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20, 5, replace = TRUE), orange_post = sample(18:28, 5, replace = TRUE),
                   pre_banana = sample(25:35, 5, replace = TRUE), post_apple = sample(20:30, 5, replace = TRUE),
                   banana_post = sample(40:50, 5, replace = TRUE), orange_pre = sample(5:10, 5, replace = TRUE))
# move a leading "pre_"/"post_" to the end, e.g. "pre_banana" -> "banana_pre"
colnames(dat2) <- gsub("^pre_(.*)", "\\1_pre", gsub("^post_(.*)", "\\1_post", colnames(dat2)))
# sort so each _post/_pre pair is adjacent ("post" sorts before "pre"), then transpose
dat3 <- t(dat2[order(colnames(dat2))])
dat3 <- data.frame(varName = gsub("(.*)_.*", "\\1", row.names(dat3)), dat3)
# one matrix per fruit: column 1 = post, column 2 = pre
list3 <- lapply(split(dat3, dat3$varName), function(x) t(x[-1]))
# paired test of post against pre, so the estimate is the mean of (post - pre)
res3 <- do.call(rbind, lapply(lapply(list3, function(x) t.test(x[, 1], x[, 2], paired = TRUE)),
    function(x) data.frame(meandifference = x$estimate, CIlow = x$conf.int[1], CIhigh = x$conf.int[2], p.value = x$p.value)))
res3
#      meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560
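A minimal sketch of the same idea that pairs columns by name instead of relying on the sort order, assuming every variable is named either `<name>_pre`/`<name>_post` or `pre_<name>`/`post_<name>` and that every pre variable has a matching post variable. The function name `prepost_tests` is hypothetical. It computes post minus pre explicitly, so the sign of the mean difference does not depend on how "post" and "pre" happen to sort:

```r
# Hypothetical helper: paired t-test of <name>_post against <name>_pre for each name.
# Assumes names look like "apple_pre", "post_apple", etc., and that pairs are complete.
prepost_tests <- function(df) {
  # normalise "pre_x"/"post_x" to "x_pre"/"x_post"
  nms  <- sub("^(pre|post)_(.*)$", "\\2_\\1", names(df))
  base <- sub("_(pre|post)$", "", nms)
  vars <- sort(unique(base))
  rows <- lapply(vars, function(v) {
    pre  <- df[[which(nms == paste0(v, "_pre"))]]
    post <- df[[which(nms == paste0(v, "_post"))]]
    tt <- t.test(post, pre, paired = TRUE)  # estimate is mean(post - pre)
    data.frame(meandifference = unname(tt$estimate),
               CIlow = tt$conf.int[1], CIhigh = tt$conf.int[2],
               p.value = tt$p.value)
  })
  res <- do.call(rbind, rows)
  rownames(res) <- vars
  res
}

set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20, 5, replace = TRUE),
                   orange_post = sample(18:28, 5, replace = TRUE),
                   pre_banana = sample(25:35, 5, replace = TRUE),
                   post_apple = sample(20:30, 5, replace = TRUE),
                   banana_post = sample(40:50, 5, replace = TRUE),
                   orange_pre = sample(5:10, 5, replace = TRUE))
prepost_tests(dat2)
```

Because the columns are looked up by name, no hand-typed index list is needed however many variables there are.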
A.K.




----- Original Message -----
From: "Nundy, Shantanu" <snundy at chicagobooth.edu>
To: arun <smartpink111 at yahoo.com>
Cc: 
Sent: Thursday, October 11, 2012 10:22 AM
Subject: RE: [R] multiple t-tests across similar variable names

hi Arun,
This is very helpful, thanks.

I'm running into a couple issues:
1. Since some of the variables are named like "pre_apple" and others like "apple_post", sorting the variables doesn't always put each pre/post pair next to each other.
2. I have about 50 variables, so typing this line is a bit cumbersome:

> list3<-list(dat3[,1:2],dat3[,3:4],dat3[,5:6])
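Once the columns are sorted so that each pre/post pair is adjacent, a list like this can be generated for any number of pairs instead of being typed by hand. A minimal sketch, assuming `dat3` has an even number of columns with each pair side by side (the toy `dat3` below is illustrative only):

```r
# Toy data: three adjacent column pairs (illustrative only)
dat3 <- data.frame(a_post = 1:3, a_pre = 4:6, b_post = 7:9,
                   b_pre = 10:12, c_post = 13:15, c_pre = 16:18)

# first column of each pair: 1, 3, 5, ...
starts <- seq(1, ncol(dat3), by = 2)
list3 <- lapply(starts, function(i) dat3[, i:(i + 1)])
length(list3)  # 3
```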

Thanks,
Shantanu

________________________________________
From: arun [smartpink111 at yahoo.com]
Sent: Thursday, October 11, 2012 9:14 AM
To: Rui Barradas
Cc: Nundy, Shantanu; R help
Subject: Re: [R] multiple t-tests across similar variable names

Hi Rui,

Running your code, I got these results:
result
#       MeanDiff   CIlower    CIupper      p.value
#apple     -12.6 -16.68052  -8.519476 0.0010166626
#banana    -15.0 -17.91196 -12.088040 0.0001388506
#orange    -18.2 -22.79583 -13.604166 0.0003888560

From my code:
res3
#       meandifference     CIlow   CIhigh      p.value
#apple            12.6  8.519476 16.68052 0.0010166626
#banana           15.0 12.088040 17.91196 0.0001388506
#orange           18.2 13.604166 22.79583 0.0003888560

There is a difference in signs.
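The sign difference comes from argument order: for `t.test(x, y, paired = TRUE)` the estimate and confidence interval refer to the mean of `x - y`, so swapping the arguments negates the estimate and mirrors the interval while leaving the p-value unchanged. A small check with made-up numbers (the vectors `pre` and `post` are illustrative only):

```r
# Illustrative data only
pre  <- c(12, 15, 11, 14, 13)
post <- c(25, 27, 22, 28, 24)

a <- t.test(post, pre, paired = TRUE)  # estimates mean(post - pre)
b <- t.test(pre, post, paired = TRUE)  # estimates mean(pre - post)

c(a$estimate, b$estimate)  # 12.2 and -12.2
c(a$p.value, b$p.value)    # identical
```

In Rui's `doTests()` the pre column is passed first (pre minus post), while in the `res3` code the `_post` column comes first after sorting (post minus pre).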
A.K.




----- Original Message -----
From: Rui Barradas <ruipbarradas at sapo.pt>
To: arun <smartpink111 at yahoo.com>; "Nundy, Shantanu" <snundy at chicagobooth.edu>
Cc: R help <r-help at r-project.org>
Sent: Thursday, October 11, 2012 9:25 AM
Subject: Re: [R] multiple t-tests across similar variable names

Hello,

I have a problem: with your data example, my results are different. I have changed the names of two of the variables so that 'pre' and 'post' come first in those names.

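# auxiliary functions
# put the fruit name first, e.g. c("pre", "banana") -> c("banana", "pre")
ifswap <- function(x)
    if(x[1] %in% c("pre", "post")) x[2:1] else x

# index of the 'post' column with the same fruit name as column i
# (note: relies on the global matrix vmat)
getpair <- function(i, post)
    post[ which(vmat[post, 1] == vmat[i, 1]) ]

# one row of the results table from an "htest" object
makeLine <- function(h)
    c(MeanDiff = unname(h$estimate),
        CIlower = h$conf.int[1],
        CIupper = h$conf.int[2],
        p.value = h$p.value)

# paired t-test of column Pairs[i, 1] against Pairs[i, 2] (here pre minus post)
doTests <- function(DF, Pairs){
    t.list <- lapply( seq_len(nrow(Pairs)), function(i)
        t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
    do.call(rbind, lapply(t.list, makeLine))
}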

# dataset
set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
            orange_post = sample(18:28,5,replace=TRUE),
            pre_banana = sample(25:35,5,replace=TRUE),  # here
            apple_post = sample(20:30,5,replace=TRUE),
            post_banana = sample(40:50,5,replace=TRUE), # and here
            orange_pre = sample(5:10,5,replace=TRUE))


#--------------------------------
# start processing the data.frame
# Make pairs of pre/post columns
vars <- names(dat2)
vmat <- do.call(rbind, strsplit(vars, "_"))
vmat <- t(apply(vmat, 1, ifswap))
pre <- which(vmat[, 2] == "pre")
post <- which(vmat[, 2] == "post")
post <- sapply(pre, getpair, post)
pairs <- matrix(c(pre, post), ncol = 2)

# now the tests
result <- doTests(dat2, pairs)
rownames(result) <- vmat[pre, 1]
result


In your results, I believe the values of meandifference are the means of x[, 1]; at least, that's what I got.
Anyway, I'll look at both versions of the code again to try to see what's going on.

Hope this helps,

Rui Barradas

On 11-10-2012 05:31, arun wrote:
> Hi,
>
> If you have a lot of variables in no particular order, it is better to order the columns by name.
> For example:
> set.seed(432)
> dat2<-data.frame(apple_pre=sample(10:20,5,replace=TRUE),orange_post=sample(18:28,5,replace=TRUE),banana_pre=sample(25:35,5,replace=TRUE),apple_post=sample(20:30,5,replace=TRUE),banana_post=sample(40:50,5,replace=TRUE),orange_pre=sample(5:10,5,replace=TRUE))
> dat3<-dat2[order(colnames(dat2))] #order the columns
> list3<-list(dat3[,1:2],dat3[,3:4],dat3[,5:6])
> res3<-do.call(rbind,lapply(lapply(list3,function(x) t.test(x[,1],x[,2],paired=TRUE)),function(x) data.frame(meandifference=x$estimate,CIlow=unlist(x$conf.int)[1],CIhigh=unlist(x$conf.int)[2],p.value=x$p.value)))
> row.names(res3)<-unlist(unique(lapply(strsplit(colnames(dat3),"_"),`[`,1)))
> res3
> #     meandifference     CIlow   CIhigh      p.value
> #apple            12.6  8.519476 16.68052 0.0010166626
> #banana           15.0 12.088040 17.91196 0.0001388506
> #orange           18.2 13.604166 22.79583 0.0003888560
>
> A.K.
>
>
>
> ----- Original Message -----
> From: "Nundy, Shantanu" <snundy at chicagobooth.edu>
> To: "r-help at r-project.org" <r-help at r-project.org>
> Cc:
> Sent: Wednesday, October 10, 2012 7:09 PM
> Subject: Re: [R] multiple t-tests across similar variable names
>
> Hi everyone-
>
> I have a dataset with multiple "pre" and "post" variables I want to compare. The variables are named "apple_pre" or "pre_banana" with the corresponding post variables named "apple_post" or "post_banana". The variables are in no particular order.
>
> apple_pre orange_pre orange_post pre_banana apple_post post_banana
> person_1
> person_2
> person_3
> ...
> person_x
>
>
> How do I:
> 1. Run a series of paired t-tests comparing each pre variable (e.g. apple_pre, pre_banana) with its post variable? It would be great to do something like ttest(*.*pre*.*,*.*post*.*).
> 2. Print the results from these t-tests in a table with col 1 = mean difference, col 2 = 95% confidence interval, col 3 = p-value.
>
> Thank you kindly,
> -Shantanu
>
> Shantanu Nundy, M.D.
> University of Chicago
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.