[R] multiple t-tests across similar variable names
Rui Barradas
ruipbarradas at sapo.pt
Thu Oct 11 16:51:32 CEST 2012
Hello,
If that is the problem now, then change the variables' names so that the only underscore left is the one separating "pre"/"post" from the rest of the name.
In what follows, the first line is just the example you gave. In the
actual running code, uncomment the commented-out lines.
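vars <- c("red_apple_pre", "post_banana_organic")
#vars <- names(dat)
# mark the pre/post separator with a temporary "=" (anchored, so that
# names such as "apple_pretax" are not touched)
vars <- sub("_pre$", "=pre", vars)
vars <- sub("_post$", "=post", vars)
vars <- sub("^pre_", "pre=", vars)
vars <- sub("^post_", "post=", vars)
# all remaining underscores become dots
vars <- gsub("_", ".", vars, fixed = TRUE)
# restore the single pre/post underscore
vars <- sub("=", "_", vars, fixed = TRUE)
# vars is now c("red.apple_pre", "post_banana.organic")
#names(dat) <- vars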
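Your other question, how to drop the columns that have neither "pre" nor "post" in their names, can be handled with a regular expression before any renaming or pairing. A minimal sketch, assuming the tag is always the first or last underscore-separated piece of a name; the data frame `dat` and its constant column `age` are made up for illustration:

```r
# Hypothetical example data: two pre/post pairs plus a constant column
dat <- data.frame(red_apple_pre = c(1, 2, 3),
                  post_banana_organic = c(4, 5, 6),
                  age = c(30, 40, 50),
                  red_apple_post = c(2, 2, 4),
                  pre_banana_organic = c(3, 5, 5))

# TRUE for names that start with "pre_"/"post_" or end with "_pre"/"_post"
is_prepost <- grepl("^(pre|post)_|_(pre|post)$", names(dat))

# keep only the columns that can take part in a paired t-test
dat_pp <- dat[, is_prepost, drop = FALSE]
names(dat_pp)
# returns c("red_apple_pre", "post_banana_organic", "red_apple_post", "pre_banana_organic")
```

The anchors (^ and $) matter: a plain grepl("pre", ...) would also keep a column such as "prescription_count".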
Rui Barradas
On 11-10-2012 15:17, Nundy, Shantanu wrote:
> Actually, I see now that part of the problem is that many of the names have multiple underscores, such as "red_apple_pre" or "post_banana_organic". I think this is causing a problem for this line in your code:
>> vmat <- do.call(rbind, strsplit(vars, "_"))
> Shantanu
>
> ________________________________________
> From: Nundy, Shantanu
> Sent: Thursday, October 11, 2012 9:07 AM
> To: Rui Barradas
> Subject: RE: [R] multiple t-tests across similar variable names
>
> Rui,
> Thank you so much for your solution. It is exactly what I was struggling with!
>
> One small question. When I ran the code on my actual dataset I got the warning below:
>
>> vars <- names(master)
>> vmat <- do.call(rbind, strsplit(vars, "_"))
> Warning message:
> In function (..., deparse.level = 1) :
> number of columns of result is not a multiple of vector length (arg 1)
>
> My guess is that the problem is that not all the variables have "pre" or "post" in them. Some of the variables are constants that I will not do a paired t-test on. What would be the easiest way to get around this, perhaps simply by removing all of the variables that have neither "pre" nor "post" in them?
>
> Thanks again,
> Shantanu
>
> ________________________________________
> From: arun [smartpink111 at yahoo.com]
> Sent: Thursday, October 11, 2012 8:50 AM
> To: Rui Barradas
> Cc: Nundy, Shantanu
> Subject: Re: [R] multiple t-tests across similar variable names
>
> Hi Rui,
>
> Thanks for testing the code. I will look into it later.
> A.K.
>
> ----- Original Message -----
> From: Rui Barradas <ruipbarradas at sapo.pt>
> To: arun <smartpink111 at yahoo.com>; "Nundy, Shantanu" <snundy at chicagobooth.edu>
> Cc: R help <r-help at r-project.org>
> Sent: Thursday, October 11, 2012 9:25 AM
> Subject: Re: [R] multiple t-tests across similar variable names
>
> Hello,
>
> I have a problem: with your data example, my results are different. I have changed the names of two of the variables so that 'pre' and 'post' come first in those names.
>
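> # auxiliary functions
> # ifswap: put the fruit name first, e.g. c("pre", "banana") -> c("banana", "pre")
> ifswap <- function(x)
>     if(x[1] %in% c("pre", "post")) x[2:1] else x
>
> # getpair: index of the "post" column with the same fruit as column i
> # (note: it reads the global matrix vmat)
> getpair <- function(i, post)
>     post[ which(vmat[post, 1] == vmat[i, 1]) ]
>
> # makeLine: one results row from a t.test() object
> makeLine <- function(h)
>     c(MeanDiff = unname(h$estimate),
>       CIlower = h$conf.int[1],
>       CIupper = h$conf.int[2],
>       p.value = h$p.value)
>
> # doTests: paired t-test for each (pre, post) row of the index matrix Pairs
> doTests <- function(DF, Pairs){
>     t.list <- lapply( seq_len(nrow(Pairs)), function(i)
>         t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
>     do.call(rbind, lapply(t.list, makeLine))
> }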
>
> # dataset
> set.seed(432)
> dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
> orange_post = sample(18:28,5,replace=TRUE),
> pre_banana = sample(25:35,5,replace=TRUE), # here
> apple_post = sample(20:30,5,replace=TRUE),
> post_banana = sample(40:50,5,replace=TRUE), # and here
> orange_pre = sample(5:10,5,replace=TRUE))
>
>
> #--------------------------------
> # start processing the data.frame
> # Make pairs of pre/post columns
> vars <- names(dat2)
> vmat <- do.call(rbind, strsplit(vars, "_"))
> vmat <- t(apply(vmat, 1, ifswap))
> pre <- which(vmat[, 2] == "pre")
> post <- which(vmat[, 2] == "post")
> post <- sapply(pre, getpair, post)
> pairs <- matrix(c(pre, post), ncol = 2)
>
> # now the tests
> result <- doTests(dat2, pairs)
> rownames(result) <- vmat[pre, 1]
> result
>
> In your results, I believe the values for meandifference are the means of x[, 1]; at least, that's what I got.
> Anyway, I'll look at both versions of the code again to try to see what's going on.
>
> Hope this helps,
>
> Rui Barradas
>
> On 11-10-2012 05:31, arun wrote:
>> Hi,
>>
>> If you have a lot of variables in no particular order, it is easier to first order the columns by name, so that each pre column sits next to its post column.
>> For example:
>> set.seed(432)
>> dat2 <- data.frame(apple_pre = sample(10:20, 5, replace = TRUE),
>>                    orange_post = sample(18:28, 5, replace = TRUE),
>>                    banana_pre = sample(25:35, 5, replace = TRUE),
>>                    apple_post = sample(20:30, 5, replace = TRUE),
>>                    banana_post = sample(40:50, 5, replace = TRUE),
>>                    orange_pre = sample(5:10, 5, replace = TRUE))
>> dat3<-dat2[order(colnames(dat2))] #order the columns
>> list3<-list(dat3[,1:2],dat3[,3:4],dat3[,5:6])
>> res3 <- do.call(rbind, lapply(lapply(list3, function(x) t.test(x[, 1], x[, 2], paired = TRUE)),
>>                               function(x) data.frame(meandifference = x$estimate,
>>                                                      CIlow = unlist(x$conf.int)[1],
>>                                                      CIhigh = unlist(x$conf.int)[2],
>>                                                      p.value = x$p.value)))
>> row.names(res3)<-unlist(unique(lapply(strsplit(colnames(dat3),"_"),`[`,1)))
>> res3
>> # meandifference CIlow CIhigh p.value
>> #apple 12.6 8.519476 16.68052 0.0010166626
>> #banana 15.0 12.088040 17.91196 0.0001388506
>> #orange 18.2 13.604166 22.79583 0.0003888560
>>
>> A.K.
>>
>> ----- Original Message -----
>> From: "Nundy, Shantanu" <snundy at chicagobooth.edu>
>> To: "r-help at r-project.org" <r-help at r-project.org>
>> Cc:
>> Sent: Wednesday, October 10, 2012 7:09 PM
>> Subject: Re: [R] multiple t-tests across similar variable names
>>
>> Hi everyone-
>>
>> I have a dataset with multiple "pre" and "post" variables I want to compare. The variables are named "apple_pre" or "pre_banana" with the corresponding post variables named "apple_post" or "post_banana". The variables are in no particular order.
>>
>> apple_pre orange_pre orange_post pre_banana apple_post post_banana
>> person_1
>> person_2
>> person_3
>> ...
>> person_x
>>
>>
>> How do I:
>> 1. Run a series of paired t-tests comparing each pre variable (e.g. apple_pre, pre_banana) with its post counterpart (apple_post, post_banana)? It would be great to do something like ttest(*.*pre*.*,*.*post*.*).
>> 2. Print the results of these t-tests in a table with column 1 = mean difference, column 2 = 95% confidence interval, column 3 = p-value.
>>
>> Thank you kindly,
>> -Shantanu
>>
>> Shantanu Nundy, M.D.
>> University of Chicago
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
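For names with several underscores, the pairing can also be done without strsplit(): strip the pre/post tag from each name and use what remains as the matching key. The sketch below is an alternative to the strsplit()/ifswap() approach in this thread, not code posted by any of its participants. It assumes the tag is the first or last underscore-separated piece of each name, and the data frame `dat` (simulated, with a constant column `age`) is hypothetical. It produces the table asked for in the original question: mean difference (pre minus post), 95% confidence interval and p-value.

```r
# Hypothetical example data: multi-underscore names and one constant column
set.seed(1)
dat <- data.frame(red_apple_pre = rnorm(8, 10),
                  orange_post = rnorm(8, 12),
                  pre_banana_organic = rnorm(8, 20),
                  red_apple_post = rnorm(8, 11),
                  post_banana_organic = rnorm(8, 23),
                  orange_pre = rnorm(8, 9),
                  age = rep(40, 8))

vars <- names(dat)
# columns without a pre/post tag (such as age) are simply never selected
pre_cols  <- grep("^pre_|_pre$", vars, value = TRUE)
post_cols <- grep("^post_|_post$", vars, value = TRUE)

# pairing key = name without its tag, e.g. "pre_banana_organic" -> "banana_organic"
key <- function(x) sub("^(pre|post)_|_(pre|post)$", "", x)

# line up each pre column with its post column; drop pre columns with no partner
idx <- match(key(pre_cols), key(post_cols))
pre_cols  <- pre_cols[!is.na(idx)]
post_cols <- post_cols[idx[!is.na(idx)]]

# one row per pair: mean of (pre - post), 95% CI, p-value
rows <- lapply(seq_along(pre_cols), function(i) {
  h <- t.test(dat[[pre_cols[i]]], dat[[post_cols[i]]], paired = TRUE)
  c(MeanDiff = unname(h$estimate),
    CIlower  = h$conf.int[1],
    CIupper  = h$conf.int[2],
    p.value  = h$p.value)
})
result <- do.call(rbind, rows)
rownames(result) <- key(pre_cols)
result
```

Because the key keeps the inner underscores, "red_apple_pre" and "red_apple_post" match directly, and no renaming step is needed.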