[R] multiple paired t-tests without loops
Greg Snow
Greg.Snow at imail.org
Wed Apr 28 19:04:33 CEST 2010
Have you considered limiting yourself to the unique combinations rather than every possible permutation? E.g. the permutations 1,2 | 3,4 and 2,1 | 4,3 give redundant results. The combn function (with the FUN) argument may be of help.
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: Matthew Finkbeiner [mailto:matthew.finkbeiner at gmail.com] On
> Behalf Of Matthew Finkbeiner
> Sent: Monday, April 26, 2010 4:09 PM
> To: Greg Snow
> Cc: matthew.finkbeiner at mq.edu.au; r-help at r-project.org
> Subject: Re: [R] multiple paired t-tests without loops
>
> Yes, I suspect that I will end up using a sampling approach, but I'd
> like to use an exact test if it's at all feasible.
>
> Here are two samples of data from 3 subjects:
> Sample Subj C1 C2
> 44 1 0.0093 0.0077
> 44 2 0.0089 0.0069
> 44 3 0.051 0.0432
> 44 4 0.014 0.0147
> 44 5 0.0161 0.0117
> 45 1 0.0103 0.0086
> 45 2 0.0099 0.0078
> 45 3 0.0542 0.0458
> 45 4 0.0154 0.0163
> 45 5 0.0175 0.0129
>
>
> and then here is the script I've pieced together from things I've found
> on the web (sorry for not citing the snippets!). any pointers on how
> to
> speed it up would be greatly appreciated.
>
> #----------------------------------
> # Utility function
> # that returns binary representation of 1:(2^n) X SubjN
> binary.v <-
> function(n)
> {
> x <- 1:(2^n)
> mx <- max(x)
> digits <- floor(log2(mx))
> ans <- 0:(digits-1); lx <- length(x)
> x <- matrix(rep(x,rep(digits, lx)),ncol=lx)
> x <- (x %/% 2^ans) %% 2
> }
>
> library(plyr)
>
>
> #first some global variables
> TotalSubjects <- 5
> TotalSamples <- 2
> StartSample <- 44
> EndSample <- ((StartSample + TotalSamples)-1)
> maxTs <- NULL
> obsTs <- NULL
>
>
>
>
> #create index array that drives the permuations for all samples
> ind <- binary.v(TotalSubjects)
>
> #transpose ind so that the first 2^N items correspond to S1,
> #the second 2^N correspond to S2 and so on...
> transind <- t(ind)
>
> #get data file that is organized first by sample then by subj (e.g.
> sample1 subject1
> # sample1 subject 2 ... sample 1 subject N)
> #sampledatafile <- file.choose()
>
> samples <- read.table(sampledatafile, header=T)
>
> #this is the progress bar
> pb <- txtProgressBar(min = StartSample, max = EndSample, style = 3)
> setTxtProgressBar(pb, 1)
>
> start.t <- proc.time()
>
> #begin loop that analyzes data sample by sample
> for (s in StartSample:EndSample) {
>
> S <- samples[samples$Sample==s,] #pick up data for current sample
>
> #reproduce data frame rows once for each permutation to be done
> expanddata <- S[rep(1:nrow(S), each = 2^TotalSubjects),]
>
>
> #create new array to hold the flipped (permuted) data
> permdata = expanddata
>
> #permute the data
> permdata[transind==1,3] <- expanddata[transind==1,4] #Cnd1 <- Cnd2
> permdata[transind==1,4] <- expanddata[transind==1,3] #Cnd2 <- Cnd1
>
> #create permutation # as a factor in dataframe
> PermN <- rep(rep(1:2^TotalSubjects, TotalSubjects),2)
>
> #create Sample# as a factor
> Sample <- rep(permdata[,1],2) #Sample# is in the 1st Column
>
> #create subject IDs as a factor
> Subj <- rep(permdata[,2],2) #Subject ID is in the 2nd Column
>
> #stack the permutated data
> StackedPermData <- stack(permdata[,3:4])
>
> #bind all the factors together
> StackedPermData <- as.data.frame(cbind(Sample, Subj, PermN,
> StackedPermData))
>
>
> #sort by perm
> sortedstack <-
> as.data.frame(StackedPermData[order(StackedPermData$PermN,
> StackedPermData$Sample),])
>
>
> #clear up some memory
> rm(expanddata, permdata, StackedPermData)
>
> #pull out data 1 perm at a time
> res<-ddply(sortedstack, c("Sample", "PermN"), function(.data){
>
> # Type combinations by Class
> combs<-t(combn(sort(unique(.data[,5])),2))
>
> # Applying the t-test for them
> aaply(combs,1, function(.r){
> x1<-.data[.data[,5]==.r[1],4] # select first column
> x2<-.data[.data[,5]==.r[2],4] # select first column
>
> tvalue <- t.test(x1,x2, paired = T)
>
> res <- c(tvalue$statistic,tvalue$parameter,tvalue$p.value)
> names(res) <- c('stat','df','pvalue')
> res
> }
> )
> }
> )
>
> # update progress bar
> setTxtProgressBar(pb, s)
>
> #get max T vals
> maxTs <- c(maxTs, tapply (res$stat, list (res$Sample), max))
>
> #get observed T vals
> obsTs <- c(obsTs, res$stat[length(res$stat)])
>
> #here we need to save res to a binary file
>
>
> }
> #close out the progress bar
> close(pb)
>
> end.t <- proc.time() - start.t
> print(end.t)
>
> #get cutoffs
> #these are the 2-tailed t-vals that maintain experimentwise error at
> the
> 0.05 level
> lowerT <- quantile(maxTs, .025)
> upperT <- quantile(maxTs, .975)
>
>
>
>
>
>
>
>
>
>
>
> On 4/27/2010 6:53 AM, Greg Snow wrote:
> > The usual way to speed up permutation testing is to sample from the
> > set of possible permutations rather than looking at all possible
> > ones.
> >
> > If you show some code then we may be able to find some inefficiencies
> > for you, but there is not general solution, poorly written uses of
> > apply will be slower than well written for loops. In some cases
> > rewriting critical pieces in C or fortran will help quite a bit, but
> > we need to see what you are already doing to know if that will help
> > or not.
> >
More information about the R-help
mailing list