[R] resampling for correlation and testing
Benton, Paul
hpaul.benton08 at imperial.ac.uk
Thu Mar 29 04:03:08 CEST 2012
On Mar 29, 2012, at 1:41 AM, ilai wrote:
> On Wed, Mar 28, 2012 at 3:53 PM, Benton, Paul
> <hpaul.benton08 at imperial.ac.uk> wrote:
>> Hello all R-er,
>>
>> I'm trying to run a resampling method on some data. The current method I have takes 2+ days or a lot of memory . I was wondering if anyone has a better suggestion.
>>
>> Currently I take a matrix and get the correlation matrix from it. This will be called rho.A. Each element in this will be tested against the distribution from the resampled correlation B matrix.
>>
>> Some example code:
>>
>> A<-matrix(rnorm(100), ncol=10)
>> B<-matrix(rnorm(100), ncol=10)
>>
>> rho.A<-cor(A)
>>
>> {
>> idx<-sample(1:10, 10)
>> idx
>> # [1] 8 4 5 7 1 9 2 10 6 3
>>
>> rho.B<-cor(B[,idx])
>> } ## repeat this x time (currently 500)
>>
>> ## in essence we then have the following :
>> rho.arrayB<-array(runif((10*10)*500), dim=c(10,10,500))
>
> Err... no we don't. sample(10,10) ; sample(10,10) ... only permutes
> the columns, so the 500 cor(B) have exactly the same values in
> different off diag positions. Using runif they are unique
My apologies for trying to make some example data. I should have done exactly what I was doing, which would have been harder to read. Since runif doesn't give exactly the same thing I'll give my functions that I'm using.
## note rperm was originally submitted to the stackexchange.com
## http://stats.stackexchange.com/questions/24300/how-to-resample-in-r-without-repeating-permutations
rperm <- function(m, size=2, replaceTF=TRUE) { # Obtain m unique permutations of 1:size
# Function to obtain a new permutation.
newperm <- function() {
count <- 0 # Protects against infinite loops
repeat {
# Generate a permutation and check against previous ones.
p <- sample(1:size, replace=replaceTF)
hash.p <- paste(p, collapse="") # make a name for the list
if (is.null(cache[[hash.p]])) break
## stop if we have not already made this seq - repeat to make another new seq
# Prepare to try again.
count <- count+1
if (count > 1000) { # 1000 is arbitrary; adjust to taste
p <- NA # NA indicates a new permutation wasn't found
hash.p <- ""
break
}
}
cache[[hash.p]] <<- TRUE # Update the list of permutations found
p # Return this (new) permutation
}
# Obtain m unique permutations.
cache <- list()
replicate(m, newperm())
}
library(parallel)
cl <- makeCluster(detectCores())
bootComb<-t(rperm(times, ncol(mat.B)))
pValues.mat<-matrix(NA, nrow=nrow(mat.B), ncol=nrow(mat.B))
for(i in 1:nrow(mat.B)){
cat(round(i/nrow(mat.B),2)*100, "% \r")
rho<-parApply(cl, bootComb, 1, function(idx, xmat, ymat){
rho<-apply(ymat[,idx], 1, function(ymat, xmat){
## parallel here due to memory sizes
rho<-cor(xmat, ymat)
}, xmat[idx]) ##gives back a vector of rho for n vs m
}, mat.B[i, ], mat.B) ## returns a matrix of rho's for each combination
rho.all<- cbind(rho, rho.A[i,])
pValues.mat[i, ] <- parApply(cl, rho.all, 1, function(rhoVec){
tl <- length(rhoVec) ## test index
bl <- tl-1 ## dist index
wilcox.test(rhoVec[1:bl], rhoVec[tl])$p.value
})
}
stopCluster(cl)
>>
>> ## Then test if rho.A[1,1] come from the distribution of rho.B[1,1]
>> pvalueMat[1,1]<-wilcox.test(rho.array[1,1,] , rho.A[1,1])$p.value
>>
>
> From what I know cor(A)[ i , i ] = cor(B)[ j , j ] = 1 for any
> choice of A,B,i and j
No, cor(a)[i, j] != cor(b)[i , j]
If your concern is because they are coming from the same distribution then again this is example data. Even then I would imagine that the correlation would be different for small n. Either way resampling the columns will give different correlation values. Please try the code yourself.
> I don't think Wilcox intended his test to be used in this way….
Probably true …. care to suggest a different statistical test ?
>
> I would start with fixing these issues first so you don't wait 2 days
> for a vector of NaN's
Actually I didn't get a vector or NaN's I got a matrix of pvalues. Which has proved useful. Now I want to make the function faster and was looking for a bit of help.
>
> Cheers
>
>
>> However, my array size would be 2300 x 2300 x 500 which R won't let me even make as an empty structure. Any suggestion are more than welcomed !!
>>
>> Cheers,
>>
>> Paul
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list