[R] resampling for correlation and testing

ilai keren at math.montana.edu
Thu Mar 29 02:41:03 CEST 2012


On Wed, Mar 28, 2012 at 3:53 PM, Benton, Paul
<hpaul.benton08 at imperial.ac.uk> wrote:
> Hello all R-ers,
>
> I'm trying to run a resampling method on some data. My current method takes 2+ days or a lot of memory. I was wondering if anyone has a better suggestion.
>
> Currently I take a matrix and compute its correlation matrix, which I call rho.A. Each element of rho.A will be tested against the distribution of the corresponding element in the resampled correlation matrices of B.
>
> Some example code:
>
> A<-matrix(rnorm(100), ncol=10)
> B<-matrix(rnorm(100), ncol=10)
>
> rho.A<-cor(A)
>
> {
> idx<-sample(1:10, 10)
> idx
> # [1] 8 4 5 7 1 9 2 10 6  3
>
> rho.B<-cor(B[,idx])
> } ## repeat this x times (currently 500)
>
> ## in essence we then have the following :
> rho.arrayB<-array(runif((10*10)*500), dim=c(10,10,500))

Err... no, we don't. Repeated calls to sample(10, 10) only permute the
columns, so the 500 matrices cor(B[, idx]) contain exactly the same values
as cor(B), just in different off-diagonal positions. With runif, every
value is unique.
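A quick way to check this on a toy 10 x 10 matrix like the one in the original post (the seed is arbitrary): permuting the columns of B and recomputing cor() just gives back cor(B) with its rows and columns reordered, so no new correlation values are ever produced.

```r
## Column permutation only relabels the correlation matrix.
## Toy sizes mirror the example in the original post; the seed is arbitrary.
set.seed(42)
B <- matrix(rnorm(100), ncol = 10)

rho.B    <- cor(B)
idx      <- sample(10)          # one random column permutation
rho.perm <- cor(B[, idx])       # one "resampled" correlation matrix

# Entry [i, j] of the permuted matrix is entry [idx[i], idx[j]] of cor(B) ...
print(all.equal(rho.perm, rho.B[idx, idx]))     # TRUE
# ... so every resample holds the same values, only in different positions.
print(all.equal(sort(rho.perm), sort(rho.B)))   # TRUE
```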
>
> ## Then test whether rho.A[1,1] comes from the distribution of rho.arrayB[1,1,]
> pvalueMat[1,1]<-wilcox.test(rho.arrayB[1,1,] , rho.A[1,1])$p.value
>

From what I know, cor(A)[i, i] = cor(B)[j, j] = 1 for any
choice of A, B, i and j, so the [1, 1] test compares a single 1 with
500 more 1s.
I don't think Wilcoxon intended his rank-sum test to be used this way...

I would fix these issues first, so you don't wait 2 days for a vector
of NaNs.
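One possible way around both the memory problem and the diagonal problem, sketched under an assumption rather than offered as the method Paul needs: cor(B[, idx])[i, j] is just cor(B)[idx[i], idx[j]], and for a random permutation (idx[i], idx[j]) is a uniformly random pair of distinct columns. So the resamples of any off-diagonal entry are draws from the pooled off-diagonal values of cor(B). If that pooled set is the intended null, each off-diagonal entry of rho.A can be given an empirical p-value against it directly, with no 3-D array and no resampling loop. The function name pooled_pvalues, the two-sided comparison on absolute correlations, and the (count + 1) / (n + 1) convention are my own illustrative choices, not anything from this thread.

```r
## Hypothetical helper (not from the thread): empirical two-sided p-values
## for the off-diagonal entries of cor(A). ASSUMPTION: the reference
## distribution is the pooled off-diagonal values of cor(B), which is what
## random column permutations of B sample from.
pooled_pvalues <- function(A, B) {
  rho.A <- cor(A)
  rho.B <- cor(B)
  # Null values from the upper triangle only: the diagonal is always 1 and
  # the lower triangle duplicates the upper one.
  ref   <- sort(abs(rho.B[upper.tri(rho.B)]))
  n.ref <- length(ref)
  ut    <- upper.tri(rho.A)
  # Number of reference values strictly below |rho.A[i, j]| (binary search
  # on the sorted reference; left.open needs R >= 3.3.0).
  below <- findInterval(abs(rho.A[ut]), ref, left.open = TRUE)
  p <- matrix(NA_real_, nrow(rho.A), ncol(rho.A))
  # (number of |ref| >= |observed|, plus 1) / (n.ref + 1)
  p[ut] <- (n.ref - below + 1) / (n.ref + 1)
  p[lower.tri(p)] <- t(p)[lower.tri(p)]   # mirror into the lower triangle
  p                                        # diagonal deliberately left NA
}

set.seed(1)
A <- matrix(rnorm(100), ncol = 10)
B <- matrix(rnorm(100), ncol = 10)
pvalueMat <- pooled_pvalues(A, B)
```

With 2300 columns this needs two 2300 x 2300 correlation matrices and about 2.6 million sorted reference values, rather than a 2300 x 2300 x 500 array. Whether a pooled null like this actually answers the scientific question is a separate thing to check.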

Cheers


> However, my array would be 2300 x 2300 x 500, which R won't even let me create as an empty structure. Any suggestions are more than welcome!
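To put numbers on the size problem: in R versions before 3.0.0 (current when this thread was written), a single vector or array could hold at most 2^31 - 1 elements, and the proposed array is larger than that before memory is even considered. A small arithmetic check:

```r
## Size of the proposed 2300 x 2300 x 500 array of doubles.
n.var  <- 2300
n.perm <- 500
n.cell <- n.var^2 * n.perm
print(n.cell)                          # 2.645 billion cells
print(n.cell > .Machine$integer.max)   # TRUE: more than 2^31 - 1 elements
print(n.cell * 8 / 2^30)               # about 19.7 GiB at 8 bytes per double
# By contrast, the distinct off-diagonal correlations of one 2300 x 2300 matrix:
print(choose(n.var, 2))                # 2643850
```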
>
> Cheers,
>
> Paul
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


