[R] Pearson correlation with randomization
David Winsemius
dwinsemius at comcast.net
Fri Jan 21 22:03:31 CET 2011
On Jan 21, 2011, at 3:29 PM, Brahmachary, Manisha wrote:
> Hi David,
>
> Thanks a lot for you inputs. I have modified my code accordingly.
> There
> is one more place that I need some help.
> This is my code:
>
> =
> =
> ======================================================================
> ======
>
> X<- read.table("X.txt",as.is=T,header=T,row.names=1)
> Y<- read.table("Y.txt",as.is=T,header=T,row.names=1)
>
> X.mat<- as.matrix(X)
> Y.mat<- as.matrix(Y)
>
> # calculating the true correlation values from my original dataset
> True.Corrs<- matrix()
> for (k in 1:nrow(SNP.mat)){
> True.Corrs[k]<- cor.test(X.mat[k,],Y.mat[k,],alternative
> =c("greater"),method= c("pearson"))$p.value
> }
>
> # Creating the random distribution of Correlation p-values
> X.rand <- list()
> Y.rand<- list()
>
> X.rand<-replicate(1000,(X[sample(1:ncol(Y))]),simplify=FALSE) #
> Randomizing the column values for each row
> Y.rand<-replicate(1000,Y,simplify=FALSE) # Creating an equivalent list
> of the Y matrix (non-randomised), to be able to do a pair-wise
> cor.test
>
> Corrs.rand<- list()
> tmp<- list()
> for (i in 1:2){
> for (j in 1:3){
> # How to store a multiple values per element of list?
> tmp[[j]] <- cor.test(t(X.rand[[i]][j,]),t(Y.rand[[i]][j,]),alternative
> =c("greater"),method= c("pearson"))$p.value
> Corrs.rand[[i]] <- rbind(Corrs.rand[[j]],tmp[[j]])
> }
> }
>
> =
> =
> ======================================================================
>
> At this step:
>
> for (i in 1:length(X.rand)){
> for (j in 1:nrow(X.rand[[1]]){
> # How to store a multiple values per element of list?
> tmp[[j]] <- cor.test(t(X.rand[[i]][j,]),t(Y.rand[[i]][j,]),alternative
> =c("greater"),method= c("pearson"))$p.value
> Corrs.rand[[i]] <- rbind(Corrs.rand[[j]],tmp[[j]])
> }
> }
>
> I am not sure how I can store multiple values per element.
The usual way would be to pre-allocate a matrix or a data.frame and
then use "[<-" to assign either a whole row at a time or assign
individual elements one by one. rbind in a loop is definitely going to
slow you down.
I haven't followed through the individual steps of all the for-loops.
I guess I have lost the ability to think that way after learning to
use the matrix and indexing features of R, alas. If Corrs.rand is a
1000 x 12 data.frame (which is just a special square list) then you
can assign the i-th row with:
Corrs.rand[i, ] <- <some-12-element-object>
Or you can assign the i,j-th element with:
Corrs.rand[i,j ] <- <vector-of-length-1>
The same syntax works for matrices.
--
David.
> For eg. I
> want a list of length 1000 (which is the number of random
> permutations I
> have generated for my dataset) and in each element of the list I
> need to
> store 12 p.values where 12 corresponds to the number of rows I have in
> my randomized dataset. Eg.
>
> [[1]]
> 0.23
> 0.05
> 0.78
> 0.78
> 0.87
> 0.11
> 0.003
> 0.9
> 0.76
> 0.11
> 0.23
> 0.56
> [[2]]
> 0.08
> 0.67
> 0.45
> 0.23
> 0.35
> 0.85
> 0.99
> 0.78
> 0.66
> 0.45
> 0.06
> 0.1
> [[3]]
> So on...
>
> I maybe going about this in a complicated way and there may be other
> ways of storing the p.values for each of my randomized dataset. So if
> anybody has ideas please oblige me.
> ======================================================
> X dataset:(sample)
> #Probes X10851 X12144 X12155 X11882 X10860 X12762 X12239 X12154
> 1 1 1 0 0 1 0 2 0
> 2 0 0 0 0 0 0 0 0
> 3 2 2 2 2 1 2 1 2
> 4 0 0 0 0 0 0 0 0
> 5 2 2 2 2 2 2 2 2
> 6 0 1 0 0 1 1 1 1
> 7 2 2 NaN 2 2 2 2 2
> 8 2 2 2 2 2 2 2 2
> 9 0 1 0 1 1 NaN 1 2
> 10 2 2 2 2 2 2 2 2
> 11 2 0 0 0 0 0 0 0
> 12 0 1 0 1 1 0 1 1
>
>
> Y dataset:(sample)
>
> Probes X10851 X12144 X12155 X11882 X10860 X12762 X12239 X12154
> 1 793.0830793 788.1813828 867.8504057 729.8321265
> 816.8519963 805.2113707 774.5990003 854.6384306
> 2 12.8695023 4.312894024 10.69769375 5.872212512
> 13.79299806 9.394132659 6.297552848 9.307943304
> 3 699.7791876 826.997429 795.6409729 770.9376141
> 806.1241089 782.3970486 817.107482 859.7154906
> 4 892.8217221 869.0481458 806.3386667 812.0431017
> 873.5565439 794.4752191 813.9587056 814.8681274
> 5 892.8217221 869.0481458 806.3386667 812.0431017
> 873.5565439 794.4752191 813.9587056 814.8681274
> 6 839.7350251 943.4455677 950.7575323 859.0208018
> 894.246041 853.524053 941.4841508 913.0246205
> 7 653.1272418 751.5217836 750.1757745 737.382114
> 757.8486157 758.2407075 724.2185775 770.8669409
> 8 12.8695023 4.312894024 10.69769375 5.872212512
> 13.79299806 9.394132659 6.297552848 9.307943304
> 9 839.7350251 943.4455677 950.7575323 859.0208018
> 894.246041 853.524053 941.4841508 913.0246205
> 10 653.1272418 751.5217836 750.1757745 737.382114
> 757.8486157 758.2407075 724.2185775 770.8669409
> 11 653.1272418 751.5217836 750.1757745 737.382114
> 757.8486157 758.2407075 724.2185775 770.8669409
> 12 839.7350251 943.4455677 950.7575323 859.0208018
> 894.246041 853.524053 941.4841508 913.0246205
>
> Thanks again
>
> Manisha
>
>
>
>
> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net]
> Sent: Tuesday, January 18, 2011 11:56 PM
> To: Brahmachary, Manisha
> Cc: R-help at r-project.org
> Subject: Re: [R] Pearson correlation with randomization
>
>
> On Jan 18, 2011, at 11:23 PM, Brahmachary, Manisha wrote:
>
>> Hello,
>>
>>
>>
>> I will be very obliged if someone can help me with this statistical R
>> problem:
>>
>> I am trying to do a Pearson correlation on my datasets X, Y with
>> randomization test. My X and Y datasets are pairs.
>>
>> 1. I want to randomize (rearrange) only my X dataset per
>> row ,while
>> keeping the my Y dataset as it is.
>
> X <- X[sample(1:nrow(Y)), ]
>
>>
>> 2. Then Calculate the correlation for this pair, and compare it
>> to
>> your true value of correlation.
>>
>> 3. Repeat 2 and 3 maybe a 100 times
>
> You may want to look at the replicate function.
>
>>
>> 4. If your true p-value is greater than 95% of the random
>> values,
>> then you can reject the null hypothesis at p<0.05.
>
> You won't have a very stable estimate of the 95th order statistics
> with "maybe" 100 replications.
>
> --
> David.
>>
>>
>>
>> I am stuck at the randomization step. I need some help in
>> implementing
>> it the appropriate randomization step in my correlation.
>>
>> Below is my incomplete code. I will be very obliged if someone could
>> help:
>>
>>
>>
>> X <- read.table("X.txt",as.is=T,header=T,row.names=1)
>>
>> Y <- read.table("Y.txt",as.is=T,header=T,row.names=1)
>>
>>
>>
>> X.mat<- as.matrix(X)
>>
>> Y.mat<- as.matrix(Y)
>>
>>
>>
>> Corrs<- cor.test(X.mat[1,],Y.mat[1,],alternative
>> =c("greater"),method=
>> c("pearson"))
>>
>>
>>
>> Corrs.rand <- list()
>>
>>
>>
>> for (i in 1:length(X.mat)){
>>
>> for (j in 1:100){
>>
>>
>>
>> # This doesnot seem to wrok correctly. How do I run sample function
>> 100
>> times for the same row?
>>
>>
>>
>> SNP.rand<- sample(SNP.mat[i,],56, replace = FALSE, prob = NULL)
>>
>> Corrs.rand[[j]]<- cor.test(SNP.rand,CNV.mat[j,],alternative
>> =c("greater"),method= c("pearson"))
>>
>>
>>
>> # need to calculate how many times my pvalue from true p-value>
>> random
>> pvalue
>>
>> }
>>
>> }
>>
>>
>>
>> X dataset:
>>
>>
>>
>> #Probes
>>
>> X10851
>>
>> X12144
>>
>> X12155
>>
>> X11882
>>
>> X10860
>>
>> X12762
>>
>> X12239
>>
>> X12154
>>
>> 1
>>
>> 1
>>
>> 1
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 0
>>
>> 2
>>
>> 0
>>
>> 2
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 3
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 1
>>
>> 2
>>
>> 1
>>
>> 2
>>
>> 4
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 5
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 6
>>
>> 0
>>
>> 1
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 1
>>
>> 1
>>
>> 1
>>
>> 7
>>
>> 2
>>
>> 2
>>
>> NaN
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 8
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 9
>>
>> 0
>>
>> 1
>>
>> 0
>>
>> 1
>>
>> 1
>>
>> NaN
>>
>> 1
>>
>> 2
>>
>> 10
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 11
>>
>> 2
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 12
>>
>> 0
>>
>> 1
>>
>> 0
>>
>> 1
>>
>> 1
>>
>> 0
>>
>> 1
>>
>> 1
>>
>>
>>
>> Y dataset:
>>
>> Probes
>>
>> X10851
>>
>> X12144
>>
>> X12155
>>
>> X11882
>>
>> X10860
>>
>> X12762
>>
>> X12239
>>
>> X12154
>>
>> 1
>>
>> 793.0831
>>
>> 788.1814
>>
>> 867.8504
>>
>> 729.8321
>>
>> 816.852
>>
>> 805.2114
>>
>> 774.599
>>
>> 854.6384
>>
>> 2
>>
>> 12.8695
>>
>> 4.312894
>>
>> 10.69769
>>
>> 5.872213
>>
>> 13.793
>>
>> 9.394133
>>
>> 6.297553
>>
>> 9.307943
>>
>> 3
>>
>> 699.7792
>>
>> 826.9974
>>
>> 795.641
>>
>> 770.9376
>>
>> 806.1241
>>
>> 782.397
>>
>> 817.1075
>>
>> 859.7155
>>
>> 4
>>
>> 892.8217
>>
>> 869.0481
>>
>> 806.3387
>>
>> 812.0431
>>
>> 873.5565
>>
>> 794.4752
>>
>> 813.9587
>>
>> 814.8681
>>
>> 5
>>
>> 892.8217
>>
>> 869.0481
>>
>> 806.3387
>>
>> 812.0431
>>
>> 873.5565
>>
>> 794.4752
>>
>> 813.9587
>>
>> 814.8681
>>
>> 6
>>
>> 839.735
>>
>> 943.4456
>>
>> 950.7575
>>
>> 859.0208
>>
>> 894.246
>>
>> 853.5241
>>
>> 941.4842
>>
>> 913.0246
>>
>> 7
>>
>> 653.1272
>>
>> 751.5218
>>
>> 750.1758
>>
>> 737.3821
>>
>> 757.8486
>>
>> 758.2407
>>
>> 724.2186
>>
>> 770.8669
>>
>> 8
>>
>> 12.8695
>>
>> 4.312894
>>
>> 10.69769
>>
>> 5.872213
>>
>> 13.793
>>
>> 9.394133
>>
>> 6.297553
>>
>> 9.307943
>>
>> 9
>>
>> 839.735
>>
>> 943.4456
>>
>> 950.7575
>>
>> 859.0208
>>
>> 894.246
>>
>> 853.5241
>>
>> 941.4842
>>
>> 913.0246
>>
>> 10
>>
>> 653.1272
>>
>> 751.5218
>>
>> 750.1758
>>
>> 737.3821
>>
>> 757.8486
>>
>> 758.2407
>>
>> 724.2186
>>
>> 770.8669
>>
>> 11
>>
>> 653.1272
>>
>> 751.5218
>>
>> 750.1758
>>
>> 737.3821
>>
>> 757.8486
>>
>> 758.2407
>>
>> 724.2186
>>
>> 770.8669
>>
>> 12
>>
>> 839.735
>>
>> 943.4456
>>
>> 950.7575
>>
>> 859.0208
>>
>> 894.246
>>
>> 853.5241
>>
>> 941.4842
>>
>> 913.0246
>>
>>
>>
>>
>>
>>
>>
>> Thanks in advance
>>
>>
>>
>> Manisha
>>
>>
>>
>> Mount Sinai School of Medicine
>>
>> Icahn Medical Institute,
>>
>> 1425 Madison Avenue, Box 1498
>>
>> NY-10029, NEW-YORK, USA
>>
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
David Winsemius, MD
West Hartford, CT
More information about the R-help
mailing list