[R] Pearson correlation with randomization

David Winsemius dwinsemius at comcast.net
Fri Jan 21 22:03:31 CET 2011


On Jan 21, 2011, at 3:29 PM, Brahmachary, Manisha wrote:

> Hi David,
>
> Thanks a lot for you inputs. I have modified my code accordingly.  
> There
> is one more place that I need some help.
> This is my code:
>
> = 
> = 
> ======================================================================
> ======
>
> X<- read.table("X.txt",as.is=T,header=T,row.names=1)
> Y<- read.table("Y.txt",as.is=T,header=T,row.names=1)
>
> X.mat<- as.matrix(X)
> Y.mat<- as.matrix(Y)
>
> # calculating the true correlation values from my original dataset
> True.Corrs<- matrix()
> for (k in 1:nrow(SNP.mat)){
> True.Corrs[k]<- cor.test(X.mat[k,],Y.mat[k,],alternative
> =c("greater"),method= c("pearson"))$p.value
> }
>
> # Creating the random distribution of Correlation p-values
> X.rand <- list()
> Y.rand<- list()
>
> X.rand<-replicate(1000,(X[sample(1:ncol(Y))]),simplify=FALSE)  #
> Randomizing the column values for each row
> Y.rand<-replicate(1000,Y,simplify=FALSE) # Creating an equivalent list
> of the Y matrix (non-randomised), to be able to do a pair-wise  
> cor.test
>
> Corrs.rand<- list()
> tmp<- list()
> for (i in 1:2){
> for (j in 1:3){
> # How to store a multiple values per element of list?
> tmp[[j]] <- cor.test(t(X.rand[[i]][j,]),t(Y.rand[[i]][j,]),alternative
> =c("greater"),method= c("pearson"))$p.value
> Corrs.rand[[i]] <- rbind(Corrs.rand[[j]],tmp[[j]])
> }
> }
>
> = 
> = 
> ======================================================================
>
> At this step:
>
> for (i in 1:length(X.rand)){
> for (j in 1:nrow(X.rand[[1]]){
> # How to store a multiple values per element of list?
> tmp[[j]] <- cor.test(t(X.rand[[i]][j,]),t(Y.rand[[i]][j,]),alternative
> =c("greater"),method= c("pearson"))$p.value
> Corrs.rand[[i]] <- rbind(Corrs.rand[[j]],tmp[[j]])
> }
> }
>
> I am not sure how I can store multiple values per element.

The usual way would be to pre-allocate a matrix or a data.frame and  
then use "[<-" to assign either a  whole row at a time or assign  
individual elements one by one. rbind in a loop is definitely going to  
slow you down.

I haven't followed through the individual steps of all the for-loops.  
I guess I have lost the ability to think that way after learning to  
use the matrix and indexing features of R, alas. If Corrs.rand is a  
1000 x 12  data.frame (which is just a special square list) then you  
can assign the i-th row with:

Corrs.rand[i, ] <- <some-12-element-object>

Or you can assign the i,j-th element with:

Corrs.rand[i,j ] <- <vector-of-length-1>

The same syntax works for matrices.
-- 
David.

> For eg. I
> want a list of length 1000 (which is the number of random  
> permutations I
> have generated for my dataset) and in each element of the list I  
> need to
> store 12 p.values where 12 corresponds to the number of rows I have in
> my randomized dataset. Eg.
>
> [[1]]
> 0.23
> 0.05
> 0.78
> 0.78
> 0.87
> 0.11
> 0.003
> 0.9
> 0.76
> 0.11
> 0.23
> 0.56
> [[2]]
> 0.08
> 0.67
> 0.45
> 0.23
> 0.35
> 0.85
> 0.99
> 0.78
> 0.66
> 0.45
> 0.06
> 0.1
> [[3]]
> So on...
>
> I maybe going about this in a complicated way and there may be other
> ways of storing the p.values for each of my randomized dataset. So if
> anybody has ideas please oblige me.
> ======================================================
> X dataset:(sample)
> #Probes	X10851	X12144	X12155	X11882	X10860	X12762	X12239	X12154
> 1	1	1	0	0	1	0	2	0
> 2	0	0	0	0	0	0	0	0
> 3	2	2	2	2	1	2	1	2
> 4	0	0	0	0	0	0	0	0
> 5	2	2	2	2	2	2	2	2
> 6	0	1	0	0	1	1	1	1
> 7	2	2	NaN	2	2	2	2	2
> 8	2	2	2	2	2	2	2	2
> 9	0	1	0	1	1	NaN	1	2
> 10	2	2	2	2	2	2	2	2
> 11	2	0	0	0	0	0	0	0
> 12	0	1	0	1	1	0	1	1
>
>
> Y dataset:(sample)
>
> Probes	X10851	X12144	X12155	X11882	X10860	X12762	X12239	X12154
> 1	793.0830793	788.1813828	867.8504057	729.8321265
> 816.8519963	805.2113707	774.5990003	854.6384306
> 2	12.8695023	4.312894024	10.69769375	5.872212512
> 13.79299806	9.394132659	6.297552848	9.307943304
> 3	699.7791876	826.997429	795.6409729	770.9376141
> 806.1241089	782.3970486	817.107482	859.7154906
> 4	892.8217221	869.0481458	806.3386667	812.0431017
> 873.5565439	794.4752191	813.9587056	814.8681274
> 5	892.8217221	869.0481458	806.3386667	812.0431017
> 873.5565439	794.4752191	813.9587056	814.8681274
> 6	839.7350251	943.4455677	950.7575323	859.0208018
> 894.246041	853.524053	941.4841508	913.0246205
> 7	653.1272418	751.5217836	750.1757745	737.382114
> 757.8486157	758.2407075	724.2185775	770.8669409
> 8	12.8695023	4.312894024	10.69769375	5.872212512
> 13.79299806	9.394132659	6.297552848	9.307943304
> 9	839.7350251	943.4455677	950.7575323	859.0208018
> 894.246041	853.524053	941.4841508	913.0246205
> 10	653.1272418	751.5217836	750.1757745	737.382114
> 757.8486157	758.2407075	724.2185775	770.8669409
> 11	653.1272418	751.5217836	750.1757745	737.382114
> 757.8486157	758.2407075	724.2185775	770.8669409
> 12	839.7350251	943.4455677	950.7575323	859.0208018
> 894.246041	853.524053	941.4841508	913.0246205
>
> Thanks again
>
> Manisha
>
>
>
>
> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net]
> Sent: Tuesday, January 18, 2011 11:56 PM
> To: Brahmachary, Manisha
> Cc: R-help at r-project.org
> Subject: Re: [R] Pearson correlation with randomization
>
>
> On Jan 18, 2011, at 11:23 PM, Brahmachary, Manisha wrote:
>
>> Hello,
>>
>>
>>
>> I will be very obliged if someone can help me with this statistical R
>> problem:
>>
>> I am trying to do a Pearson correlation on my datasets X, Y with
>> randomization test. My X and Y datasets are pairs.
>>
>> 1.     I want to randomize (rearrange) only my X dataset per
>> row ,while
>> keeping the my Y dataset as it is.
>
> X <- X[sample(1:nrow(Y)), ]
>
>>
>> 2.     Then Calculate the correlation  for this pair, and compare it
>> to
>> your true value of correlation.
>>
>> 3.     Repeat 2 and 3 maybe a 100 times
>
> You may want to look at the replicate function.
>
>>
>> 4.     If your true p-value  is greater than 95% of the random  
>> values,
>> then you can reject the null hypothesis at   p<0.05.
>
> You won't have a very stable estimate of the 95th order statistics
> with "maybe" 100 replications.
>
> -- 
> David.
>>
>>
>>
>> I am stuck at the randomization step. I need some help in  
>> implementing
>> it the appropriate randomization step in my correlation.
>>
>> Below is my incomplete code. I will be very obliged if someone could
>> help:
>>
>>
>>
>> X <- read.table("X.txt",as.is=T,header=T,row.names=1)
>>
>> Y <- read.table("Y.txt",as.is=T,header=T,row.names=1)
>>
>>
>>
>> X.mat<- as.matrix(X)
>>
>> Y.mat<- as.matrix(Y)
>>
>>
>>
>> Corrs<- cor.test(X.mat[1,],Y.mat[1,],alternative  
>> =c("greater"),method=
>> c("pearson"))
>>
>>
>>
>> Corrs.rand <- list()
>>
>>
>>
>> for (i in 1:length(X.mat)){
>>
>> for (j in 1:100){
>>
>>
>>
>> # This doesnot seem to wrok correctly. How do I run sample function
>> 100
>> times for the same row?
>>
>>
>>
>> SNP.rand<- sample(SNP.mat[i,],56, replace = FALSE, prob = NULL)
>>
>> Corrs.rand[[j]]<- cor.test(SNP.rand,CNV.mat[j,],alternative
>> =c("greater"),method= c("pearson"))
>>
>>
>>
>> # need to calculate how many times my pvalue from true p-value>  
>> random
>> pvalue
>>
>> }
>>
>> }
>>
>>
>>
>> X dataset:
>>
>>
>>
>> #Probes
>>
>> X10851
>>
>> X12144
>>
>> X12155
>>
>> X11882
>>
>> X10860
>>
>> X12762
>>
>> X12239
>>
>> X12154
>>
>> 1
>>
>> 1
>>
>> 1
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 0
>>
>> 2
>>
>> 0
>>
>> 2
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 3
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 1
>>
>> 2
>>
>> 1
>>
>> 2
>>
>> 4
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 5
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 6
>>
>> 0
>>
>> 1
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 1
>>
>> 1
>>
>> 1
>>
>> 7
>>
>> 2
>>
>> 2
>>
>> NaN
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 8
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 9
>>
>> 0
>>
>> 1
>>
>> 0
>>
>> 1
>>
>> 1
>>
>> NaN
>>
>> 1
>>
>> 2
>>
>> 10
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 2
>>
>> 11
>>
>> 2
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 0
>>
>> 12
>>
>> 0
>>
>> 1
>>
>> 0
>>
>> 1
>>
>> 1
>>
>> 0
>>
>> 1
>>
>> 1
>>
>>
>>
>> Y dataset:
>>
>> Probes
>>
>> X10851
>>
>> X12144
>>
>> X12155
>>
>> X11882
>>
>> X10860
>>
>> X12762
>>
>> X12239
>>
>> X12154
>>
>> 1
>>
>> 793.0831
>>
>> 788.1814
>>
>> 867.8504
>>
>> 729.8321
>>
>> 816.852
>>
>> 805.2114
>>
>> 774.599
>>
>> 854.6384
>>
>> 2
>>
>> 12.8695
>>
>> 4.312894
>>
>> 10.69769
>>
>> 5.872213
>>
>> 13.793
>>
>> 9.394133
>>
>> 6.297553
>>
>> 9.307943
>>
>> 3
>>
>> 699.7792
>>
>> 826.9974
>>
>> 795.641
>>
>> 770.9376
>>
>> 806.1241
>>
>> 782.397
>>
>> 817.1075
>>
>> 859.7155
>>
>> 4
>>
>> 892.8217
>>
>> 869.0481
>>
>> 806.3387
>>
>> 812.0431
>>
>> 873.5565
>>
>> 794.4752
>>
>> 813.9587
>>
>> 814.8681
>>
>> 5
>>
>> 892.8217
>>
>> 869.0481
>>
>> 806.3387
>>
>> 812.0431
>>
>> 873.5565
>>
>> 794.4752
>>
>> 813.9587
>>
>> 814.8681
>>
>> 6
>>
>> 839.735
>>
>> 943.4456
>>
>> 950.7575
>>
>> 859.0208
>>
>> 894.246
>>
>> 853.5241
>>
>> 941.4842
>>
>> 913.0246
>>
>> 7
>>
>> 653.1272
>>
>> 751.5218
>>
>> 750.1758
>>
>> 737.3821
>>
>> 757.8486
>>
>> 758.2407
>>
>> 724.2186
>>
>> 770.8669
>>
>> 8
>>
>> 12.8695
>>
>> 4.312894
>>
>> 10.69769
>>
>> 5.872213
>>
>> 13.793
>>
>> 9.394133
>>
>> 6.297553
>>
>> 9.307943
>>
>> 9
>>
>> 839.735
>>
>> 943.4456
>>
>> 950.7575
>>
>> 859.0208
>>
>> 894.246
>>
>> 853.5241
>>
>> 941.4842
>>
>> 913.0246
>>
>> 10
>>
>> 653.1272
>>
>> 751.5218
>>
>> 750.1758
>>
>> 737.3821
>>
>> 757.8486
>>
>> 758.2407
>>
>> 724.2186
>>
>> 770.8669
>>
>> 11
>>
>> 653.1272
>>
>> 751.5218
>>
>> 750.1758
>>
>> 737.3821
>>
>> 757.8486
>>
>> 758.2407
>>
>> 724.2186
>>
>> 770.8669
>>
>> 12
>>
>> 839.735
>>
>> 943.4456
>>
>> 950.7575
>>
>> 859.0208
>>
>> 894.246
>>
>> 853.5241
>>
>> 941.4842
>>
>> 913.0246
>>
>>
>>
>>
>>
>>
>>
>> Thanks in advance
>>
>>
>>
>> Manisha
>>
>>
>>
>> Mount Sinai School of Medicine
>>
>> Icahn Medical Institute,
>>
>> 1425 Madison Avenue, Box 1498
>>
>> NY-10029, NEW-YORK, USA
>>
>>
>>
>>
>> 	[[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>

David Winsemius, MD
West Hartford, CT



More information about the R-help mailing list