[R] Pearson correlation with randomization
Brahmachary, Manisha
manisha.brahmachary at mssm.edu
Fri Jan 21 21:29:54 CET 2011
Hi David,
Thanks a lot for you inputs. I have modified my code accordingly. There
is one more place that I need some help.
This is my code:
========================================================================
======
X<- read.table("X.txt",as.is=T,header=T,row.names=1)
Y<- read.table("Y.txt",as.is=T,header=T,row.names=1)
X.mat<- as.matrix(X)
Y.mat<- as.matrix(Y)
# calculating the true correlation values from my original dataset
True.Corrs<- matrix()
for (k in 1:nrow(SNP.mat)){
True.Corrs[k]<- cor.test(X.mat[k,],Y.mat[k,],alternative
=c("greater"),method= c("pearson"))$p.value
}
# Creating the random distribution of Correlation p-values
X.rand <- list()
Y.rand<- list()
X.rand<-replicate(1000,(X[sample(1:ncol(Y))]),simplify=FALSE) #
Randomizing the column values for each row
Y.rand<-replicate(1000,Y,simplify=FALSE) # Creating an equivalent list
of the Y matrix (non-randomised), to be able to do a pair-wise cor.test
Corrs.rand<- list()
tmp<- list()
for (i in 1:2){
for (j in 1:3){
# How to store a multiple values per element of list?
tmp[[j]] <- cor.test(t(X.rand[[i]][j,]),t(Y.rand[[i]][j,]),alternative
=c("greater"),method= c("pearson"))$p.value
Corrs.rand[[i]] <- rbind(Corrs.rand[[j]],tmp[[j]])
}
}
========================================================================
At this step:
for (i in 1:length(X.rand)){
for (j in 1:nrow(X.rand[[1]]){
# How to store a multiple values per element of list?
tmp[[j]] <- cor.test(t(X.rand[[i]][j,]),t(Y.rand[[i]][j,]),alternative
=c("greater"),method= c("pearson"))$p.value
Corrs.rand[[i]] <- rbind(Corrs.rand[[j]],tmp[[j]])
}
}
I am not sure how I can store multiple values per element. For eg. I
want a list of length 1000 (which is the number of random permutations I
have generated for my dataset) and in each element of the list I need to
store 12 p.values where 12 corresponds to the number of rows I have in
my randomized dataset. Eg.
[[1]]
0.23
0.05
0.78
0.78
0.87
0.11
0.003
0.9
0.76
0.11
0.23
0.56
[[2]]
0.08
0.67
0.45
0.23
0.35
0.85
0.99
0.78
0.66
0.45
0.06
0.1
[[3]]
So on...
I maybe going about this in a complicated way and there may be other
ways of storing the p.values for each of my randomized dataset. So if
anybody has ideas please oblige me.
======================================================
X dataset:(sample)
#Probes X10851 X12144 X12155 X11882 X10860 X12762 X12239 X12154
1 1 1 0 0 1 0 2 0
2 0 0 0 0 0 0 0 0
3 2 2 2 2 1 2 1 2
4 0 0 0 0 0 0 0 0
5 2 2 2 2 2 2 2 2
6 0 1 0 0 1 1 1 1
7 2 2 NaN 2 2 2 2 2
8 2 2 2 2 2 2 2 2
9 0 1 0 1 1 NaN 1 2
10 2 2 2 2 2 2 2 2
11 2 0 0 0 0 0 0 0
12 0 1 0 1 1 0 1 1
Y dataset:(sample)
Probes X10851 X12144 X12155 X11882 X10860 X12762 X12239 X12154
1 793.0830793 788.1813828 867.8504057 729.8321265
816.8519963 805.2113707 774.5990003 854.6384306
2 12.8695023 4.312894024 10.69769375 5.872212512
13.79299806 9.394132659 6.297552848 9.307943304
3 699.7791876 826.997429 795.6409729 770.9376141
806.1241089 782.3970486 817.107482 859.7154906
4 892.8217221 869.0481458 806.3386667 812.0431017
873.5565439 794.4752191 813.9587056 814.8681274
5 892.8217221 869.0481458 806.3386667 812.0431017
873.5565439 794.4752191 813.9587056 814.8681274
6 839.7350251 943.4455677 950.7575323 859.0208018
894.246041 853.524053 941.4841508 913.0246205
7 653.1272418 751.5217836 750.1757745 737.382114
757.8486157 758.2407075 724.2185775 770.8669409
8 12.8695023 4.312894024 10.69769375 5.872212512
13.79299806 9.394132659 6.297552848 9.307943304
9 839.7350251 943.4455677 950.7575323 859.0208018
894.246041 853.524053 941.4841508 913.0246205
10 653.1272418 751.5217836 750.1757745 737.382114
757.8486157 758.2407075 724.2185775 770.8669409
11 653.1272418 751.5217836 750.1757745 737.382114
757.8486157 758.2407075 724.2185775 770.8669409
12 839.7350251 943.4455677 950.7575323 859.0208018
894.246041 853.524053 941.4841508 913.0246205
Thanks again
Manisha
-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Tuesday, January 18, 2011 11:56 PM
To: Brahmachary, Manisha
Cc: R-help at r-project.org
Subject: Re: [R] Pearson correlation with randomization
On Jan 18, 2011, at 11:23 PM, Brahmachary, Manisha wrote:
> Hello,
>
>
>
> I will be very obliged if someone can help me with this statistical R
> problem:
>
> I am trying to do a Pearson correlation on my datasets X, Y with
> randomization test. My X and Y datasets are pairs.
>
> 1. I want to randomize (rearrange) only my X dataset per
> row ,while
> keeping the my Y dataset as it is.
X <- X[sample(1:nrow(Y)), ]
>
> 2. Then Calculate the correlation for this pair, and compare it
> to
> your true value of correlation.
>
> 3. Repeat 2 and 3 maybe a 100 times
You may want to look at the replicate function.
>
> 4. If your true p-value is greater than 95% of the random values,
> then you can reject the null hypothesis at p<0.05.
You won't have a very stable estimate of the 95th order statistics
with "maybe" 100 replications.
--
David.
>
>
>
> I am stuck at the randomization step. I need some help in implementing
> it the appropriate randomization step in my correlation.
>
> Below is my incomplete code. I will be very obliged if someone could
> help:
>
>
>
> X <- read.table("X.txt",as.is=T,header=T,row.names=1)
>
> Y <- read.table("Y.txt",as.is=T,header=T,row.names=1)
>
>
>
> X.mat<- as.matrix(X)
>
> Y.mat<- as.matrix(Y)
>
>
>
> Corrs<- cor.test(X.mat[1,],Y.mat[1,],alternative =c("greater"),method=
> c("pearson"))
>
>
>
> Corrs.rand <- list()
>
>
>
> for (i in 1:length(X.mat)){
>
> for (j in 1:100){
>
>
>
> # This doesnot seem to wrok correctly. How do I run sample function
> 100
> times for the same row?
>
>
>
> SNP.rand<- sample(SNP.mat[i,],56, replace = FALSE, prob = NULL)
>
> Corrs.rand[[j]]<- cor.test(SNP.rand,CNV.mat[j,],alternative
> =c("greater"),method= c("pearson"))
>
>
>
> # need to calculate how many times my pvalue from true p-value> random
> pvalue
>
> }
>
> }
>
>
>
> X dataset:
>
>
>
> #Probes
>
> X10851
>
> X12144
>
> X12155
>
> X11882
>
> X10860
>
> X12762
>
> X12239
>
> X12154
>
> 1
>
> 1
>
> 1
>
> 0
>
> 0
>
> 1
>
> 0
>
> 2
>
> 0
>
> 2
>
> 0
>
> 0
>
> 0
>
> 0
>
> 0
>
> 0
>
> 0
>
> 0
>
> 3
>
> 2
>
> 2
>
> 2
>
> 2
>
> 1
>
> 2
>
> 1
>
> 2
>
> 4
>
> 0
>
> 0
>
> 0
>
> 0
>
> 0
>
> 0
>
> 0
>
> 0
>
> 5
>
> 2
>
> 2
>
> 2
>
> 2
>
> 2
>
> 2
>
> 2
>
> 2
>
> 6
>
> 0
>
> 1
>
> 0
>
> 0
>
> 1
>
> 1
>
> 1
>
> 1
>
> 7
>
> 2
>
> 2
>
> NaN
>
> 2
>
> 2
>
> 2
>
> 2
>
> 2
>
> 8
>
> 2
>
> 2
>
> 2
>
> 2
>
> 2
>
> 2
>
> 2
>
> 2
>
> 9
>
> 0
>
> 1
>
> 0
>
> 1
>
> 1
>
> NaN
>
> 1
>
> 2
>
> 10
>
> 2
>
> 2
>
> 2
>
> 2
>
> 2
>
> 2
>
> 2
>
> 2
>
> 11
>
> 2
>
> 0
>
> 0
>
> 0
>
> 0
>
> 0
>
> 0
>
> 0
>
> 12
>
> 0
>
> 1
>
> 0
>
> 1
>
> 1
>
> 0
>
> 1
>
> 1
>
>
>
> Y dataset:
>
> Probes
>
> X10851
>
> X12144
>
> X12155
>
> X11882
>
> X10860
>
> X12762
>
> X12239
>
> X12154
>
> 1
>
> 793.0831
>
> 788.1814
>
> 867.8504
>
> 729.8321
>
> 816.852
>
> 805.2114
>
> 774.599
>
> 854.6384
>
> 2
>
> 12.8695
>
> 4.312894
>
> 10.69769
>
> 5.872213
>
> 13.793
>
> 9.394133
>
> 6.297553
>
> 9.307943
>
> 3
>
> 699.7792
>
> 826.9974
>
> 795.641
>
> 770.9376
>
> 806.1241
>
> 782.397
>
> 817.1075
>
> 859.7155
>
> 4
>
> 892.8217
>
> 869.0481
>
> 806.3387
>
> 812.0431
>
> 873.5565
>
> 794.4752
>
> 813.9587
>
> 814.8681
>
> 5
>
> 892.8217
>
> 869.0481
>
> 806.3387
>
> 812.0431
>
> 873.5565
>
> 794.4752
>
> 813.9587
>
> 814.8681
>
> 6
>
> 839.735
>
> 943.4456
>
> 950.7575
>
> 859.0208
>
> 894.246
>
> 853.5241
>
> 941.4842
>
> 913.0246
>
> 7
>
> 653.1272
>
> 751.5218
>
> 750.1758
>
> 737.3821
>
> 757.8486
>
> 758.2407
>
> 724.2186
>
> 770.8669
>
> 8
>
> 12.8695
>
> 4.312894
>
> 10.69769
>
> 5.872213
>
> 13.793
>
> 9.394133
>
> 6.297553
>
> 9.307943
>
> 9
>
> 839.735
>
> 943.4456
>
> 950.7575
>
> 859.0208
>
> 894.246
>
> 853.5241
>
> 941.4842
>
> 913.0246
>
> 10
>
> 653.1272
>
> 751.5218
>
> 750.1758
>
> 737.3821
>
> 757.8486
>
> 758.2407
>
> 724.2186
>
> 770.8669
>
> 11
>
> 653.1272
>
> 751.5218
>
> 750.1758
>
> 737.3821
>
> 757.8486
>
> 758.2407
>
> 724.2186
>
> 770.8669
>
> 12
>
> 839.735
>
> 943.4456
>
> 950.7575
>
> 859.0208
>
> 894.246
>
> 853.5241
>
> 941.4842
>
> 913.0246
>
>
>
>
>
>
>
> Thanks in advance
>
>
>
> Manisha
>
>
>
> Mount Sinai School of Medicine
>
> Icahn Medical Institute,
>
> 1425 Madison Avenue, Box 1498
>
> NY-10029, NEW-YORK, USA
>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT
More information about the R-help
mailing list