[BioC] genereating correlation matrix from gene expression data

Robert Castelo robert.castelo at upf.edu
Mon Dec 7 21:50:41 CET 2009


hi Pankaj,

> Thanks for the previous suggestion. I am wondering how to write only the  
> $R values to a output matrix? Here we have both $R and $P values.

let's say your call was:

myPCC <- qpPCC(x)

then you can either handle myPCC$R as a matrix or assign it to another  
variable

r <- myPCC$R

and then r will be your matrix with Pearson correlation coefficients. if
what you're looking for is to write the matrix to a flat file then you
should check out the function called 'write.table'.

cheers,
robert.

>
> Regards,
> Pankaj Barah Department of Biology,
>  Norwegian University of Science & Technology (NTNU)
>  Realfagbygget, N-7491 Trondheim, Norway
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> --- On Thu, 19/11/09, Robert Castelo <robert.castelo at upf.edu> wrote:
>
> From: Robert Castelo <robert.castelo at upf.edu>
> Subject: Re: [BioC] genereating correlation matrix from gene expression  
> data
> To: "pankaj borah" <pankajborah2k3 at yahoo.co.in>
> Cc: bioconductor at stat.math.ethz.ch
> Date: Thursday, 19 November, 2009, 2:33 PM
>
> Pankaj,
>
> in addition to what Steve provided below, if you are dealing
>  with
> expression data, it might be useful for you the function qpPCC from the
> bioconductor package 'qpgraph'. it performs the same calculation as the
> 'cor' function illustrated in Steve's message but in addition it accepts
> as input parameter not only a matrix or a data frame but also an
> ExpressionSet object (not a big deal) and, more interestingly, it
> calculates for you the P-values of a two sided test for the null
> hypothesis of zero Pearson correlation. this latter feature might be
> useful since as far as i know if one wants to test every pair of genes
> and get the P-values one needs to call 'cor.test' for each pair of genes
> within a two nested for loops or some 'apply' call and this function
> 'qpPCC' does it using matrix calculations and thus is faster for that
> purpose. of course, if you are interested in determining significance
> through the P-values you will have to deal with the multiple
>  testing
> problem.
>
> library(qpgraph)
>
> set.seed(123)
> x <- matrix(rnorm(20), 4, 5)
>
> round(x, digits=3)
>        [,1]   [,2]   [,3]   [,4]   [,5]
> [1,] -0.560  0.129 -0.687  0.401  0.498
> [2,] -0.230  1.715 -0.446  0.111 -1.967
> [3,]  1.559  0.461  1.224 -0.556  0.701
> [4,]  0.071 -1.265  0.360  1.787 -0.473
>
> lapply(qpPCC(x), round, digits=3)
> $R
>        1      2      3      4      5
> 1  1.000 -0.016  0.958 -0.490  0.437
> 2 -0.016  1.000 -0.271 -0.753 -0.462
> 3  0.958 -0.271  1.000 -0.218  0.431
> 4 -0.490 -0.753 -0.218  1.000 -0.198
> 5  0.437 -0.462  0.431 -0.198  1.000
>
> $P
>       1 
>     2     3     4     5
> 1 0.000 0.984 0.042 0.510 0.563
> 2 0.984 0.000 0.729 0.247 0.538
> 3 0.042 0.729 0.000 0.782 0.569
> 4 0.510 0.247 0.782 0.000 0.802
> 5 0.563 0.538 0.569 0.802 0.000
>
> cor.test(x[,1], x[,2])
>
>         Pearson's product-moment correlation
>
> data:  x[, 1] and x[, 2]
> t = -0.0231, df = 2, p-value = 0.9837
> alternative hypothesis: true correlation is not equal to 0
> 95 percent confidence interval:
>  -0.962  0.960
> sample estimates:
>     cor
> -0.0163
>
> as a final shameless remark :) if you are interested in trying to sort
> out genuine from spurious correlations you might want to calculate the
> so-called "partial correlations" (see
> http://en.wikipedia.org/wiki/Partial_correlation) which in
>  fact one
> cannot calculate with gene expression data because typically you will
> many more genes than samples. instead, as a sort of surrogate for
> partial correlations, you might find useful to calculate the so-called
> non-rejection rates through the functions 'qpNrr' or 'qpAvgNrr' from the
> 'qpgraph' package (look at the corresponding help pages if you're
> interested).
>
> best,
> robert.
>
> On Wed, 2009-11-18 at 14:05 -0500, Steve Lianoglou wrote:
>> Hi Pankaj,
>>
>> On Nov 18, 2009, at 1:30 PM, pankaj borah wrote:
>>
>> > Hi all,
>> >
>> > i am a new user of R and Bioconductor package. i deal with gene 
>> > expression data. i was wondering if there is a way to generate a 
>> > correlation matrix (all to all square symmetric matrix) for  a set 
>> > of genes and their expression values. is there any available function?
>>
>>
>  Try ?cor to get the pairwise correlation between columns of your 
>> matrix, eg:
>>
>> R> x <- matrix(rnorm(20), 4, 5)
>> R> x
>>              [,1]        [,2]       [,3]       [,4]       [,5]
>> [1,]  0.5818808  0.58781709  2.5511157 -1.5332180  0.6905731
>> [2,] -0.7640311  0.25960974  0.7246655 -1.5539226 -0.5459625
>> [3,] -1.7141619 -0.25808091 -0.0868366 -0.6547804  0.4629494
>> [4,] -2.2906217  0.04932864  0.5694895  0.7736206  0.4078665
>>
>> R> cor(x)
>>               [,1]         [,2]       [,3]       [,4]        [,5]
>> [1,] 
>  1.00000000  0.851982381  0.8715055 -0.8404848 0.074770380
>> [2,]  0.85198238  1.000000000  0.9429553 -0.5338964 0.004627258
>> [3,]  0.87150545  0.942955253  1.0000000 -0.4719936 0.325584972
>> [4,] -0.84048477 -0.533896414 -0.4719936  1.0000000 0.309414781
>> [5,]  0.07477038  0.004627258  0.3255850  0.3094148 1.000000000
>>
>> If your genes are in rows, you'll just need to pass in the transpose 
>> of your matrix.
>>
>> HTH,
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>     |  Memorial Sloan-Kettering Cancer Center
>>     |  Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>>
>  _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:  
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
>
>
>
>
> The INTERNET now has a personality. YOURS! See your Yahoo! Homepage.
>
>
>       The INTERNET now has a personality. YOURS! See your Yahoo!  
> Homepage. http://in.yahoo.com/



More information about the Bioconductor mailing list