[BioC] genereating correlation matrix from gene expression data
Robert Castelo
robert.castelo at upf.edu
Mon Dec 7 21:50:41 CET 2009
hi Pankaj,
> Thanks for the previous suggestion. I am wondering how to write only the
> $R values to a output matrix? Here we have both $R and $P values.
let's say your call was:
myPCC <- qpPCC(x)
then you can either handle myPCC$R as a matrix or assign it to another
variable
r <- myPCC$R
and then r will be your matrix with Pearson correlation coefficients. if
what you're looking for is to write the matrix to a flat file then you
should check out the function called 'write.table'.
cheers,
robert.
>
> Regards,
> Pankaj Barah Department of Biology,
> Norwegian University of Science & Technology (NTNU)
> Realfagbygget, N-7491 Trondheim, Norway
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> --- On Thu, 19/11/09, Robert Castelo <robert.castelo at upf.edu> wrote:
>
> From: Robert Castelo <robert.castelo at upf.edu>
> Subject: Re: [BioC] genereating correlation matrix from gene expression
> data
> To: "pankaj borah" <pankajborah2k3 at yahoo.co.in>
> Cc: bioconductor at stat.math.ethz.ch
> Date: Thursday, 19 November, 2009, 2:33 PM
>
> Pankaj,
>
> in addition to what Steve provided below, if you are dealing
> with
> expression data, it might be useful for you the function qpPCC from the
> bioconductor package 'qpgraph'. it performs the same calculation as the
> 'cor' function illustrated in Steve's message but in addition it accepts
> as input parameter not only a matrix or a data frame but also an
> ExpressionSet object (not a big deal) and, more interestingly, it
> calculates for you the P-values of a two sided test for the null
> hypothesis of zero Pearson correlation. this latter feature might be
> useful since as far as i know if one wants to test every pair of genes
> and get the P-values one needs to call 'cor.test' for each pair of genes
> within a two nested for loops or some 'apply' call and this function
> 'qpPCC' does it using matrix calculations and thus is faster for that
> purpose. of course, if you are interested in determining significance
> through the P-values you will have to deal with the multiple
> testing
> problem.
>
> library(qpgraph)
>
> set.seed(123)
> x <- matrix(rnorm(20), 4, 5)
>
> round(x, digits=3)
> [,1] [,2] [,3] [,4] [,5]
> [1,] -0.560 0.129 -0.687 0.401 0.498
> [2,] -0.230 1.715 -0.446 0.111 -1.967
> [3,] 1.559 0.461 1.224 -0.556 0.701
> [4,] 0.071 -1.265 0.360 1.787 -0.473
>
> lapply(qpPCC(x), round, digits=3)
> $R
> 1 2 3 4 5
> 1 1.000 -0.016 0.958 -0.490 0.437
> 2 -0.016 1.000 -0.271 -0.753 -0.462
> 3 0.958 -0.271 1.000 -0.218 0.431
> 4 -0.490 -0.753 -0.218 1.000 -0.198
> 5 0.437 -0.462 0.431 -0.198 1.000
>
> $P
> 1
> 2 3 4 5
> 1 0.000 0.984 0.042 0.510 0.563
> 2 0.984 0.000 0.729 0.247 0.538
> 3 0.042 0.729 0.000 0.782 0.569
> 4 0.510 0.247 0.782 0.000 0.802
> 5 0.563 0.538 0.569 0.802 0.000
>
> cor.test(x[,1], x[,2])
>
> Pearson's product-moment correlation
>
> data: x[, 1] and x[, 2]
> t = -0.0231, df = 2, p-value = 0.9837
> alternative hypothesis: true correlation is not equal to 0
> 95 percent confidence interval:
> -0.962 0.960
> sample estimates:
> cor
> -0.0163
>
> as a final shameless remark :) if you are interested in trying to sort
> out genuine from spurious correlations you might want to calculate the
> so-called "partial correlations" (see
> http://en.wikipedia.org/wiki/Partial_correlation) which in
> fact one
> cannot calculate with gene expression data because typically you will
> many more genes than samples. instead, as a sort of surrogate for
> partial correlations, you might find useful to calculate the so-called
> non-rejection rates through the functions 'qpNrr' or 'qpAvgNrr' from the
> 'qpgraph' package (look at the corresponding help pages if you're
> interested).
>
> best,
> robert.
>
> On Wed, 2009-11-18 at 14:05 -0500, Steve Lianoglou wrote:
>> Hi Pankaj,
>>
>> On Nov 18, 2009, at 1:30 PM, pankaj borah wrote:
>>
>> > Hi all,
>> >
>> > i am a new user of R and Bioconductor package. i deal with gene
>> > expression data. i was wondering if there is a way to generate a
>> > correlation matrix (all to all square symmetric matrix) for a set
>> > of genes and their expression values. is there any available function?
>>
>>
> Try ?cor to get the pairwise correlation between columns of your
>> matrix, eg:
>>
>> R> x <- matrix(rnorm(20), 4, 5)
>> R> x
>> [,1] [,2] [,3] [,4] [,5]
>> [1,] 0.5818808 0.58781709 2.5511157 -1.5332180 0.6905731
>> [2,] -0.7640311 0.25960974 0.7246655 -1.5539226 -0.5459625
>> [3,] -1.7141619 -0.25808091 -0.0868366 -0.6547804 0.4629494
>> [4,] -2.2906217 0.04932864 0.5694895 0.7736206 0.4078665
>>
>> R> cor(x)
>> [,1] [,2] [,3] [,4] [,5]
>> [1,]
> 1.00000000 0.851982381 0.8715055 -0.8404848 0.074770380
>> [2,] 0.85198238 1.000000000 0.9429553 -0.5338964 0.004627258
>> [3,] 0.87150545 0.942955253 1.0000000 -0.4719936 0.325584972
>> [4,] -0.84048477 -0.533896414 -0.4719936 1.0000000 0.309414781
>> [5,] 0.07477038 0.004627258 0.3255850 0.3094148 1.000000000
>>
>> If your genes are in rows, you'll just need to pass in the transpose
>> of your matrix.
>>
>> HTH,
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>> | Memorial Sloan-Kettering Cancer Center
>> | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>>
> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
>
>
>
>
> The INTERNET now has a personality. YOURS! See your Yahoo! Homepage.
>
>
> The INTERNET now has a personality. YOURS! See your Yahoo!
> Homepage. http://in.yahoo.com/
More information about the Bioconductor
mailing list