[BioC] duplicateCorrelation and design matrix

Gordon K Smyth smyth at wehi.EDU.AU
Mon Jul 4 12:59:07 CEST 2005


> Date: Sun, 03 Jul 2005 10:13:29 +0000
> From: Carolyn Fitzsimmons <Carolyn.Fitzsimmons at imbim.uu.se>
> Subject: Re: [BioC] duplicateCorrelation and design matrix
> To: bioconductor at stat.math.ethz.ch
>
> Hi Gordon, thanks for your reply. I have a few more questions:
>
> Quoting Gordon K Smyth <smyth at wehi.EDU.AU>:
>
>> > Date: Thu, 30 Jun 2005 11:44:02 +0000
>> > From: Carolyn Fitzsimmons <Carolyn.Fitzsimmons at imbim.uu.se>
>> > Subject: [BioC] duplicateCorrelation and design matrix
>> > To: Bioconductor list <bioconductor at stat.math.ethz.ch>
>> >
>> > Hello,
>> >
>> > I need an explanation of how the design matrix influences the consensus
>> > correlation estimated by the duplicateCorrelation function when accounting for
>> > technical replicates.  Here is my specific example:
>> >
>> > Design matrix:
>> >> design
>> >    RJf RJm WLf WLm
>> > 1    0   0   0   1
>> > 2    0   0   0   1
>> > 3    0   0   0   1
>> > 4    0   0   0   1
>> > 5    0   0   0   1
>> > 6    0   0   0   1
>> > 7    0   0   0   1
>> > 8    0   0   0   1
>> > 9    0   0   1   0
>> > 10   0   0   1   0
>> > 11   0   0   1   0
>> > 12   0   0   1   0
>> > 13   0   0   1   0
>> > 14   0   0   1   0
>> > 15   0   0   1   0
>> > 16   0   0   1   0
>> > 17   0   1   0   0
>> > 18   0   1   0   0
>> > 19   0   1   0   0
>> > 20   0   1   0   0
>> > 21   0   1   0   0
>> > 22   0   1   0   0
>> > 23   0   1   0   0
>> > 24   0   1   0   0
>> > 25   1   0   0   0
>> > 26   1   0   0   0
>> > 27   1   0   0   0
>> > 28   1   0   0   0
>> > 29   1   0   0   0
>> > 30   1   0   0   0
>> > 31   1   0   0   0
>> > 32   1   0   0   0
>> > #
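The design printed above, and the replicate pairing described in the next sentence, can be generated in R rather than typed by hand. This is a minimal sketch that assumes the arrays are ordered exactly as printed (WLm 1-8, WLf 9-16, RJm 17-24, RJf 25-32) and that each consecutive pair of arrays comes from one individual. The object names `group` and `block` are illustrative.

```r
# Arrays 1-8 are WLm, 9-16 WLf, 17-24 RJm and 25-32 RJf, as printed above.
group <- factor(rep(c("WLm", "WLf", "RJm", "RJf"), each = 8),
                levels = c("RJf", "RJm", "WLf", "WLm"))

# One indicator column per group (no intercept), matching the printed design
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Consecutive arrays are technical replicates of one individual:
# block is 1, 1, 2, 2, ..., 16, 16
block <- rep(1:16, each = 2)
```

The `block` vector here is the same as the `c(1,1,2,2,...,16,16)` vector typed out in the duplicateCorrelation calls in this thread.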
>> > Every second slide is a technical replicate of the one before it (e.g. 1 and 2
>> > are replicates, then 3 and 4, etc.).  There are also 4 groups that I want to
>> > compare, with 4 individuals in each group (each duplicated).  So I continue
>> > with duplicateCorrelation:
>> > #
>> >> cor <- duplicateCorrelation(Mmatrix_ny, design=design,
>> > +     block=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16))
>> >> cor$cor
>> > [1] -0.03060575
>> > #
>> > which is essentially zero correlation, so perhaps I should just treat the
>> > technical replicates as biological replicates (as the limma user guide says).
>> > But in another comparison I want to put all the arrays into 2 groups; see
>> > this design matrix:
>> >> designWLRJ
>> >    RJ WL
>> > 1   0  1
>> > 2   0  1
>> > 3   0  1
>> > 4   0  1
>> > 5   0  1
>> > 6   0  1
>> > 7   0  1
>> > 8   0  1
>> > 9   0  1
>> > 10  0  1
>> > 11  0  1
>> > 12  0  1
>> > 13  0  1
>> > 14  0  1
>> > 15  0  1
>> > 16  0  1
>> > 17  1  0
>> > 18  1  0
>> > 19  1  0
>> > 20  1  0
>> > 21  1  0
>> > 22  1  0
>> > 23  1  0
>> > 24  1  0
>> > 25  1  0
>> > 26  1  0
>> > 27  1  0
>> > 28  1  0
>> > 29  1  0
>> > 30  1  0
>> > 31  1  0
>> > 32  1  0
>> > #
>> > and then run duplicateCorrelation again and get a different correlation:
>> > #
>> >> corWLRJ <- duplicateCorrelation(Mmatrix_ny, design=designWLRJ,
>> > +     block=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16))
>> >> corWLRJ$cor
>> > [1] 0.01745252
>> > #
>> > Moreover, when I compute the consensus correlation without a design matrix I
>> > get 0.1073055.  I know from previous posts, and from a lot of help from
>> > Johan L., that setting up the blocking this way and supplying a design matrix
>> > is correct in these situations.
>>
>> You've used three different, non-equivalent design matrices.  No more than one
>> of these can be correct.
>
> But if I need to group the individuals differently to test for differential
> expression between different groupings (i.e. WLm/WLf/RJm/RJf versus WL/RJ),
> then using 2 different design matrices in duplicateCorrelation is warranted,
> yes?

No.  Unless you have a good reason to do otherwise, use the full design matrix and
then use contrasts.fit() to group the individuals for differential expression tests.
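A minimal sketch of that advice, assuming limma is installed (plus the statmod package, which recent versions of duplicateCorrelation use internally). A simulated matrix `M` stands in for Mmatrix_ny, and the contrast names `WLvsRJ` and `WLmvsWLf` are hypothetical. The correlation is estimated once, with the full four-group design, and the WL versus RJ comparison is formed as a contrast rather than through a second design matrix.

```r
library(limma)

set.seed(1)
# Hypothetical stand-in for Mmatrix_ny: 500 genes x 32 arrays of simulated M-values
M <- matrix(rnorm(500 * 32), nrow = 500)

# Full four-group design, arrays ordered as in the post
group <- factor(rep(c("WLm", "WLf", "RJm", "RJf"), each = 8),
                levels = c("RJf", "RJm", "WLf", "WLm"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
block <- rep(1:16, each = 2)   # arrays 1&2, 3&4, ... are technical replicates

# One consensus correlation, estimated with the full design
corfit <- duplicateCorrelation(M, design, block = block)

# Fit the full model, allowing for the within-pair correlation
fit <- lmFit(M, design, block = block,
             correlation = corfit$consensus.correlation)

# Group the individuals at the contrast stage instead of changing the design
cont <- makeContrasts(WLvsRJ   = (WLf + WLm)/2 - (RJf + RJm)/2,
                      WLmvsWLf = WLm - WLf,
                      levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont))
topTable(fit2, coef = "WLvsRJ", number = 5)
```

Because the correlation comes from a single fit of the full design, every comparison, whether between sexes within a line or between WL and RJ, is tested against the same residual structure.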

Gordon

>
>>
>> > So how is the consensus correlation actually being calculated in the above
>> > situations? (In loose mathematical terms if possible, as you can probably
>> > tell from my question.)
>>
>> In loose terms the correlation measures the variability between blocks relative
>> to the variation within blocks.  Over-simplifying the design matrix will
>> increase the between-blocks variation, because it will now reflect differences
>> between your treatments as well as differences between biological replicates.
>> Hence the estimated correlation increases.
>>
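The loose description above can be made concrete with a small simulation. This sketch does not reproduce limma's estimator (duplicateCorrelation fits a mixed model per gene). Instead it swaps in a simple ANOVA-style intraclass correlation computed on the residuals left after fitting the design, then averages genes on the atanh scale with a 15% trimmed mean, loosely mirroring how limma forms its consensus. The effect sizes and the helper names `pair_icc` and `consensus_cor` are hypothetical.

```r
# Moment-based intraclass correlation of residual pairs: a rough stand-in for
# duplicateCorrelation's per-gene estimate, NOT limma's algorithm.
# Assumes the design is constant within each block and spans an intercept,
# so residual pair means sum to zero.
pair_icc <- function(y, design, block) {
  r <- lm.fit(design, y)$residuals            # remove the fitted design effects
  pair_mean <- tapply(r, block, mean)         # one mean per replicate pair
  n_pairs <- length(pair_mean)
  k <- length(r) / n_pairs                    # arrays per block (2 here)
  # between-block mean square, on the block-level residual degrees of freedom
  msb <- k * sum(pair_mean^2) / (n_pairs - qr(design)$rank)
  # within-block mean square
  msw <- sum((r - ave(r, block))^2) / (n_pairs * (k - 1))
  (msb - msw) / (msb + msw)
}

# Average per-gene estimates on the atanh scale, then transform back
consensus_cor <- function(Y, design, block, trim = 0.15) {
  rho <- apply(Y, 1, pair_icc, design = design, block = block)
  tanh(mean(atanh(rho), trim = trim))
}

set.seed(42)
grp   <- rep(1:4, each = 8)     # four groups of eight arrays, as in the post
block <- rep(1:16, each = 2)    # arrays 1&2, 3&4, ... share an individual
Y <- t(replicate(500, {
  rnorm(4, sd = 1)[grp] +       # group (treatment) effects
    rnorm(16, sd = 0.5)[block] +# biological (individual) effects
    rnorm(32, sd = 0.5)         # technical noise
}))

full   <- model.matrix(~ 0 + factor(grp))        # four groups
twogrp <- model.matrix(~ 0 + factor(grp <= 2))   # groups 1-2 versus 3-4
none   <- matrix(1, nrow = 32, ncol = 1)         # intercept only

est <- c(full       = consensus_cor(Y, full, block),
         two_groups = consensus_cor(Y, twogrp, block),
         intercept  = consensus_cor(Y, none, block))
est
```

With group effects present, the estimate should rise as the design is simplified from four groups to two to an intercept only, because the residual pair means then carry treatment differences as well as individual differences. That is the same direction as the three values reported in the question (-0.031, 0.017 and 0.107).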
>
> Okay. Now I believe I understand how it is calculated. When you use a design
> matrix here it defines groups, and the block argument then defines blocks
> within those groups. (Correct me if this is wrong.)
>
> Best Regards,  Carolyn


