[BioC] EdgeR: paired samples together with independent samples

Gordon K Smyth smyth at wehi.EDU.AU
Wed Nov 7 01:01:28 CET 2012


Dear Maria,

Thanks for the specific reference to the documentation that you've 
followed.

Yes, you are correct, the error is arising because there is no 4th patient 
in the healthy group.  If you have a look at your design matrix, you will 
see that there is a column called DiseaseHealthy:Patient4 that consists 
entirely of zeros.  It should be column 8, but check:

    design[,8]

The easiest way to proceed is simply to remove that column manually from 
the design matrix:

    design2 <- design[,-8]
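
If the column position is in doubt, the empty column can also be located by 
its content rather than by its index.  The following is only a sketch: it 
assumes the relabelled annotations quoted below and the model formula from 
the User's Guide section on this type of design.  The lowercase level names 
and the object name `empty` are illustrative choices, so the column appears 
here as Diseasehealthy:Patient4.

```r
# Relabelled annotations, with patients renumbered within each disease group
# (assumed to match the second table in the quoted message).
Disease   <- factor(rep(c("disease1", "healthy"), times = c(14, 12)))
Patient   <- factor(c(1, 1, 1, 2, 3, 3, 4,   # disease1, control
                      1, 1, 1, 2, 3, 3, 4,   # disease1, treat
                      1, 2, 2, 2, 3, 3,      # healthy, control
                      1, 2, 2, 2, 3, 3))     # healthy, treat
Treatment <- factor(rep(c("control", "treat", "control", "treat"),
                        times = c(7, 7, 6, 6)))

# Formula assumed from the User's Guide section being followed.
design <- model.matrix(~ Disease + Disease:Patient + Disease:Treatment)

# A column that is zero for every sample corresponds to a
# Disease:Patient combination with no libraries.
empty <- colSums(design != 0) == 0
colnames(design)[empty]

# Drop the empty column(s) by content instead of by position.
design2 <- design[, !empty, drop = FALSE]

# The reduced matrix should now be of full column rank.
qr(design2)$rank == ncol(design2)
```

design2 can then be passed to estimateGLMCommonDisp() in place of design.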

Your experiment has another issue: you have repeat samples on several of 
the patients.  Are these biological replicates?  If they are only 
technical replicates, then they should be collapsed into one library 
before analysis.
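
As a rough illustration of collapsing, technical replicates can be combined 
by adding their counts.  The sketch below uses base R only and a made-up 
count matrix; the names `counts`, `sample.id` and `collapsed`, and the 
assumption that libraries a1 and a2 come from the same RNA sample, are 
hypothetical and not taken from your data.

```r
# Toy counts: 3 genes x 4 libraries (hypothetical values).
counts <- matrix(c(10, 0, 5,    # library a1
                   12, 1, 4,    # library a2
                    7, 3, 9,    # library b1
                    8, 2, 6),   # library c1
                 nrow = 3,
                 dimnames = list(c("gene1", "gene2", "gene3"),
                                 c("a1", "a2", "b1", "c1")))

# One label per library; libraries sharing a label are treated as
# technical replicates of the same sample.
sample.id <- c("a", "a", "b", "c")

# rowsum() adds rows within groups, so transpose, sum the libraries
# belonging to each sample, and transpose back to genes x samples.
collapsed <- t(rowsum(t(counts), group = sample.id))
collapsed
```

The collapsed matrix has one column per sample, and the design matrix should 
then be built from the annotation of those collapsed samples.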

Best wishes
Gordon

> Date: Tue, 06 Nov 2012 09:19:08 +0000
> From: Maria Keays <mkeays at ebi.ac.uk>
> To: bioconductor at r-project.org
> Subject: Re: [BioC] EdgeR: paired samples together with independent
> 	samples
>
> Hello,
>
> I read this thread and related user guide material with interest because
> I am working with a very similar data set with paired samples. However,
> I'm having trouble which I think stems from my data being unbalanced? I
> have four patients with a disease and three without, and within that for
> some patients I have replicates but for others I do not. I've created a
> design matrix as described on p32 of the 27 October 2012 edgeR user's
> guide, but when I try to estimate the common dispersion using
> estimateGLMCommonDisp() it tells me:
>
> "Error in glmFit.default(y, design = design, dispersion = dispersion,
> offset = offset) :
>   Design matrix not of full rank.  The following coefficients not
> estimable:
>  DiseaseHealthy:Patient4"
>
> I guess because I have 4 patients in the diseased set and only 3 in the
> healthy set? If I remove Patient4 and try again, I'm able to continue
> the analysis successfully, but I'd obviously like to be able to include
> all the data -- is that possible? If so, could you explain how to do it?
>
> The original annotations for my data are below:
>
> Disease    Patient    Treatment
> disease1    1    control
> disease1    1    control
> disease1    1    control
> disease1    2    control
> disease1    3    control
> disease1    3    control
> disease1    4    control
> disease1    1    treat
> disease1    1    treat
> disease1    1    treat
> disease1    2    treat
> disease1    3    treat
> disease1    3    treat
> disease1    4    treat
> healthy    5    control
> healthy    6    control
> healthy    6    control
> healthy    6    control
> healthy    7    control
> healthy    7    control
> healthy    5    treat
> healthy    6    treat
> healthy    6    treat
> healthy    6    treat
> healthy    7    treat
> healthy    7    treat
>
> As I was following the user's guide I amended the "Patient" labels so it
> looked like this when I created the design matrix:
>
> Disease    Patient    Treatment
> disease1    1    control
> disease1    1    control
> disease1    1    control
> disease1    2    control
> disease1    3    control
> disease1    3    control
> disease1    4    control
> disease1    1    treat
> disease1    1    treat
> disease1    1    treat
> disease1    2    treat
> disease1    3    treat
> disease1    3    treat
> disease1    4    treat
> healthy    1    control
> healthy    2    control
> healthy    2    control
> healthy    2    control
> healthy    3    control
> healthy    3    control
> healthy    1    treat
> healthy    2    treat
> healthy    2    treat
> healthy    2    treat
> healthy    3    treat
> healthy    3    treat
>
> Thanks!
> Maria
>
>
> On 25/10/2012 06:18, Gordon K Smyth wrote:
>> Dear Anna,
>>
>> You are right to recognise that the analysis of this sort of design is
>> more complex than many other experiments, because it includes
>> comparisons both within and between patients.  I have included a new
>> section in the edgeR User's Guide based on your experiment that
>> describes the analysis. This will appear in the official release of
>> edgeR in a couple of days. In the meantime, see pages 31-33 of:
>>
>>   http://bioinf.wehi.edu.au/software/edgeR/edgeRUsersGuide.pdf
>>
>> Best wishes
>> Gordon
>>
>>> Date: Tue, 23 Oct 2012 06:37:44 -0700 (PDT)
>>> From: "anna [guest]" <guest at bioconductor.org>
>>> To: bioconductor at r-project.org, m.nadira at yahoo.fr
>>> Subject: [BioC] EdgeR: paired samples together with independent
>>>     samples
>>>
>>>
>>> Hello,
>>> I am using EdgeR to analyse my RNAseq data.
>>>
>>> I have:
>>>
>>> cells from 3 healthy patients, either treated or not with a hormone.
>>>
>>> cells from 3 patients with disease D1, either treated or not with the
>>> hormone
>>>
>>> cells from 3 patients with disease D2, either treated or not with the
>>> hormone.
>>>
>>> I would like to know what is wrong in the response to the hormone in
>>> patients with disease D1 and D2.
>>>
>>> I don't know how to combine paired comparisons, with pairwise
>>> comparisons, in a unique glm analysis.
>>>
>>> thank you very much,
>>> anna
>>>
>>> -- output of sessionInfo():
>>>
>>> R version 2.15.1 (2012-06-22)
>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252
>>> [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
>>> [5] LC_TIME=French_France.1252
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods base
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tools_2.15.1
>>>
