[BioC] EdgeR: paired samples together with independent samples

Gordon K Smyth smyth at wehi.EDU.AU
Tue Nov 13 00:39:06 CET 2012


On Mon, 12 Nov 2012, Maria Keays wrote:

> Dear Gordon,
>
> I have another question about this analysis. Previously I performed an 
> analysis on the same data but without incorporating effects of patient. My 
> design matrix had columns: "Disease1.Treat", "Disease1.Control", 
> "Healthy.Treat", "Healthy.Control", and I then tested for genes showing a 
> significant interaction between disease and treatment using the contrast 
> ((Disease1.Treat - Disease1.Control) - (Healthy.Treat - Healthy.Control)). I 
> think this is what is explained on pages 25-26 of the edgeR users guide (Oct 
> 27 2012 version).
>
> Now I want to take into account patient effects as well, so I have my design 
> matrix with columns:
> [1] "(Intercept)"                  "DiseaseDisease1"
> [3] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2"
> [5] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3"
> [7] "DiseaseDisease1:Patient4" "DiseaseHealthy:TreatmentTreat"
> [9] "DiseaseDisease1:TreatmentTreat"
>
> Reading the explanation on pages 32-33 of the users guide, to do the 
> equivalent contrast to find genes showing significant interaction between 
> disease and treatment, should I simply use:
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,-1,1))  ?

Yes.

Gordon
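
For readers following the thread: the contrast confirmed above can also be built by coefficient name rather than by position, which guards against miscounting columns. The following is only a minimal sketch in base R. The column names are copied from the design matrix quoted in Maria's message; the object names `coef.names` and `contrast` are illustrative, and the final `glmLRT()` call is left commented out because it needs the fitted object `fit` from the original analysis.

```r
# Column names as quoted in the message above (after the all-zero
# DiseaseHealthy:Patient4 column was removed from the design matrix).
coef.names <- c("(Intercept)", "DiseaseDisease1",
                "DiseaseHealthy:Patient2", "DiseaseDisease1:Patient2",
                "DiseaseHealthy:Patient3", "DiseaseDisease1:Patient3",
                "DiseaseDisease1:Patient4",
                "DiseaseHealthy:TreatmentTreat",
                "DiseaseDisease1:TreatmentTreat")

# Start from all zeros and set the two treatment-effect coefficients by name:
# (treatment effect in Disease1) - (treatment effect in Healthy).
contrast <- setNames(numeric(length(coef.names)), coef.names)
contrast["DiseaseDisease1:TreatmentTreat"] <- 1
contrast["DiseaseHealthy:TreatmentTreat"]  <- -1

# Check that this is the same vector as the positional version in the thread.
stopifnot(identical(unname(contrast), c(0, 0, 0, 0, 0, 0, 0, -1, 1)))

# With the fitted model from the thread (not created here), one would then run:
# lrt <- glmLRT(fit, contrast = contrast)
```

In a real session the names would normally be taken from `colnames(design)` instead of being typed in by hand.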

> I think this is what the guide is saying, but I just want to make sure...
>
> Thanks and best wishes,
> Maria
>
>
> On 07/11/2012 22:55, Gordon K Smyth wrote:
>> Dear Maria,
>> 
>> From what you say, it sounds fine not to collapse the libraries.  However, if the 
>> three treated cultures and three untreated cultures for one patient are 
>> truly three pairs, then this pairing should be reflected in the analysis. 
>> You can handle this by numbering the samples by paired culture from 1 to 7 
>> instead of numbering by patient.
>> 
>> An MDS plot could guide you in judging whether there are baseline 
>> differences between the different pairs for one patient, and hence whether 
>> your pairing should be by culture instead of by patient.
>> 
>> Best wishes
>> Gordon
>> 
>> ---------------------------------------------
>> Professor Gordon K Smyth,
>> Bioinformatics Division,
>> Walter and Eliza Hall Institute of Medical Research,
>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> http://www.statsci.org/smyth
>> 
>> On Wed, 7 Nov 2012, Maria Keays wrote:
>> 
>>> Dear Gordon,
>>> 
>>> Thanks very much for the helpful advice. I'm treating them as biological 
>>> replicates -- they are cell cultures and it's just that I have multiple 
>>> separately treated/untreated pairs of cultures from some patients and only 
>>> one treated/untreated pair for others. So although some cultures came from 
>>> the same patient, they were all treated separately and then RNA was 
>>> extracted from each culture. Would you say that's the right thing to do?
>>> 
>>> Thanks and best wishes,
>>> Maria
>>> 
>>> 
>>> On 07/11/2012 00:01, Gordon K Smyth wrote:
>>>> Dear Maria,
>>>> 
>>>> Thanks for the specific reference to the documentation that you've 
>>>> followed.
>>>> 
>>>> Yes, you are correct, the error is arising because there is no 4th 
>>>> patient in the healthy group.  If you have a look at your design matrix, 
>>>> you will see that there is a column called DiseaseHealthy:Patient4 that 
>>>> consists entirely of zeros.  It should be column 8, but check:
>>>>
>>>>    design[,8]
>>>> 
>>>> The easiest way to proceed is simply to remove that column manually from 
>>>> the design matrix:
>>>>
>>>>    design2 <- design[,-8]
>>>> 
>>>> Your experiment has another issue, in that you have repeat samples on 
>>>> several of the patients.  Are these biological replicates?  If not, if 
>>>> they are just technical replicates, then they should be collapsed into 
>>>> one library before analysis.
>>>> 
>>>> Best wishes
>>>> Gordon
>>>> 
>>>>> Date: Tue, 06 Nov 2012 09:19:08 +0000
>>>>> From: Maria Keays <mkeays at ebi.ac.uk>
>>>>> To: bioconductor at r-project.org
>>>>> Subject: Re: [BioC] EdgeR: paired samples together with independant
>>>>>     samples
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I read this thread and related user guide material with interest because
>>>>> I am working with a very similar data set with paired samples. However,
>>>>> I'm having trouble which I think stems from my data being unbalanced? I
>>>>> have four patients with a disease and three without, and within that for
>>>>> some patients I have replicates but for others I do not. I've created a
>>>>> design matrix as described on p32 of the 27 October 2012 edgeR user's
>>>>> guide, but when I try to estimate the common dispersion using
>>>>> estimateGLMCommonDisp() it tells me:
>>>>> 
>>>>> "Error in glmFit.default(y, design = design, dispersion = dispersion,
>>>>> offset = offset) :
>>>>>   Design matrix not of full rank.  The following coefficients not
>>>>> estimable:
>>>>>  DiseaseHealthy:Patient4"
>>>>> 
>>>>> I guess because I have 4 patients in the diseased set and only 3 in the
>>>>> healthy set? If I remove Patient4 and try again, I'm able to continue
>>>>> the analysis successfully, but I'd obviously like to be able to include
>>>>> all the data -- is that possible? If so, could you explain how to do it?
>>>>> 
>>>>> The original annotations for my data are below:
>>>>> 
>>>>> Disease    Patient    Treatment
>>>>> disease1    1    control
>>>>> disease1    1    control
>>>>> disease1    1    control
>>>>> disease1    2    control
>>>>> disease1    3    control
>>>>> disease1    3    control
>>>>> disease1    4    control
>>>>> disease1    1    treat
>>>>> disease1    1    treat
>>>>> disease1    1    treat
>>>>> disease1    2    treat
>>>>> disease1    3    treat
>>>>> disease1    3    treat
>>>>> disease1    4    treat
>>>>> healthy    5    control
>>>>> healthy    6    control
>>>>> healthy    6    control
>>>>> healthy    6    control
>>>>> healthy    7    control
>>>>> healthy    7    control
>>>>> healthy    5    treat
>>>>> healthy    6    treat
>>>>> healthy    6    treat
>>>>> healthy    6    treat
>>>>> healthy    7    treat
>>>>> healthy    7    treat
>>>>> 
>>>>> As I was following the user's guide I amended the "Patient" labels so it
>>>>> looked like this when I created the design matrix:
>>>>> 
>>>>> Disease    Patient    Treatment
>>>>> disease1    1    control
>>>>> disease1    1    control
>>>>> disease1    1    control
>>>>> disease1    2    control
>>>>> disease1    3    control
>>>>> disease1    3    control
>>>>> disease1    4    control
>>>>> disease1    1    treat
>>>>> disease1    1    treat
>>>>> disease1    1    treat
>>>>> disease1    2    treat
>>>>> disease1    3    treat
>>>>> disease1    3    treat
>>>>> disease1    4    treat
>>>>> healthy    1    control
>>>>> healthy    2    control
>>>>> healthy    2    control
>>>>> healthy    2    control
>>>>> healthy    3    control
>>>>> healthy    3    control
>>>>> healthy    1    treat
>>>>> healthy    2    treat
>>>>> healthy    2    treat
>>>>> healthy    2    treat
>>>>> healthy    3    treat
>>>>> healthy    3    treat
>>>>> 
>>>>> Thanks!
>>>>> Maria
>>>>> 
>>>>> 
>>>>> On 25/10/2012 06:18, Gordon K Smyth wrote:
>>>>>> Dear Anna,
>>>>>> 
>>>>>> You are right to recognise that the analysis of this sort of design is
>>>>>> more complex than many other experiments, because it includes
>>>>>> comparisons both within and between patients.  I have included a new
>>>>>> section in the edgeR User's Guide based on your experiment that
>>>>>> describes the analysis. This will appear in the official release of
>>>>>> edgeR in a couple of days. In the meantime, see pages 31-33 of:
>>>>>> 
>>>>>> http://bioinf.wehi.edu.au/software/edgeR/edgeRUsersGuide.pdf
>>>>>> 
>>>>>> Best wishes
>>>>>> Gordon
>>>>>> 
>>>>>>> Date: Tue, 23 Oct 2012 06:37:44 -0700 (PDT)
>>>>>>> From: "anna [guest]" <guest at bioconductor.org>
>>>>>>> To: bioconductor at r-project.org, m.nadira at yahoo.fr
>>>>>>> Subject: [BioC] EdgeR: paired samples together with independant
>>>>>>>     samples
>>>>>>> 
>>>>>>> 
>>>>>>> Hello,
>>>>>>> I am using edgeR to analyse my RNAseq data.
>>>>>>> 
>>>>>>> I have:
>>>>>>> 
>>>>>>> cells from 3 healthy patients, either treated or not with a hormone.
>>>>>>> 
>>>>>>> cells from 3 patients with disease D1, either treated or not with the
>>>>>>> hormone
>>>>>>> 
>>>>>>> cells from 3 patients with disease D2, either treated or not with the
>>>>>>> hormone.
>>>>>>> 
>>>>>>> I would like to know what is wrong in the response to the hormone in
>>>>>>> patients with disease D1 and D2.
>>>>>>> 
>>>>>>> I don't know how to combine paired comparisons with pairwise
>>>>>>> comparisons in a single glm analysis.
>>>>>>> 
>>>>>>> thank you very much,
>>>>>>> anna
>>>>>>> 
>>>>>>> -- output of sessionInfo():
>>>>>>> 
>>>>>>> R version 2.15.1 (2012-06-22)
>>>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>>> 
>>>>>>> locale:
>>>>>>> [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
>>>>>>> [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
>>>>>>> [5] LC_TIME=French_France.1252
>>>>>>> 
>>>>>>> attached base packages:
>>>>>>> [1] stats     graphics  grDevices utils     datasets methods base
>>>>>>> 
>>>>>>> loaded via a namespace (and not attached):
>>>>>>> [1] tools_2.15.1
>>>>>>> 
>> 
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the 
>> addressee.
>> You must not disclose, forward, print or use it without the permission of 
>> the sender.
>> ______________________________________________________________________
>
>

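The rank-deficiency discussed in the 6-7 Nov messages can be reproduced with base R alone. The sketch below rebuilds the sample annotations from Maria's second table (patients renumbered within each disease group) and removes any all-zero column by name rather than by position, since the position of the column depends on how the factor levels are ordered. The level names and their order ("Healthy" as the reference level, "Treat" vs "Control") are assumptions inferred from the column names quoted later in the thread; `pat.d`, `pat.h`, `empty` and `design2` are illustrative names.

```r
# Patient labels within each disease group, in the order listed in the thread.
pat.d <- c(1, 1, 1, 2, 3, 3, 4)   # disease1: 7 control rows, then 7 treated rows
pat.h <- c(1, 2, 2, 2, 3, 3)      # healthy:  6 control rows, then 6 treated rows

# ASSUMPTION: level names and reference levels inferred from the quoted columns.
Disease   <- factor(rep(c("Disease1", "Healthy"), times = c(14, 12)),
                    levels = c("Healthy", "Disease1"))
Patient   <- factor(c(pat.d, pat.d, pat.h, pat.h))
Treatment <- factor(c(rep(c("Control", "Treat"), each = 7),
                      rep(c("Control", "Treat"), each = 6)),
                    levels = c("Control", "Treat"))

design <- model.matrix(~ Disease + Disease:Patient + Disease:Treatment)

# A column that is entirely zero corresponds to a Disease x Patient combination
# that does not occur in the data (here: no 4th patient in the healthy group).
empty <- colnames(design)[colSums(abs(design)) == 0]
print(empty)

# Drop such columns by name and confirm that the result is of full rank.
design2 <- design[, !(colnames(design) %in% empty), drop = FALSE]
stopifnot(qr(design2)$rank == ncol(design2))
print(colnames(design2))
```

Selecting by name gives the same result as `design[,-8]` in the thread whenever column 8 is indeed the empty one, but does not depend on that being the case.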

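Gordon's 7 Nov suggestion to number the samples by paired culture (1 to 7) instead of by patient can be sketched in the same way. This is a hedged illustration only: the annotation tables in the thread do not say which control culture belongs with which treated culture, so the code ASSUMES that the k-th control row listed for a disease group is paired with the k-th treated row of that group. The names `Culture` and `design.pair` are illustrative.

```r
# Number of treated/untreated culture pairs per disease group, from the thread:
# 7 pairs for disease1 (patients 1,1,1,2,3,3,4), 6 for healthy (1,2,2,2,3,3).
n.d <- 7
n.h <- 6

# ASSUMPTION: level names and reference levels as in the quoted design columns.
Disease   <- factor(rep(c("Disease1", "Healthy"), times = c(2 * n.d, 2 * n.h)),
                    levels = c("Healthy", "Disease1"))
Treatment <- factor(c(rep(c("Control", "Treat"), each = n.d),
                      rep(c("Control", "Treat"), each = n.h)),
                    levels = c("Control", "Treat"))

# ASSUMPTION: k-th control row is paired with k-th treated row within a group.
Culture <- factor(c(seq_len(n.d), seq_len(n.d), seq_len(n.h), seq_len(n.h)))

# Each culture pair should contain exactly one control and one treated sample.
print(table(Disease:Culture, Treatment))

design.pair <- model.matrix(~ Disease + Disease:Culture + Disease:Treatment)

# The healthy group has only 6 pairs, so its Culture7 column is all zero;
# drop any such column, as was done for Patient4 in the thread.
design.pair <- design.pair[, colSums(abs(design.pair)) > 0, drop = FALSE]
stopifnot(qr(design.pair)$rank == ncol(design.pair))
```

Pairing by culture spends more coefficients on baseline differences than pairing by patient does, which is why Gordon suggests looking at an MDS plot first to judge whether cultures from one patient really differ at baseline.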

