[BioC] edgeR: paired samples together with independent samples

Mon Nov 12 16:40:30 CET 2012

Dear Gordon,

I have another question about this analysis. Previously I performed an 
analysis on the same data but without incorporating effects of patient. 
My design matrix had columns: "Disease1.Treat", "Disease1.Control", 
"Healthy.Treat", "Healthy.Control", and I then tested for genes showing 
a significant interaction between disease and treatment using the 
contrast ((Disease1.Treat - Disease1.Control) - (Healthy.Treat - 
Healthy.Control)). I think this is what is explained on pages 25-26 of 
the edgeR User's Guide (Oct 27 2012 version).

Now I want to take into account patient effects as well, so I have my 
design matrix with columns:
[1] "(Intercept)"                  "DiseaseDisease1"
[3] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2"
[5] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3"
[7] "DiseaseDisease1:Patient4" "DiseaseHealthy:TreatmentTreat"
[9] "DiseaseDisease1:TreatmentTreat"

Reading the explanation on pages 32-33 of the User's Guide, to do the 
equivalent contrast to find genes showing significant interaction 
between disease and treatment, should I simply use:
lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,-1,1))  ?

I think this is what the guide is saying, but I just want to make sure...
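
For reference, a minimal sketch of how the nine-column design and the contrast above could be built by name rather than by position. It uses base R only, takes the sample annotation from the renumbered table later in this thread, and assumes the factor level orders ("Healthy" and "Control" as reference levels) that the column names listed above imply. The object `fit` in the final comment is hypothetical and is not created here.

```r
# Sample annotation, with patients renumbered within each disease group.
Disease <- factor(rep(c("Disease1", "Healthy"), times = c(14, 12)),
                  levels = c("Healthy", "Disease1"))
Patient <- factor(c(1, 1, 1, 2, 3, 3, 4,  1, 1, 1, 2, 3, 3, 4,
                    1, 2, 2, 2, 3, 3,     1, 2, 2, 2, 3, 3))
Treatment <- factor(c(rep(c("Control", "Treat"), each = 7),
                      rep(c("Control", "Treat"), each = 6)),
                    levels = c("Control", "Treat"))

# Patient effects and treatment effects are both nested within disease group.
design <- model.matrix(~ Disease + Disease:Patient + Disease:Treatment)

# Drop any all-zero column (here DiseaseHealthy:Patient4), without relying
# on its position in the matrix.
design <- design[, colSums(design != 0) > 0, drop = FALSE]

# Build the interaction contrast by column name:
# (treatment effect in Disease1) - (treatment effect in Healthy).
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast["DiseaseDisease1:TreatmentTreat"] <- 1
contrast["DiseaseHealthy:TreatmentTreat"] <- -1

print(colnames(design))
print(contrast)

# With edgeR loaded and a fitted GLM object `fit` (hypothetical here):
# lrt <- glmLRT(fit, contrast = contrast)
```

Naming the coefficients keeps the contrast valid even if the column order changes, for example when the factor levels are ordered differently.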

Thanks and best wishes,
Maria

On 07/11/2012 22:55, Gordon K Smyth wrote:
> Dear Maria,
>
> From what you say, it sounds fine not to collapse the libraries.  However, if 
> the three treated cultures and three untreated cultures for one 
> patient are truly three pairs, then this pairing should be reflected 
> in the analysis. You can handle this by numbering the samples by 
> paired culture from 1 to 7 instead of numbering by patient.
>
> An MDS plot could guide you in judging whether there are baseline 
> differences between the different pairs for one patient, and hence 
> whether your pairing should be by culture instead of by patient.
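
A rough sketch of the renumbering by paired culture suggested above, in base R. It assumes that, within each disease group, the k-th control row and the k-th treated row of the annotation table belong to the same culture pair; the thread does not state this, so the real pairing would have to be checked against the lab records. The name `Pair` is illustrative.

```r
# Sample annotation in the order given in the thread.
Disease <- factor(rep(c("Disease1", "Healthy"), times = c(14, 12)),
                  levels = c("Healthy", "Disease1"))
Treatment <- factor(c(rep(c("Control", "Treat"), each = 7),
                      rep(c("Control", "Treat"), each = 6)),
                    levels = c("Control", "Treat"))

# ASSUMPTION: rows are listed in matching order, so the k-th control and the
# k-th treated sample within a disease group form one culture pair.
Pair <- factor(ave(seq_along(Disease), Disease, Treatment, FUN = seq_along))

# Same nested formula as before, with Pair in place of Patient.
design <- model.matrix(~ Disease + Disease:Pair + Disease:Treatment)

# The healthy group has only 6 pairs, so DiseaseHealthy:Pair7 is all zeros.
design <- design[, colSums(design != 0) > 0, drop = FALSE]

print(table(Disease, Pair))
print(colnames(design))
```

With 13 culture pairs and 26 libraries, this design uses 15 coefficients and leaves 11 residual degrees of freedom, compared with 17 when pairing by patient.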
>
> Best wishes
> Gordon
>
> ---------------------------------------------
> Professor Gordon K Smyth,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> http://www.statsci.org/smyth
>
> On Wed, 7 Nov 2012, Maria Keays wrote:
>
>> Dear Gordon,
>>
>> Thanks very much for the helpful advice. I'm treating them as 
>> biological replicates -- they are cell cultures and it's just that I 
>> have multiple separately treated/untreated pairs of cultures from 
>> some patients and only one treated/untreated pair for others. So 
>> although some cultures came from the same patient, they were all 
>> treated separately and then RNA was extracted from each culture. 
>> Would you say that's the right thing to do?
>>
>> Thanks and best wishes,
>> Maria
>>
>>
>> On 07/11/2012 00:01, Gordon K Smyth wrote:
>>> Dear Maria,
>>>
>>> Thanks for the specific reference to the documentation that you've 
>>> followed.
>>>
>>> Yes, you are correct, the error is arising because there is no 4th 
>>> patient in the healthy group.  If you have a look at your design 
>>> matrix, you will see that there is a column called 
>>> DiseaseHealthy:Patient4 that consists entirely of zeros.  It should 
>>> be column 8, but check:
>>>
>>>    design[,8]
>>>
>>> The easiest way to proceed is simply to remove that column manually 
>>> from the design matrix:
>>>
>>>    design2 <- design[,-8]
>>>
>>> Your experiment has another issue, in that you have repeat samples 
>>> on several of the patients.  Are these biological replicates?  If 
>>> not, if they are just technical replicates, then they should be 
>>> collapsed into one library before analysis.
>>>
>>> Best wishes
>>> Gordon
>>>
>>>> Date: Tue, 06 Nov 2012 09:19:08 +0000
>>>> From: Maria Keays <mkeays at ebi.ac.uk>
>>>> To: bioconductor at r-project.org
>>>> Subject: Re: [BioC] edgeR: paired samples together with independent
>>>>     samples
>>>>
>>>> Hello,
>>>>
>>>> I read this thread and related user guide material with interest
>>>> because I am working with a very similar data set with paired samples.
>>>> However, I'm having trouble, which I think stems from my data being
>>>> unbalanced. I have four patients with a disease and three without, and
>>>> within that, for some patients I have replicates but for others I do
>>>> not. I've created a design matrix as described on p32 of the 27 October
>>>> 2012 edgeR User's Guide, but when I try to estimate the common
>>>> dispersion using estimateGLMCommonDisp() it tells me:
>>>>
>>>> "Error in glmFit.default(y, design = design, dispersion = dispersion,
>>>> offset = offset) :
>>>>   Design matrix not of full rank.  The following coefficients not
>>>> estimable:
>>>>  DiseaseHealthy:Patient4"
>>>>
>>>> I guess because I have 4 patients in the diseased set and only 3 in the
>>>> healthy set? If I remove Patient4 and try again, I'm able to continue
>>>> the analysis successfully, but I'd obviously like to be able to include
>>>> all the data -- is that possible? If so, could you explain how to do it?
>>>>
>>>> The original annotations for my data are below:
>>>>
>>>> Disease    Patient    Treatment
>>>> disease1    1    control
>>>> disease1    1    control
>>>> disease1    1    control
>>>> disease1    2    control
>>>> disease1    3    control
>>>> disease1    3    control
>>>> disease1    4    control
>>>> disease1    1    treat
>>>> disease1    1    treat
>>>> disease1    1    treat
>>>> disease1    2    treat
>>>> disease1    3    treat
>>>> disease1    3    treat
>>>> disease1    4    treat
>>>> healthy    5    control
>>>> healthy    6    control
>>>> healthy    6    control
>>>> healthy    6    control
>>>> healthy    7    control
>>>> healthy    7    control
>>>> healthy    5    treat
>>>> healthy    6    treat
>>>> healthy    6    treat
>>>> healthy    6    treat
>>>> healthy    7    treat
>>>> healthy    7    treat
>>>>
>>>> As I was following the User's Guide, I amended the "Patient" labels so
>>>> that they looked like this when I created the design matrix:
>>>>
>>>> Disease    Patient    Treatment
>>>> disease1    1    control
>>>> disease1    1    control
>>>> disease1    1    control
>>>> disease1    2    control
>>>> disease1    3    control
>>>> disease1    3    control
>>>> disease1    4    control
>>>> disease1    1    treat
>>>> disease1    1    treat
>>>> disease1    1    treat
>>>> disease1    2    treat
>>>> disease1    3    treat
>>>> disease1    3    treat
>>>> disease1    4    treat
>>>> healthy    1    control
>>>> healthy    2    control
>>>> healthy    2    control
>>>> healthy    2    control
>>>> healthy    3    control
>>>> healthy    3    control
>>>> healthy    1    treat
>>>> healthy    2    treat
>>>> healthy    2    treat
>>>> healthy    2    treat
>>>> healthy    3    treat
>>>> healthy    3    treat
>>>>
>>>> Thanks!
>>>> Maria
>>>>
>>>>
>>>> On 25/10/2012 06:18, Gordon K Smyth wrote:
>>>>> Dear Anna,
>>>>>
>>>>> You are right to recognise that the analysis of this sort of design is
>>>>> more complex than many other experiments, because it includes
>>>>> comparisons both within and between patients.  I have included a new
>>>>> section in the edgeR User's Guide based on your experiment that
>>>>> describes the analysis. This will appear in the official release of
>>>>> edgeR in a couple of days. In the meantime, see pages 31-33 of:
>>>>>
>>>>> http://bioinf.wehi.edu.au/software/edgeR/edgeRUsersGuide.pdf
>>>>>
>>>>> Best wishes
>>>>> Gordon
>>>>>
>>>>>> Date: Tue, 23 Oct 2012 06:37:44 -0700 (PDT)
>>>>>> From: "anna [guest]" <guest at bioconductor.org>
>>>>>> To: bioconductor at r-project.org, m.nadira at yahoo.fr
>>>>>> Subject: [BioC] edgeR: paired samples together with independent
>>>>>>     samples
>>>>>>
>>>>>>
>>>>>> Hello,
>>>>>> I am using edgeR to analyse my RNA-seq data.
>>>>>>
>>>>>> I have:
>>>>>>
>>>>>> cells from 3 healthy patients, either treated or not with a hormone.
>>>>>>
>>>>>> cells from 3 patients with disease D1, either treated or not with the
>>>>>> hormone.
>>>>>>
>>>>>> cells from 3 patients with disease D2, either treated or not with the
>>>>>> hormone.
>>>>>>
>>>>>> I would like to know what is wrong in the response to the hormone in
>>>>>> patients with disease D1 and D2.
>>>>>>
>>>>>> I don't know how to combine paired comparisons with pairwise
>>>>>> comparisons in a single GLM analysis.
>>>>>>
>>>>>> thank you very much,
>>>>>> anna
>>>>>>
>>>>>> -- output of sessionInfo():
>>>>>>
>>>>>> R version 2.15.1 (2012-06-22)
>>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>>
>>>>>> locale:
>>>>>> [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
>>>>>> [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
>>>>>> [5] LC_TIME=French_France.1252
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets methods base
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] tools_2.15.1
>>>>>>