[BioC] EdgeR: paired samples together with independant samples
Gordon K Smyth
smyth at wehi.EDU.AU
Tue Nov 13 00:39:06 CET 2012
On Mon, 12 Nov 2012, Maria Keays wrote:
> Dear Gordon,
>
> I have another question about this analysis. Previously I performed an
> analysis on the same data but without incorporating effects of patient. My
> design matrix had columns: "Disease1.Treat", "Disease1.Control",
> "Healthy.Treat", "Healthy.Control", and I then tested for genes showing a
> significant interaction between disease and treatment using the contrast
> ((Disease1.Treat - Disease1.Control) - (Healthy.Treat - Healthy.Control)). I
> think this is what is explained on pages 25-26 of the edgeR users guide (Oct
> 27 2012 version).
>
> Now I want to take into account patient effects as well, so I have my design
> matrix with columns:
> [1] "(Intercept)" "DiseaseDisease1"
> [3] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2"
> [5] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3"
> [7] "DiseaseDisease1:Patient4" "DiseaseHealthy:TreatmentTreat"
> [9] "DiseaseDisease1:TreatmentTreat"
>
> Reading the explanation on pages 32-33 of the users guide, to do the
> equivalent contrast to find genes showing significant interaction between
> disease and treatment, should I simply use:
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,-1,1)) ?
Yes.
Gordon
> I think this is what the guide is saying, but I just want to make sure...
>
> Thanks and best wishes,
> Maria
>
>
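The contrast confirmed above can also be built by coefficient name rather than by position. This is a minimal sketch in base R, assuming the fitted design has exactly the nine columns listed in the message; the names `coefs` and `contrast` are illustrative only, and the edgeR call is left as a comment because it needs the fitted object `fit` from the real analysis.

```r
# Coefficient names copied from the design matrix listed in the message.
coefs <- c("(Intercept)", "DiseaseDisease1",
           "DiseaseHealthy:Patient2", "DiseaseDisease1:Patient2",
           "DiseaseHealthy:Patient3", "DiseaseDisease1:Patient3",
           "DiseaseDisease1:Patient4",
           "DiseaseHealthy:TreatmentTreat", "DiseaseDisease1:TreatmentTreat")

# Start from all zeros, then set the two treatment-effect coefficients.
contrast <- setNames(numeric(length(coefs)), coefs)
contrast["DiseaseDisease1:TreatmentTreat"] <- 1   # treatment effect in Disease1
contrast["DiseaseHealthy:TreatmentTreat"]  <- -1  # minus treatment effect in Healthy

print(contrast)

# With edgeR loaded and `fit` available, the test would then be:
# lrt <- glmLRT(fit, contrast = unname(contrast))
```

Naming the coefficients guards against the positions shifting if a column is later added to or removed from the design matrix.
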
> On 07/11/2012 22:55, Gordon K Smyth wrote:
>> Dear Maria,
>>
>> From what you say, it sounds fine not to collapse the libraries. However, if the
>> three treated cultures and three untreated cultures for one patient are
>> truly three pairs, then this pairing should be reflected in the analysis.
>> You can handle this by numbering the samples by paired culture from 1 to 7
>> instead of numbering by patient.
>>
>> An MDS plot could guide you in judging whether there are baseline
>> differences between the different pairs for one patient, and hence whether
>> your pairing should be by culture instead of by patient.
>>
>> Best wishes
>> Gordon
>>
>> ---------------------------------------------
>> Professor Gordon K Smyth,
>> Bioinformatics Division,
>> Walter and Eliza Hall Institute of Medical Research,
>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> http://www.statsci.org/smyth
>>
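Pairing by culture instead of by patient could be set up along the following lines. This is only a sketch in base R: the thread does not say which treated culture belongs with which control culture, so the code assumes they pair up in the order the samples are listed. The factor name `Pair`, the capitalised level names, and the choice of "Healthy" as reference level are also assumptions.

```r
# Assumption: the k-th control and the k-th treated culture within each
# disease group form a pair (7 pairs in Disease1, 6 pairs in Healthy).
pair_d <- 1:7
pair_h <- 1:6

targets <- data.frame(
  Disease = factor(rep(c("Disease1", "Healthy"), times = c(14, 12)),
                   levels = c("Healthy", "Disease1")),
  Pair = factor(c(pair_d, pair_d, pair_h, pair_h)),
  Treatment = factor(c(rep(c("Control", "Treat"), each = 7),
                       rep(c("Control", "Treat"), each = 6)),
                     levels = c("Control", "Treat"))
)

design <- model.matrix(~ Disease + Disease:Pair + Disease:Treatment,
                       data = targets)

# The Healthy group has no 7th pair, so that column is all zeros; drop it.
design <- design[, colSums(design != 0) > 0, drop = FALSE]

# Every remaining coefficient should be estimable.
stopifnot(qr(design)$rank == ncol(design))
print(colnames(design))
```

The last two columns are again the treatment effects within the Healthy and Disease1 groups, so the interaction contrast is formed from them in the same way as before.
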
>> On Wed, 7 Nov 2012, Maria Keays wrote:
>>
>>> Dear Gordon,
>>>
>>> Thanks very much for the helpful advice. I'm treating them as biological
>>> replicates -- they are cell cultures and it's just that I have multiple
>>> separately treated/untreated pairs of cultures from some patients and only
>>> one treated/untreated pair for others. So although some cultures came from
>>> the same patient, they were all treated separately and then RNA was
>>> extracted from each culture. Would you say that's the right thing to do?
>>>
>>> Thanks and best wishes,
>>> Maria
>>>
>>>
>>> On 07/11/2012 00:01, Gordon K Smyth wrote:
>>>> Dear Maria,
>>>>
>>>> Thanks for the specific reference to the documentation that you've
>>>> followed.
>>>>
>>>> Yes, you are correct, the error is arising because there is no 4th
>>>> patient in the healthy group. If you have a look at your design matrix,
>>>> you will see that there is a column called DiseaseHealthy:Patient4 that
>>>> consists entirely of zeros. It should be column 8, but check:
>>>>
>>>> design[,8]
>>>>
>>>> The easiest way to proceed is simply to remove that column manually from
>>>> the design matrix:
>>>>
>>>> design2 <- design[,-8]
>>>>
>>>> Your experiment has another issue, in that you have repeat samples on
>>>> several of the patients. Are these biological replicates? If they are
>>>> only technical replicates, then they should be collapsed into one
>>>> library before analysis.
>>>>
>>>> Best wishes
>>>> Gordon
>>>>
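The fix described above can be reproduced in base R from the sample annotation given in the thread. This is a sketch under stated assumptions: the capitalised level names and the choice of "Healthy" as reference level are inferred from the column names reported for the final design and may not match the original script. The all-zero column is located by its content rather than by position, because its position depends on how the factor levels are ordered.

```r
# Patient labels after renumbering within each disease group (from the thread).
patient_d <- c(1, 1, 1, 2, 3, 3, 4)  # Disease1: 7 control + 7 treated cultures
patient_h <- c(1, 2, 2, 2, 3, 3)     # Healthy: 6 control + 6 treated cultures

targets <- data.frame(
  # Assumption: "Healthy" is the reference level and labels are capitalised.
  Disease = factor(rep(c("Disease1", "Healthy"), times = c(14, 12)),
                   levels = c("Healthy", "Disease1")),
  Patient = factor(c(patient_d, patient_d, patient_h, patient_h)),
  Treatment = factor(c(rep(c("Control", "Treat"), each = 7),
                       rep(c("Control", "Treat"), each = 6)),
                     levels = c("Control", "Treat"))
)

design <- model.matrix(~ Disease + Disease:Patient + Disease:Treatment,
                       data = targets)

# A column with no non-zero entry belongs to a patient that does not exist.
empty <- colnames(design)[colSums(design != 0) == 0]
print(empty)

# Drop the empty column(s) by name.
design2 <- design[, !(colnames(design) %in% empty), drop = FALSE]

# Full rank means every remaining coefficient is estimable.
stopifnot(qr(design2)$rank == ncol(design2))
```

Under these assumptions `design2` has nine columns, and it is the matrix that would be passed to the edgeR dispersion and fitting functions in place of `design`.
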
>>>>> Date: Tue, 06 Nov 2012 09:19:08 +0000
>>>>> From: Maria Keays <mkeays at ebi.ac.uk>
>>>>> To: bioconductor at r-project.org
>>>>> Subject: Re: [BioC] EdgeR: paired samples together with independant
>>>>> samples
>>>>>
>>>>> Hello,
>>>>>
>>>>> I read this thread and related user guide material with interest because
>>>>> I am working with a very similar data set with paired samples. However,
>>>>> I'm having trouble which I think stems from my data being unbalanced? I
>>>>> have four patients with a disease and three without, and within that for
>>>>> some patients I have replicates but for others I do not. I've created a
>>>>> design matrix as described on p32 of the 27 October 2012 edgeR user's
>>>>> guide, but when I try to estimate the common dispersion using
>>>>> estimateGLMCommonDisp() it tells me:
>>>>>
>>>>> "Error in glmFit.default(y, design = design, dispersion = dispersion,
>>>>> offset = offset) :
>>>>> Design matrix not of full rank. The following coefficients not
>>>>> estimable:
>>>>> DiseaseHealthy:Patient4"
>>>>>
>>>>> I guess this is because I have 4 patients in the diseased set and only 3 in the
>>>>> healthy set? If I remove Patient4 and try again, I'm able to continue
>>>>> the analysis successfully, but I'd obviously like to be able to include
>>>>> all the data -- is that possible? If so, could you explain how to do it?
>>>>>
>>>>> The original annotations for my data are below:
>>>>>
>>>>> Disease   Patient  Treatment
>>>>> disease1  1        control
>>>>> disease1  1        control
>>>>> disease1  1        control
>>>>> disease1  2        control
>>>>> disease1  3        control
>>>>> disease1  3        control
>>>>> disease1  4        control
>>>>> disease1  1        treat
>>>>> disease1  1        treat
>>>>> disease1  1        treat
>>>>> disease1  2        treat
>>>>> disease1  3        treat
>>>>> disease1  3        treat
>>>>> disease1  4        treat
>>>>> healthy   5        control
>>>>> healthy   6        control
>>>>> healthy   6        control
>>>>> healthy   6        control
>>>>> healthy   7        control
>>>>> healthy   7        control
>>>>> healthy   5        treat
>>>>> healthy   6        treat
>>>>> healthy   6        treat
>>>>> healthy   6        treat
>>>>> healthy   7        treat
>>>>> healthy   7        treat
>>>>>
>>>>> As I was following the user's guide I amended the "Patient" labels so it
>>>>> looked like this when I created the design matrix:
>>>>>
>>>>> Disease   Patient  Treatment
>>>>> disease1  1        control
>>>>> disease1  1        control
>>>>> disease1  1        control
>>>>> disease1  2        control
>>>>> disease1  3        control
>>>>> disease1  3        control
>>>>> disease1  4        control
>>>>> disease1  1        treat
>>>>> disease1  1        treat
>>>>> disease1  1        treat
>>>>> disease1  2        treat
>>>>> disease1  3        treat
>>>>> disease1  3        treat
>>>>> disease1  4        treat
>>>>> healthy   1        control
>>>>> healthy   2        control
>>>>> healthy   2        control
>>>>> healthy   2        control
>>>>> healthy   3        control
>>>>> healthy   3        control
>>>>> healthy   1        treat
>>>>> healthy   2        treat
>>>>> healthy   2        treat
>>>>> healthy   2        treat
>>>>> healthy   3        treat
>>>>> healthy   3        treat
>>>>>
>>>>> Thanks!
>>>>> Maria
>>>>>
>>>>>
>>>>> On 25/10/2012 06:18, Gordon K Smyth wrote:
>>>>>> Dear Anna,
>>>>>>
>>>>>> You are right to recognise that the analysis of this sort of design is
>>>>>> more complex than many other experiments, because it includes
>>>>>> comparisons both within and between patients. I have included a new
>>>>>> section in the edgeR User's Guide based on your experiment that
>>>>>> describes the analysis. This will appear in the official release of
>>>>>> edgeR in a couple of days. In the meantime, see pages 31-33 of:
>>>>>>
>>>>>> http://bioinf.wehi.edu.au/software/edgeR/edgeRUsersGuide.pdf
>>>>>>
>>>>>> Best wishes
>>>>>> Gordon
>>>>>>
>>>>>>> Date: Tue, 23 Oct 2012 06:37:44 -0700 (PDT)
>>>>>>> From: "anna [guest]" <guest at bioconductor.org>
>>>>>>> To: bioconductor at r-project.org, m.nadira at yahoo.fr
>>>>>>> Subject: [BioC] EdgeR: paired samples together with independant
>>>>>>> samples
>>>>>>>
>>>>>>>
>>>>>>> Hello,
>>>>>>> I am using edgeR to analyse my RNA-seq data.
>>>>>>>
>>>>>>> I have:
>>>>>>>
>>>>>>> cells from 3 healthy patients, either treated or not with a hormone;
>>>>>>>
>>>>>>> cells from 3 patients with disease D1, either treated or not with the
>>>>>>> hormone;
>>>>>>>
>>>>>>> cells from 3 patients with disease D2, either treated or not with the
>>>>>>> hormone.
>>>>>>>
>>>>>>> I would like to know what is wrong in the response to the hormone in
>>>>>>> patients with disease D1 and D2.
>>>>>>>
>>>>>>> I don't know how to combine paired comparisons with pairwise
>>>>>>> comparisons in a single GLM analysis.
>>>>>>>
>>>>>>> thank you very much,
>>>>>>> anna
>>>>>>>
>>>>>>> -- output of sessionInfo():
>>>>>>>
>>>>>>> R version 2.15.1 (2012-06-22)
>>>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>>>
>>>>>>> locale:
>>>>>>> [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
>>>>>>> [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
>>>>>>> [5] LC_TIME=French_France.1252
>>>>>>>
>>>>>>> attached base packages:
>>>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>>>
>>>>>>> loaded via a namespace (and not attached):
>>>>>>> [1] tools_2.15.1
>>>>>>>
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the
>> addressee.
>> You must not disclose, forward, print or use it without the permission of
>> the sender.
>> ______________________________________________________________________
>
>