[BioC] EdgeR: paired samples together with independant samples
Maria Keays
mkeays at ebi.ac.uk
Mon Nov 12 16:40:30 CET 2012
Dear Gordon,
I have another question about this analysis. Previously I performed an
analysis on the same data but without incorporating effects of patient.
My design matrix had columns: "Disease1.Treat", "Disease1.Control",
"Healthy.Treat", "Healthy.Control", and I then tested for genes showing
a significant interaction between disease and treatment using the
contrast ((Disease1.Treat - Disease1.Control) - (Healthy.Treat -
Healthy.Control)). I think this is what is explained on pages 25-26 of
the edgeR User's Guide (Oct 27 2012 version).
Now I want to take into account patient effects as well, so I have my
design matrix with columns:
[1] "(Intercept)" "DiseaseDisease1"
[3] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2"
[5] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3"
[7] "DiseaseDisease1:Patient4" "DiseaseHealthy:TreatmentTreat"
[9] "DiseaseDisease1:TreatmentTreat"
Reading the explanation on pages 32-33 of the User's Guide, to do the
equivalent contrast to find genes showing significant interaction
between disease and treatment, should I simply use:
lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,-1,1)) ?
I think this is what the guide is saying, but I just want to make sure...
Thanks and best wishes,
Maria
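
A minimal sketch, in base R only, of how the nine-column design listed above can be rebuilt from the relabelled annotations further down the thread, and how the contrast vector lines up with the column names. The level ordering (Healthy as the reference level) and the capitalised level names are assumptions inferred from the column names in the question. The glmLRT() call is left commented out because it needs the fitted edgeR object, which is not constructed here.

```r
# Relabelled annotations: patients are numbered within each disease group.
Disease <- factor(rep(c("Disease1", "Healthy"), times = c(14, 12)),
                  levels = c("Healthy", "Disease1"))  # assumption: Healthy is the reference
Patient <- factor(c(1, 1, 1, 2, 3, 3, 4,  1, 1, 1, 2, 3, 3, 4,   # Disease1: control, then treat
                    1, 2, 2, 2, 3, 3,     1, 2, 2, 2, 3, 3))     # Healthy: control, then treat
Treatment <- factor(rep(c("Control", "Treat", "Control", "Treat"),
                        times = c(7, 7, 6, 6)),
                    levels = c("Control", "Treat"))

design <- model.matrix(~ Disease + Disease:Patient + Disease:Treatment)

# Drop any column that is entirely zero (here DiseaseHealthy:Patient4).
design <- design[, colSums(design != 0) > 0]

# Build the contrast by column name rather than by position.
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast["DiseaseDisease1:TreatmentTreat"] <-  1
contrast["DiseaseHealthy:TreatmentTreat"]  <- -1
print(contrast)

# With a fitted edgeR object `fit` (not constructed here), the test would be:
# lrt <- glmLRT(fit, contrast = contrast)
```

Under these assumptions the named vector has -1 in position 8 and 1 in position 9, which agrees with the vector written out by hand in the question. Building it by name guards against the positions shifting if the factor levels or the set of patients change.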
On 07/11/2012 22:55, Gordon K Smyth wrote:
> Dear Maria,
>
> Sounds ok from what you say not to collapse libraries. However, if
> the three treated cultures and three untreated cultures for one
> patient are truly three pairs, then this pairing should be reflected
> in the analysis. You can handle this by numbering the samples by
> paired culture from 1 to 7 instead of numbering by patient.
>
> An MDS plot could guide you in judging whether there are baseline
> differences between the different pairs for one patient, and hence
> whether your pairing should be by culture instead of by patient.
>
> Best wishes
> Gordon
>
> ---------------------------------------------
> Professor Gordon K Smyth,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> http://www.statsci.org/smyth
>
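
A rough sketch of the renumbering suggested above, with samples labelled by paired culture instead of by patient. The thread does not say which control culture belongs with which treated culture, so the pairing used here (the k-th control row with the k-th treated row inside each disease group) is purely an assumption for illustration, as are the object names. The MDS plot call is commented out because it needs an edgeR DGEList, which is not built here.

```r
# Same 26 samples as in the annotation table quoted below.
Disease <- factor(rep(c("Disease1", "Healthy"), times = c(14, 12)),
                  levels = c("Healthy", "Disease1"))
Treatment <- factor(rep(c("Control", "Treat", "Control", "Treat"),
                        times = c(7, 7, 6, 6)),
                    levels = c("Control", "Treat"))

# Assumption: row k of the controls is paired with row k of the treated
# samples within each disease group, giving pairs 1-7 (Disease1) and 1-6 (Healthy).
Pair <- factor(c(1:7, 1:7, 1:6, 1:6))

design <- model.matrix(~ Disease + Disease:Pair + Disease:Treatment)

# Healthy has no pair 7, so that column is all zeros and is dropped.
design <- design[, colSums(design != 0) > 0]

print(dim(design))

# With a DGEList `y` (not constructed here), baseline differences between
# pairs from the same patient could be inspected with:
# plotMDS(y)
```

Pairing by culture uses more coefficients than pairing by patient, so fewer residual degrees of freedom remain for estimating dispersion. That is one reason to look at the MDS plot before choosing between the two.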
> On Wed, 7 Nov 2012, Maria Keays wrote:
>
>> Dear Gordon,
>>
>> Thanks very much for the helpful advice. I'm treating them as
>> biological replicates -- they are cell cultures and it's just that I
>> have multiple separately treated/untreated pairs of cultures from
>> some patients and only one treated/untreated pair for others. So
>> although some cultures came from the same patient, they were all
>> treated separately and then RNA was extracted from each culture.
>> Would you say that's the right thing to do?
>>
>> Thanks and best wishes,
>> Maria
>>
>>
>> On 07/11/2012 00:01, Gordon K Smyth wrote:
>>> Dear Maria,
>>>
>>> Thanks for the specific reference to the documentation that you've
>>> followed.
>>>
>>> Yes, you are correct, the error is arising because there is no 4th
>>> patient in the healthy group. If you have a look at your design
>>> matrix, you will see that there is a column called
>>> DiseaseHealthy:Patient4 that consists entirely of zeros. It should
>>> be column 8, but check:
>>>
>>> design[,8]
>>>
>>> The easiest way to proceed is simply to remove that column manually
>>> from the design matrix:
>>>
>>> design2 <- design[,-8]
>>>
>>> Your experiment has another issue, in that you have repeat samples
>>> on several of the patients. Are these biological replicates? If
>>> not, if they are just technical replicates, then they should be
>>> collapsed into one library before analysis.
>>>
>>> Best wishes
>>> Gordon
>>>
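
A small sketch, in base R, of the "but check" step above: finding the all-zero column by inspection of the matrix instead of relying on its position. The helper names `make_design` and `zero_cols` are hypothetical, and the capitalised level names are assumed from the column names used later in the thread. The position of the empty column depends on how the Disease levels are ordered, which is why the sketch tries both orderings.

```r
# Relabelled annotations: patients 1-4 within Disease1, 1-3 within Healthy.
disease   <- rep(c("Disease1", "Healthy"), times = c(14, 12))
patient   <- factor(c(1, 1, 1, 2, 3, 3, 4,  1, 1, 1, 2, 3, 3, 4,
                      1, 2, 2, 2, 3, 3,     1, 2, 2, 2, 3, 3))
treatment <- factor(rep(c("Control", "Treat", "Control", "Treat"),
                        times = c(7, 7, 6, 6)))

# Hypothetical helper: build the design for a given ordering of Disease levels.
make_design <- function(disease_levels) {
  d <- data.frame(Disease   = factor(disease, levels = disease_levels),
                  Patient   = patient,
                  Treatment = treatment)
  model.matrix(~ Disease + Disease:Patient + Disease:Treatment, data = d)
}

# Hypothetical helper: named indices of columns that are entirely zero.
zero_cols <- function(design) which(colSums(design != 0) == 0)

zero_default <- zero_cols(make_design(c("Disease1", "Healthy")))  # alphabetical levels
zero_relevel <- zero_cols(make_design(c("Healthy", "Disease1")))  # Healthy as reference
print(zero_default)
print(zero_relevel)

# Remove whatever was found, instead of hard-coding an index.
design  <- make_design(c("Disease1", "Healthy"))
design2 <- design[, -zero_default, drop = FALSE]
```

With alphabetical levels the empty column is the 8th, as stated above. With Healthy set as the reference level it moves to the 7th, so selecting it by name or by a zero-column check is the safer habit.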
>>>> Date: Tue, 06 Nov 2012 09:19:08 +0000
>>>> From: Maria Keays <mkeays at ebi.ac.uk>
>>>> To: bioconductor at r-project.org
>>>> Subject: Re: [BioC] EdgeR: paired samples together with independant
>>>> samples
>>>>
>>>> Hello,
>>>>
>>>> I read this thread and related user guide material with interest
>>>> because I am working with a very similar data set with paired
>>>> samples. However, I'm having trouble which I think stems from my
>>>> data being unbalanced? I have four patients with a disease and three
>>>> without, and within that for some patients I have replicates but for
>>>> others I do not. I've created a design matrix as described on p32 of
>>>> the 27 October 2012 edgeR user's guide, but when I try to estimate
>>>> the common dispersion using estimateGLMCommonDisp() it tells me:
>>>>
>>>> "Error in glmFit.default(y, design = design, dispersion = dispersion,
>>>> offset = offset) :
>>>> Design matrix not of full rank. The following coefficients not
>>>> estimable:
>>>> DiseaseHealthy:Patient4"
>>>>
>>>> I guess because I have 4 patients in the diseased set and only 3 in
>>>> the healthy set? If I remove Patient4 and try again, I'm able to
>>>> continue the analysis successfully, but I'd obviously like to be
>>>> able to include all the data -- is that possible? If so, could you
>>>> explain how to do it?
>>>>
>>>> The original annotations for my data are below:
>>>>
>>>> Disease Patient Treatment
>>>> disease1 1 control
>>>> disease1 1 control
>>>> disease1 1 control
>>>> disease1 2 control
>>>> disease1 3 control
>>>> disease1 3 control
>>>> disease1 4 control
>>>> disease1 1 treat
>>>> disease1 1 treat
>>>> disease1 1 treat
>>>> disease1 2 treat
>>>> disease1 3 treat
>>>> disease1 3 treat
>>>> disease1 4 treat
>>>> healthy 5 control
>>>> healthy 6 control
>>>> healthy 6 control
>>>> healthy 6 control
>>>> healthy 7 control
>>>> healthy 7 control
>>>> healthy 5 treat
>>>> healthy 6 treat
>>>> healthy 6 treat
>>>> healthy 6 treat
>>>> healthy 7 treat
>>>> healthy 7 treat
>>>>
>>>> As I was following the user's guide I amended the "Patient" labels
>>>> so it looked like this when I created the design matrix:
>>>>
>>>> Disease Patient Treatment
>>>> disease1 1 control
>>>> disease1 1 control
>>>> disease1 1 control
>>>> disease1 2 control
>>>> disease1 3 control
>>>> disease1 3 control
>>>> disease1 4 control
>>>> disease1 1 treat
>>>> disease1 1 treat
>>>> disease1 1 treat
>>>> disease1 2 treat
>>>> disease1 3 treat
>>>> disease1 3 treat
>>>> disease1 4 treat
>>>> healthy 1 control
>>>> healthy 2 control
>>>> healthy 2 control
>>>> healthy 2 control
>>>> healthy 3 control
>>>> healthy 3 control
>>>> healthy 1 treat
>>>> healthy 2 treat
>>>> healthy 2 treat
>>>> healthy 2 treat
>>>> healthy 3 treat
>>>> healthy 3 treat
>>>>
>>>> Thanks!
>>>> Maria
>>>>
>>>>
>>>> On 25/10/2012 06:18, Gordon K Smyth wrote:
>>>>> Dear Anna,
>>>>>
>>>>> You are right to recognise that the analysis of this sort of
>>>>> design is more complex than many other experiments, because it
>>>>> includes comparisons both within and between patients. I have
>>>>> included a new section in the edgeR User's Guide based on your
>>>>> experiment that describes the analysis. This will appear in the
>>>>> official release of edgeR in a couple of days. In the meantime,
>>>>> see pages 31-33 of:
>>>>>
>>>>> http://bioinf.wehi.edu.au/software/edgeR/edgeRUsersGuide.pdf
>>>>>
>>>>> Best wishes
>>>>> Gordon
>>>>>
>>>>>> Date: Tue, 23 Oct 2012 06:37:44 -0700 (PDT)
>>>>>> From: "anna [guest]" <guest at bioconductor.org>
>>>>>> To: bioconductor at r-project.org, m.nadira at yahoo.fr
>>>>>> Subject: [BioC] EdgeR: paired samples together with independant
>>>>>> samples
>>>>>>
>>>>>>
>>>>>> Hello,
>>>>>> I am using EdgeR to analyse my RNAseq data.
>>>>>>
>>>>>> I have:
>>>>>>
>>>>>> cells from 3 healthy patients, either treated or not with a
>>>>>> hormone.
>>>>>>
>>>>>> cells from 3 patients with disease D1, either treated or not with
>>>>>> the hormone.
>>>>>>
>>>>>> cells from 3 patients with disease D2, either treated or not with
>>>>>> the hormone.
>>>>>>
>>>>>> I would like to know what is wrong in the response to the hormone in
>>>>>> patients with disease D1 and D2.
>>>>>>
>>>>>> I don't know how to combine paired comparisons, with pairwise
>>>>>> comparisons, in a unique glm analysis.
>>>>>>
>>>>>> thank you very much,
>>>>>> anna
>>>>>>
>>>>>> -- output of sessionInfo():
>>>>>>
>>>>>> R version 2.15.1 (2012-06-22)
>>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>>
>>>>>> locale:
>>>>>> [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
>>>>>> [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
>>>>>> [5] LC_TIME=French_France.1252
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] tools_2.15.1
>>>>>>
>