[BioC] Repeated Measures mRNA expression analysis
Gordon K Smyth
smyth at wehi.EDU.AU
Tue Jul 2 10:31:58 CEST 2013
Hi Charles,
Yes, you're on the right track now, but this is not a simple design and it
requires care. As James says, it depends on what assumptions you want to
make. I would add that it also depends on what questions you want to
answer. In my previous two posts, I tried to prompt you to state what
questions you want to answer, but you haven't taken the bait yet. A
statistical analysis is always designed to test certain scientific
questions -- there isn't a "correct" analysis for a given design
independent of what your hypotheses are.
Have you looked at Section 3.5 "Comparisons Both Between and Within
Subjects" in the edgeR User's Guide? The design discussed in this section
is the same as your experiment, except that you have 3 repeated measures
per subject instead of 2.
The analysis given in the edgeR User's Guide allows you to find genes that
change over time for (i) treated subjects and (ii) control subjects, and
it allows you to find genes that respond differently to time in the
treated vs control subjects.
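As a rough sketch of how that kind of design could be set up for the layout posted earlier in this thread, the base R code below nests subject within group and fits time within group. It is an illustration adapted from the formula pattern in that section, not copied from the guide; the variable names (targets, subj, diff1hr, naive) are made up for the example, and only model.matrix() and other base R functions are used.

```r
# Design from the thread: 6 subjects, 3 per group, each measured at 0hr, 1hr, 2hr.
targets <- data.frame(
  subject = rep(1:6, times = 3),
  group   = rep(c("Treated", "Treated", "Control",
                  "Treated", "Control", "Control"), times = 3),
  time    = rep(c("0hr", "1hr", "2hr"), each = 6)
)
group <- factor(targets$group, levels = c("Control", "Treated"))
time  <- factor(targets$time, levels = c("0hr", "1hr", "2hr"))

# Renumber subjects 1..3 within each group, because subject is nested in group.
subj <- factor(ave(targets$subject, targets$group,
                   FUN = function(x) as.integer(factor(x))))

# Group effect, subject-within-group effects, and time-within-group effects.
design <- model.matrix(~ group + group:subj + group:time)
colnames(design)

# For comparison: crossing the original subject IDs with a merged
# group-by-time factor gives a design matrix that is not of full column rank,
# because the group effect is already absorbed by the subject terms.
newgroup <- interaction(group, time)
naive <- model.matrix(~ factor(targets$subject) + newgroup)
c(columns = ncol(naive), rank = qr(naive)$rank)

# Contrast vector: the 1hr-vs-0hr change in treated subjects minus the
# same change in control subjects.
diff1hr <- setNames(numeric(ncol(design)), colnames(design))
diff1hr["groupTreated:time1hr"] <- 1
diff1hr["groupControl:time1hr"] <- -1
```

The coefficients named groupControl:time1hr, groupTreated:time1hr and so on are the within-group time effects, and a vector such as diff1hr could be passed as a contrast to whichever model-fitting function is used.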
However it does not allow you to test for a baseline difference between
treated and control subjects at time 0. If you need to do this, then a
quite different analysis is needed (discussed in Section 9.7 "Multi-level
Experiments" of the limma User's Guide).
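A minimal sketch of that alternative, under stated assumptions: subject is left out of the design matrix and handled as a random effect through limma's duplicateCorrelation() and the block argument of lmFit(). The expression matrix y is simulated purely for illustration (for RNA-seq it would stand in for log-scale values such as voom output), and the names combo and baseline are hypothetical. The limma calls only run if the package is installed.

```r
# Same layout as in the thread: 6 subjects, 3 per group, 3 time points each.
targets <- data.frame(
  subject = rep(1:6, times = 3),
  group   = rep(c("Treated", "Treated", "Control",
                  "Treated", "Control", "Control"), times = 3),
  time    = rep(c("0hr", "1hr", "2hr"), each = 6)
)

# One coefficient per group-by-time combination; no subject terms in the design.
combo <- factor(paste(targets$group, targets$time, sep = "."))
design <- model.matrix(~ 0 + combo)
colnames(design) <- levels(combo)

# Baseline contrast: treated minus control at 0hr.
baseline <- setNames(numeric(ncol(design)), colnames(design))
baseline["Treated.0hr"] <- 1
baseline["Control.0hr"] <- -1

if (requireNamespace("limma", quietly = TRUE)) {
  # Simulated log-expression values (hypothetical, 100 genes) that include a
  # subject-level random effect so repeated measures are correlated.
  set.seed(1)
  subj_effect <- matrix(rnorm(100 * 6), 100, 6)[, targets$subject]
  y <- subj_effect + matrix(rnorm(100 * 18), 100, 18)

  # Estimate the within-subject correlation, then fit with subject as a block.
  corfit <- limma::duplicateCorrelation(y, design, block = targets$subject)
  fit <- limma::lmFit(y, design, block = targets$subject,
                      correlation = corfit$consensus.correlation)
  fit2 <- limma::contrasts.fit(fit, contrasts = cbind(baseline0hr = baseline))
  fit2 <- limma::eBayes(fit2)
  print(limma::topTable(fit2, number = 5))
}
```

Because subject is not a fixed effect here, the between-subject comparison at 0hr remains estimable, which is what the nested fixed-effect design cannot provide.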
Best wishes
Gordon
On Mon, 1 Jul 2013, James W. MacDonald wrote:
> Hi Charles,
>
> On 7/1/2013 9:07 AM, Charles Determan Jr wrote:
>> I apologize for a second post but I want to bring this question back up
>> as I still cannot find a definitive answer on my own. In brief, I am
>> wondering about the design matrix when testing for differential
>> expression between two groups within which each sample has been
>> measured at consecutive timepoints (repeated measures). Therefore, if
>> my interpretations are correct, I need a two-way analysis that
>> recognizes dependence between consecutive measurements. I am familiar
>> with limma, edgeR and DESeq but am uncertain how to design an
>> appropriate design matrix for these comparisons. The best I can guess
>> is that I add a 'Subject' factor to the design matrix corresponding to
>> each unique sample to correct for dependence, is this correct?
>
> It depends on how sophisticated you want to get, or alternatively what
> assumptions you are willing to make.
>
> The simplest thing to do would be to block on subject (see the blocking
> portion of the limma User's guide, starting on p. 42). This makes very
> simple assumptions about the data, namely that the differences between
> subjects can be accounted for by the mean of each subject.
>
> Best,
>
> Jim
>
>>
>> My sincere regards,
>> Charles
>>
>>
>> On Wed, Jun 26, 2013 at 11:54 AM, Charles Determan
>> Jr<deter088 at umn.edu>wrote:
>>
>>> To help clarify further here is a dataframe of the design.
>>>
>>>    subject   group times
>>> 1        1 Treated   0hr
>>> 2        2 Treated   0hr
>>> 3        3 Control   0hr
>>> 4        4 Treated   0hr
>>> 5        5 Control   0hr
>>> 6        6 Control   0hr
>>> 7        1 Treated   1hr
>>> 8        2 Treated   1hr
>>> 9        3 Control   1hr
>>> ...
>>> 17       5 Control   2hr
>>> 18       6 Control   2hr
>>>
>>> My thought process has been as follows:
>>>
>>> In the edgeR userguide there is the treatment combination example
>>>
>>>> targets
>>>      Sample   Treat Time
>>> 1   Sample1 Placebo   0h
>>> 2   Sample2 Placebo   0h
>>> 3   Sample3 Placebo   1h
>>> 4   Sample4 Placebo   1h
>>> 5   Sample5 Placebo   2h
>>> 6   Sample6 Placebo   2h
>>> 7   Sample1    Drug   0h
>>> 8   Sample2    Drug   0h
>>> 9   Sample3    Drug   1h
>>> 10  Sample4    Drug   1h
>>> 11  Sample5    Drug   2h
>>> 12  Sample6    Drug   2h
>>>
>>> which combines the groups to produce a single group (ex. Drug.1,
>>> Placebo.1, Drug.2, etc)
>>>
>>> This seems potentially appropriate but this appears to assume independence
>>> between samples whereas my data consists of what you could call 'true
>>> repeated measures' on the same sample. This seems to draw on the paired
>>> samples and blocked examples. These proceed by having the 'subject' as a
>>> factor as well, for example:
>>>
>>> design<- model.matrix(~Subject+Treatment)
>>>
>>> This leads me to guess that a combination of these techniques is required.
>>> Perhaps merging the times and group factors in my dataset (see above) as
>>> 'newgroup' (e.g. Control.0, Control.1, Treatment.0, etc). Then create the
>>> model formula:
>>>
>>> design<- model.matrix(~Subject+newgroup)
>>>
>>> Does this seem appropriate or am I way off base and over thinking this?
>>> Thanks for any suggestions.
>>>
>>> Regards,
>>> Charles
>>>
>>>
>>>
>>> On Tue, Jun 25, 2013 at 11:11 PM, Gordon K Smyth<smyth at wehi.edu.au>wrote:
>>>
>>>> Charles,
>>>>
>>>> Are there only 2 biological units in your experiment? (One for treatment
>>>> and one for control?) Or do you have multiple biological units in each
>>>> group? Surely it must be the latter but, if so, your model does not take
>>>> this into account.
>>>>
>>>> What questions do you want to test?
>>>>
>>>> Best
>>>> Gordon
>>>>
>>>>
>>>>
>>>> On Tue, 25 Jun 2013, Charles Determan Jr wrote:
>>>>
>>>> Gordon,
>>>>> I apologize for not being more definitive with my description.
>>>>> Your initial definition is my intention, consecutive measurements on
>>>>> the same biological units. I will look over the comments in the
>>>>> link you provided. Thank you for your insight, I appreciate any
>>>>> further thoughts you may have.
>>>>>
>>>>> Regards,
>>>>> Charles
>>>>>
>>>>>
>>>>> On Tue, Jun 25, 2013 at 6:57 PM, Gordon K Smyth<smyth at wehi.edu.au>
>>>>> wrote:
>>>>>
>>>>> Dear Charles,
>>>>>> The term "repeated measures" describes a situation in which repeated
>>>>>> measurements are made on the same biological unit. Hence the repeated
>>>>>> measurements are correlated. It is not clear from the brief information
>>>>>> you give whether this is the case, or whether the different time points
>>>>>> derive from independent biological samples.
>>>>>>
>>>>>> The model you give might or might not be correct, depending on the
>>>>>> experimental units and the hypotheses that you plan to test. For most
>>>>>> experiments it is not the right approach, for reasons that I have pointed
>>>>>> out elsewhere:
>>>>>>
>>>>>> https://www.stat.math.ethz.ch/pipermail/bioconductor/2013-June/053297.html
>>>>>>
>>>>>> Best wishes
>>>>>> Gordon
>>>>>>
>>>>>>
>>>>>> Date: Mon, 24 Jun 2013 15:08:48 -0500
>>>>>>
>>>>>>> From: Charles Determan Jr<deter088 at umn.edu>
>>>>>>> To: bioconductor at r-project.org
>>>>>>> Subject: [BioC] Repeated Measures mRNA expression analysis
>>>>>>>
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I need to analyze data collected from an RNA-seq experiment.
>>>>>>> This consists of comparing two groups (control vs. treatment) and
>>>>>>> repeated sampling (1 hour, 2 hours, 3 hours). If this were a
>>>>>>> univariate problem I know I would use a 2-way rmANOVA analysis but
>>>>>>> this is RNA-seq and I have thousands of variables. I am very
>>>>>>> familiar with multiple packages for RNA differential expression
>>>>>>> analysis (e.g. DESeq2, edgeR, limma, etc.) but I have been unable
>>>>>>> to figure out the most appropriate way to analyze such data
>>>>>>> in this circumstance. The closest answer I can find within the
>>>>>>> DESeq2 and edgeR manuals (limma is somewhat confusing to me) is to
>>>>>>> place the main treatment of interest at the end of the design
>>>>>>> formula, for example:
>>>>>>>
>>>>>>> design(dds)<- formula(~ time + treatment)
>>>>>>>
>>>>>>> Is this what is considered the appropriate way to address repeated
>>>>>>> measures in mRNA expression experiments? Any thoughts are
>>>>>>> appreciated.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> --
>>>>>>> Charles Determan
>>>>>>> Integrated Biosciences PhD Candidate
>>>>>>> University of Minnesota
>>>>>>>
>>>>>>>
>>>>> --
>>>>> Charles Determan
>>>>> Integrated Biosciences PhD Candidate
>>>>> University of Minnesota
>>>>>
>>>
>>
>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>