[BioC] Repeated Measures mRNA expression analysis

Tue Jul 2 10:31:58 CEST 2013

Hi Charles,

Yes, you're on the right track now, but this is not a simple design and it 
requires care.  As James says, it depends on what assumptions you want to 
make.  I would add that it also depends on what questions you want to 
answer.  In my previous two posts, I tried to prompt you to state what 
questions you want to answer, but you haven't taken the bait yet.  A 
statistical analysis is always designed to test certain scientific 
questions -- there isn't a "correct" analysis for a given design 
independent of what your hypotheses are.

Have you looked at Section 3.5 "Comparisons Both Between and Within 
Subjects" in the edgeR User's Guide?  The design discussed in this section 
is the same as your experiment, except that you have 3 repeated measures 
per subject instead of 2.

The analysis given in the edgeR User's Guide allows you to find genes that 
change over time in (i) treated subjects and (ii) control subjects, and it 
allows you to find genes that respond differently to time in treated vs 
control subjects.
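
For concreteness, here is a minimal sketch of how a design matrix in the 
style of that section might be set up for your layout.  It uses base R 
only.  The columns subject, group and times are taken from your data 
frame; Subject.n (subjects renumbered within each group) and con1 are 
names made up for illustration, and the edgeR calls in the trailing 
comments are indicative rather than a tested pipeline.

```r
# Layout from the posted data frame: 6 subjects, 3 per group, 3 time points.
targets <- data.frame(
  subject = factor(rep(1:6, times = 3)),
  group   = factor(rep(c("Treated", "Treated", "Control",
                         "Treated", "Control", "Control"), times = 3),
                   levels = c("Control", "Treated")),
  times   = factor(rep(c("0hr", "1hr", "2hr"), each = 6))
)

# Renumber subjects 1..3 within each group (hypothetical helper column),
# so that subject effects are nested within group.
targets$Subject.n <- factor(ave(as.integer(targets$subject), targets$group,
                                FUN = function(x) match(x, sort(unique(x)))))

# Group-specific subject effects plus group-specific time effects.
design <- model.matrix(~ group + group:Subject.n + group:times,
                       data = targets)
colnames(design)

# Contrast for a differential response at 1hr
# (change in treated minus change in control).
con1 <- setNames(numeric(ncol(design)), colnames(design))
con1["groupTreated:times1hr"] <-  1
con1["groupControl:times1hr"] <- -1

# With edgeR loaded and 'y' a DGEList with dispersions already estimated
# (not shown here), the tests might then look like:
#   fit <- glmFit(y, design)
#   glmLRT(fit, coef = c("groupControl:times1hr", "groupControl:times2hr"))
#   glmLRT(fit, coef = c("groupTreated:times1hr", "groupTreated:times2hr"))
#   glmLRT(fit, contrast = con1)
```

The two coef tests correspond to (ii) and (i) above, and the contrast 
corresponds to the differential response to time.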

However, the edgeR analysis does not allow you to test for a baseline 
difference between treated and control subjects at time 0, because that 
between-subject comparison is confounded with the subject effects in the 
model.  If you need to do this, then a quite different analysis is needed 
(discussed in Section 9.7 "Multi-level Experiments" of the limma User's 
Guide).
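
As a rough sketch of the multi-level approach, assuming the same layout 
as in your data frame: the group and time factors are merged into one 
factor, subject is left out of the design matrix, and the within-subject 
correlation is handled separately.  The names combo and baseline are 
made up for illustration, and the limma calls in the trailing comments 
are indicative only.

```r
# Layout from the posted data frame: 6 subjects, 3 per group, 3 time points.
targets <- data.frame(
  subject = factor(rep(1:6, times = 3)),
  group   = factor(rep(c("Treated", "Treated", "Control",
                         "Treated", "Control", "Control"), times = 3),
                   levels = c("Control", "Treated")),
  times   = factor(rep(c("0hr", "1hr", "2hr"), each = 6))
)

# One level per group-by-time combination, e.g. "Treated.0hr".
combo <- factor(paste(targets$group, targets$times, sep = "."))
design <- model.matrix(~ 0 + combo)
colnames(design) <- levels(combo)

# Baseline contrast: treated minus control at time 0.
baseline <- setNames(numeric(ncol(design)), colnames(design))
baseline["Treated.0hr"] <-  1
baseline["Control.0hr"] <- -1
baseline

# With limma loaded and 'v' the output of voom() on the counts (not shown),
# subject would enter as a blocking variable, roughly as:
#   corfit <- duplicateCorrelation(v, design, block = targets$subject)
#   fit <- lmFit(v, design, block = targets$subject,
#                correlation = corfit$consensus.correlation)
#   fit2 <- eBayes(contrasts.fit(fit, baseline))
```

Because subject is not a column of the design matrix here, the baseline 
comparison between groups remains estimable.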

Best wishes
Gordon

On Mon, 1 Jul 2013, James W. MacDonald wrote:

> Hi Charles,
>
> On 7/1/2013 9:07 AM, Charles Determan Jr wrote:

>> I apologize for a second post but I want to bring this question back up 
>> as I still cannot find a definitive answer on my own.  In brief, I am 
>> wondering about the design matrix when testing for differential 
>> expression between two groups within which each sample has been 
>> measured at consecutive timepoints (repeated measures).  Therefore, if 
>> my interpretation is correct, I need a two-way analysis that 
>> recognizes dependence between consecutive measurements.  I am familiar 
>> with limma, edgeR and DESeq but am uncertain how to construct an 
>> appropriate design matrix for these comparisons.  My best guess is 
>> that I add a 'Subject' factor to the design matrix, with one level per 
>> unique sample, to correct for dependence.  Is this correct?
>
> It depends on how sophisticated you want to get, or alternatively what 
> assumptions you are willing to make.
>
> The simplest thing to do would be to block on subject (see the blocking 
> portion of the limma User's guide, starting on p. 42). This makes very 
> simple assumptions about the data, namely that the differences between 
> subjects can be accounted for by the mean of each subject.
>
> Best,
>
> Jim
>
>> 
>> My sincere regards,
>> Charles
>> 
>> 
>> On Wed, Jun 26, 2013 at 11:54 AM, Charles Determan Jr <deter088 at umn.edu> wrote:
>> 
>>> To help clarify further here is a dataframe of the design.
>>>
>>>     subject  group times
>>> 1        1 Treated    0hr
>>> 2        2 Treated    0hr
>>> 3        3 Control    0hr
>>> 4        4 Treated    0hr
>>> 5        5 Control    0hr
>>> 6        6 Control    0hr
>>> 7        1 Treated    1hr
>>> 8        2 Treated    1hr
>>> 9        3 Control    1hr
>>> 
>>> ...
>>> 
>>> 17       5 Control    2hr
>>> 18       6 Control    2hr
>>> 
>>> My thought process has been as follows:
>>> 
>>> In the edgeR userguide there is the treatment combination example
>>> 
>>> > targets
>>>     Sample   Treat Time
>>> 1  Sample1 Placebo   0h
>>> 2  Sample2 Placebo   0h
>>> 3  Sample3 Placebo   1h
>>> 4  Sample4 Placebo   1h
>>> 5  Sample5 Placebo   2h
>>> 6  Sample6 Placebo   2h
>>> 7  Sample1    Drug   0h
>>> 8  Sample2    Drug   0h
>>> 9  Sample3    Drug   1h
>>> 10 Sample4    Drug   1h
>>> 11 Sample5    Drug   2h
>>> 12 Sample6    Drug   2h
>>> 
>>> which combines the two factors into a single group factor (e.g. Drug.1, 
>>> Placebo.1, Drug.2, etc.)
>>> 
>>> This seems potentially appropriate but this appears to assume independence 
>>> between samples whereas my data consists of what you could call 'true 
>>> repeated measures' on the same sample.  This seems to draw on the paired 
>>> samples and blocked examples.  These proceed by having the 'subject' as a 
>>> factor as well, for example:
>>> 
>>> design<- model.matrix(~Subject+Treatment)
>>> 
>>> This leads me to guess that a combination of these techniques is required. 
>>> Perhaps merging the times and group factors in my dataset (see above) as 
>>> 'newgroup' (e.g. Control.0, Control.1, Treatment.0, etc).  Then create the 
>>> model formula:
>>> 
>>> design<- model.matrix(~Subject+newgroup)
>>> 
>>> Does this seem appropriate or am I way off base and over thinking this? 
>>> Thanks for any suggestions.
>>> 
>>> Regards,
>>> Charles
>>> 
>>> 
>>> 
>>> On Tue, Jun 25, 2013 at 11:11 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>> 
>>>> Charles,
>>>> 
>>>> Are there only 2 biological units in your experiment?  (One for treatment
>>>> and one for control?)  Or do you have multiple biological units in each
>>>> group?  Surely it must be the latter but, if so, your model does not take
>>>> this into account.
>>>> 
>>>> What questions do you want to test?
>>>> 
>>>> Best
>>>> Gordon
>>>> 
>>>> 
>>>> 
>>>> On Tue, 25 Jun 2013, Charles Determan Jr wrote:
>>>>
>>>>> Gordon,
>>>>> I apologize for not being more definitive with my description. 
>>>>> Your initial definition is my intention, consecutive measurements on 
>>>>> the same biological units.  I will look over the comments in the 
>>>>> link you provided. Thank you for your insight, I appreciate any 
>>>>> further thoughts you may have.
>>>>> 
>>>>> Regards,
>>>>> Charles
>>>>> 
>>>>> 
>>>>> On Tue, Jun 25, 2013 at 6:57 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>>>
>>>>>> Dear Charles,
>>>>>> The term "repeated measures" describes a situation in which repeated
>>>>>> measurements are made on the same biological unit.  Hence the repeated
>>>>>> measurements are correlated.  It is not clear from the brief
>>>>>> information you give whether this is the case, or whether the
>>>>>> different time points derive from independent biological samples.
>>>>>> 
>>>>>> The model you give might or might not be correct, depending on the
>>>>>> experimental units and the hypotheses that you plan to test.  For most
>>>>>> experiments it is not the right approach, for reasons that I have
>>>>>> pointed out elsewhere:
>>>>>> 
>>>>>> https://www.stat.math.ethz.ch/pipermail/bioconductor/2013-June/053297.html
>>>>>> 
>>>>>> Best wishes
>>>>>> Gordon
>>>>>> 
>>>>>>
>>>>>>> Date: Mon, 24 Jun 2013 15:08:48 -0500
>>>>>>> From: Charles Determan Jr<deter088 at umn.edu>
>>>>>>> To: bioconductor at r-project.org
>>>>>>> Subject: [BioC] Repeated Measures mRNA expression analysis
>>>>>>> 
>>>>>>> Greetings,
>>>>>>> 
>>>>>>> I need to analyze data collected from an RNA-seq experiment. 
>>>>>>> This consists of comparing two groups (control vs. treatment) and 
>>>>>>> repeated sampling (1 hour, 2 hours, 3 hours).  If this were a 
>>>>>>> univariate problem I know I would use a 2-way rmANOVA analysis but 
>>>>>>> this is RNA-seq and I have thousands of variables.  I am very 
>>>>>>> familiar with multiple packages for RNA differential expression 
>>>>>>> analysis (e.g. DESeq2, edgeR, limma, etc.) but I have been unable 
>>>>>>> to figure out the most appropriate way to analyze such data 
>>>>>>> in this circumstance. The closest answer I can find within the 
>>>>>>> DESeq2 and edgeR manuals (limma is somewhat confusing to me) is to 
>>>>>>> place the main treatment of interest at the end of the design 
>>>>>>> formula, for example:
>>>>>>> 
>>>>>>> design(dds)<- formula(~ time + treatment)
>>>>>>> 
>>>>>>> Is this what is considered the appropriate way to address repeated 
>>>>>>> measures in mRNA expression experiments?  Any thoughts are 
>>>>>>> appreciated.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> 
>>>>>>> --
>>>>>>> Charles Determan
>>>>>>> Integrated Biosciences PhD Candidate
>>>>>>> University of Minnesota
>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>> 
>> 
>
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
