[R-sig-ME] mixed-effects model specification question

Wed May 7 10:45:19 CEST 2008

Hi Mark,

On Wed, May 07, 2008 at 01:57:41AM -0400, Mark Kimpel wrote:
>    Andrew, our emails must have crossed on the net. You have come to
>    essentially the same conclusion as I, that Tissue/Region should be
>    discarded as an effect and the 3 samples per rat treated equally. What
>    you seem to be suggesting, however, is that averaging these 3 samples
>    to come up with one summary measure per animal, and thus performing a
>    t-test on 2 groups of 6 animals, would be the same as using all 36
>    measurements with rat as a random effect. I'm not a statistician, but
>    it seems to me that those 3 measurements within each rat give some
>    information about the reliability of those measurements and thus would
>    contribute useful information to the model. If, for example, the
>    variance within animal was much greater than the variance between
>    animals, then even if the means between the groups were different, I
>    would have less confidence that the population means were different
>    than I would if the measurements were tightly grouped within animal.
>    (I'm ignoring, for simplification, the within-group between-animal
>    variance).

yes, I see your reasoning.  I suspect that the within-rat variance
would have to be quite large to make an effect, but as you say, you
can try it both ways.  

Cheers

Andrew

>    I guess the only way to know for sure would be to do it both ways and
>    see if the conclusions were any different. Since I will be repeating
>    this over 31,000 genes, any difference between the approaches should be
>    apparent.
>    Mark
> 
>    On Wed, May 7, 2008 at 1:35 AM, Andrew Robinson
>    <[1]A.Robinson at ms.unimelb.edu.au> wrote:
> 
>    On Tue, May 06, 2008 at 08:56:51PM -0400, Mark Kimpel wrote:
>    >    The "treatments" are actually two selectively bred lines of rats,
>    with
>    >    6 animals from each line. We need p-values as a filter to
>    determine
>    >    which of ~31,000 genes on the array are significant. The p-values
>    are
>    >    subjected to FDR correction, of course. I've never seen a
>    micro-array
>    >    paper published without p-values.
>    >    As for Andrew's comments:
>    >    1. yes, there are three measurments taken per rat, but they are
>    from
>    >    different brain regions. Since each brain region was processed
>    >    separately, however, the region and batch effects cannot be
>    separated.
>    >    Having said that, I still label this factor as "region" in my
>    model.
> 
>      I'm a bit confused still - sorry - do you therefore have three
>      batches
>      per rat, for a total of 36 batches (with two missing possibly)?  If
>      so
>      then I still think that your model is over-parametrized, and that
>      you
>      can eliminate the Tissue random effect, based on your model output
>      below (thanks for including it).
> 
>    >    2. In the past I have done exactly as you suggested and averaged
>    brain
>    >    regions, but I was hoping nlme would allow for more
>    sophistication.
>    >    Because certain genes vary greatly across brain regions, averaging
>    >    would seem to discard some useful information which could be used
>    to
>    >    improve the inference made on treatment effect.
> 
>      Can you specify exactly what useful information is being used in the
>      analysis that would be discarded if you took averages?
> 
>    >    If I used averaging, I would just have 12 measurements made on 12
>    >    rats and could do a simple t-test, but I don't think that would
>    >    be preferable, would it?
> 
>      It's not clear to me that the conclusions will be different.  If
>      not,
>      then the simpler analysis seems to be the better one.  I would
>      suggest
>      that you compare them and see what happens.
>      Cheers,
>      Andrew
> 
>    >    Here is the output of my model:
>    >         myModel <- lme(fixed = geneExpression ~  Treatment,  random =
>    ~1 |
>    >    Rat/Tissue)
>    >    > myModel
>    >    Linear mixed-effects model fit by REML
>    >      Data: NULL
>    >      Log-restricted-likelihood: 40.60593
>    >      Fixed: geneExpression ~ Treatment
>    >     (Intercept) TreatmentLAD
>    >    12.628465893 -0.009280945
>    >    Random effects:
>    >     Formula: ~1 | Rat
>    >             (Intercept)
>    >    StdDev: 2.752519e-06
>    >     Formula: ~1 | Tissue %in% Rat
>    >            (Intercept)     Residual
>    >    StdDev:  0.06226097 0.0002623086
>    >    Number of Observations: 34
>    >    Number of Groups:
>    >                Rat Tissue %in% Rat
>    >                 12              34
>    >    > summary(myModel)
>    >    Linear mixed-effects model fit by REML
>    >     Data: NULL
>    >            AIC       BIC   logLik
>    >      -71.21185 -63.88317 40.60593
>    >    Random effects:
>    >     Formula: ~1 | Rat
>    >             (Intercept)
>    >    StdDev: 2.752519e-06
>    >     Formula: ~1 | Tissue %in% Rat
>    >            (Intercept)     Residual
>    >    StdDev:  0.06226097 0.0002623086
>    >    Fixed effects: geneExpression ~ Treatment
>    >                     Value  Std.Error DF  t-value p-value
>    >    (Intercept)  12.628466 0.01510064 22 836.2870  0.0000
>    >    TreatmentLAD -0.009281 0.02135553 10  -0.4346  0.6731
>    >     Correlation:
>    >                 (Intr)
>    >    TreatmentLAD -0.707
>    >    Standardized Within-Group Residuals:
>    >              Min            Q1           Med            Q3
>    Max
>    >    -0.0083222376 -0.0027523180 -0.0001242287  0.0034693142
>    0.0087149017
>    >    Number of Observations: 34
>    >    Number of Groups:
>    >                Rat Tissue %in% Rat
>    >                 12              34
>    >    >
>    >
> 
>      >    On Tue, May 6, 2008 at 4:42 PM,
>      <[1][2]Antonio_Paredes at aphis.usda.gov>
> 
>    >    wrote:
>    >
>    >      How were animals allocated to treatments (at random)? What about
>    the
>    >      housing of the animals. Both of these factors could be
>    influential
>    >      in selecting the unit of study.
>    >      You also need to get an ideal at to why a p-value is enough to
>    >      support the objective of the study. Why not estimation of
>    treatment
>    >      effects?
>    >      Tony.
>    >
> 
>      >    Andrew Robinson <[2][3]A.Robinson at ms.unimelb.edu.au>
>      >    Sent by: [3][4]r-sig-mixed-models-bounces at r-project.org
> 
>    >
>    >    05/06/2008 03:19 PM
>    >
>    >
>      To
>    >
> 
>      >    Mark Kimpel <[4][5]mwkimpel at gmail.com>
>      >
>      >
>            cc
>      >
>      >    [5][6]r-sig-mixed-models at r-project.org
> 
>    >
>    >
>    Subject
>    >
>    >    Re: [R-sig-ME] mixed-effects model specification question
>    >
>    >    Mark,
>    >    You should talk to a local statistician about this, but I think
>    that
>    >    you can probably average across the measurements within each rat,
>    if
>    >    all you are interested in is a treatment effect.  For the analysis
>    of
>    >    treatment the relevant unit of replication should be the rat, in
>    any
>    >    case.
>    >    (Does anyone have any thoughts on why that might be a bad idea?)
>    >    Also if I understand your design, there are three batches per rat.
>     I
>    >    suspect that using Rat/Tissue would lead to an over-parametrized
>    >    model, if my interpretation is correct.  My guess is that Rat
>    should
>    >    be adequate.
>    >    Final points
>    >    1) My speculations would be better-informed if you showed us the
>    >      output of the model that you proposed - fyi :) .
>    >    2) You could try all three configurations and see if it makes any
>    >      difference to the inference of interest.  I suspect that it will
>    >      not.
>    >    I hope that this helps,
>    >    Andrew
>    >    On Tue, May 06, 2008 at 10:23:08AM -0400, Mark Kimpel wrote:
>    >    > Perhaps a bit of a newbie question, but I need to get this
>    right. I
>    >    need to
>    >    > make sure I am specifying a model correctly. Here is our
>    >    less-than-perfect
>    >    > experimental design:
>    >    >
>    >    > 36 rats divided into two treatment groups, i.e. 18 per group
>    >    >
>    >    > each rat has measurements taken on 3 brain regions, but each
>    brain
>    >    region
>    >    > was analyzed in a separate batch (there are strong batch
>    effects) so
>    >    we
>    >    > really can't compare the regions per se, but do recognize that
>    the 3
>    >    regions
>    >    > from a single rat have variance above and beyond that accounted
>    for
>    >    by
>    >    > technical factors. Since, because of the batch effect we are not
>    >    going to
>    >    > look at the effect of brain region, I "think" this should be a
>    >    considered a
>    >    > random effect.
>    >    >
>    >    > So I have rat, treatment, and region(batch) as variables. The
>    only
>    >    thing I
>    >    > am interested in getting a p-value for is treatment.
>    >    >
>    >    > Is the model below correct and can I squeek by with using nlme
>    to get
>    >    a
>    >    > p-value (notwithstanding recent threads on this list)?
>    >    >
>    >    >  myModel <- lme(fixed = geneExpression ~  Treatment,  random =
>    ~1 |
>    >    > Rat/Tissue)
>    >    >
>    >    > Thanks,
>    >    > Mark
>    >    > --
>    >    > Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
>    >    > Indiana University School of Medicine
>    >    >
>    >    > 15032 Hunter Court, Westfield, IN 46074
>    >    >
>    >    > (317) 490-5129 Work, & Mobile & VoiceMail
>    >    > (317) 663-0513 Home (no voice mail please)
>    >    >
>    >    >
>    ******************************************************************
>    >    >
>    >    >                  [[alternative HTML version deleted]]
>    >    >
>    >    > _______________________________________________
> 
>      >    > [6][7]R-sig-mixed-models at r-project.org mailing list
>      >    >
>      [7][8]https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
>    >    --
>    >    Andrew Robinson
>    >    Department of Mathematics and Statistics            Tel:
>    >    +61-3-8344-6410
>    >    University of Melbourne, VIC 3010 Australia         Fax:
>    >    +61-3-8344-4599
> 
>      >    [8][9]http://www.ms.unimelb.edu.au/~andrewpr
>      >    [9][10]http://blogs.mbs.edu/fishing-in-the-bay/
>      >    _______________________________________________
>      >    [10][11]R-sig-mixed-models at r-project.org mailing list
>      >
>      [11][12]https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
>    >
>    >    --
>    >    Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
>    >    Indiana University School of Medicine
>    >    15032 Hunter Court, Westfield, IN 46074
>    >    (317) 490-5129 Work, & Mobile & VoiceMail
>    >    (317) 663-0513 Home (no voice mail please)
>    >    **********************************************************
>    ********
>    >
> 
>      > References
>      >
>      >    1. mailto:[13]Antonio_Paredes at aphis.usda.gov
>      >    2. mailto:[14]A.Robinson at ms.unimelb.edu.au
>      >    3. mailto:[15]r-sig-mixed-models-bounces at r-project.org
>      >    4. mailto:[16]mwkimpel at gmail.com
>      >    5. mailto:[17]r-sig-mixed-models at r-project.org
>      >    6. mailto:[18]R-sig-mixed-models at r-project.org
>      >    7. [19]https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>      >    8. [20]http://www.ms.unimelb.edu.au/%7Eandrewpr
>      >    9. [21]http://blogs.mbs.edu/fishing-in-the-bay/
>      >   10. mailto:[22]R-sig-mixed-models at r-project.org
>      >   11. [23]https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
>    --
>    Andrew Robinson
>    Department of Mathematics and Statistics            Tel:
>    +61-3-8344-6410
>    University of Melbourne, VIC 3010 Australia         Fax:
>    +61-3-8344-4599
>    [24]http://www.ms.unimelb.edu.au/~andrewpr
>    [25]http://blogs.mbs.edu/fishing-in-the-bay/
> 
>    --
>    Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
>    Indiana University School of Medicine
>    15032 Hunter Court, Westfield, IN 46074
>    (317) 490-5129 Work, & Mobile & VoiceMail
>    (317) 663-0513 Home (no voice mail please)
>    ******************************************************************
> 
> References
> 
>    1. mailto:A.Robinson at ms.unimelb.edu.au
>    2. mailto:Antonio_Paredes at aphis.usda.gov
>    3. mailto:A.Robinson at ms.unimelb.edu.au
>    4. mailto:r-sig-mixed-models-bounces at r-project.org
>    5. mailto:mwkimpel at gmail.com
>    6. mailto:r-sig-mixed-models at r-project.org
>    7. mailto:R-sig-mixed-models at r-project.org
>    8. https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>    9. http://www.ms.unimelb.edu.au/%7Eandrewpr
>   10. http://blogs.mbs.edu/fishing-in-the-bay/
>   11. mailto:R-sig-mixed-models at r-project.org
>   12. https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>   13. mailto:Antonio_Paredes at aphis.usda.gov
>   14. mailto:A.Robinson at ms.unimelb.edu.au
>   15. mailto:r-sig-mixed-models-bounces at r-project.org
>   16. mailto:mwkimpel at gmail.com
>   17. mailto:r-sig-mixed-models at r-project.org
>   18. mailto:R-sig-mixed-models at r-project.org
>   19. https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>   20. http://www.ms.unimelb.edu.au/%7Eandrewpr
>   21. http://blogs.mbs.edu/fishing-in-the-bay/
>   22. mailto:R-sig-mixed-models at r-project.org
>   23. https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>   24. http://www.ms.unimelb.edu.au/%7Eandrewpr
>   25. http://blogs.mbs.edu/fishing-in-the-bay/

-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/