# [R-sig-ME] Data snooping in repeated measures study?

Steven McKinney smckinney at bccrc.ca
Sat Jan 8 04:17:19 CET 2011

```
Were you collecting this data so that you could compare decay rates over time?
If so, then this test would be a test of an a-priori specified hypothesis.

Under the null hypothesis, all lots have the same slope.
Under the alternative hypothesis, at least one lot has a slope that differs from the others.

So you would have one omnibus test comparing the 21 slopes for any difference.
The null model would have one parameter for slope, for all lots.
The alternative model would have 21 parameters for slope, one for each lot.

I'm not sure of the necessity for a mixed effects model - the little data
shown here suggests you could just do a linear model exercise.  You state
that the lots are independent.

The single omnibus test will yield a result that protects against multiple comparisons.
Done as a linear model exercise, the test statistic would be an F statistic with
20 df in the numerator (21 separate slopes minus the 1 common slope).

If this test result is not significant, then the one apparently different-looking
lot is consistent with the random variation you would expect across this much data.

If this test result is significant, then the next question is which lots differ
from which others?  Multiple comparison procedures can then be used to
address that question.
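
A minimal sketch of one such follow-up, on made-up data (the data frame "toy",
the seed, the slope and the noise level are all assumptions, not Conc.dat).
It only compares each lot's slope with the reference lot and applies a Holm
adjustment; packages such as multcomp offer more complete procedures.

set.seed(2)                                             # arbitrary seed
toy <- expand.grid(Days = c(30, 120, 210, 300, 400, 580, 760),
                   Lot  = factor(paste0("L", 1:5)))     # 5 hypothetical lots
toy$CHG <- 2 - 0.0002 * toy$Days + rnorm(nrow(toy), sd = 0.02)
fit <- lm(CHG ~ Days * Lot, data = toy)
ct  <- summary(fit)$coefficients
# Under the default treatment contrasts, each Days:Lot row is the difference
# between that lot's slope and the slope of the reference lot (L1)
int.rows <- grep("^Days:Lot", rownames(ct))
cmp <- data.frame(diff.in.slope = ct[int.rows, "Estimate"],
                  p.raw  = ct[int.rows, "Pr(>|t|)"],
                  p.holm = p.adjust(ct[int.rows, "Pr(>|t|)"], method = "holm"))
cmp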

It would be data-snooping to pick out the one obviously different lot, and start
comparing it to the other 20.  Test statistic p-values would then be too liberal
and confidence intervals too narrow.

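# If the response column is really named CHG. (as the str() output suggests), use CHG. below
lmf <- lm(CHG ~ Days * Lot, data = Conc.dat) # Each lot has different slope and intercept
lmr <- lm(CHG ~ Days + Lot, data = Conc.dat) # Each lot has same slope but different intercept
anova(lmr, lmf) # F test of the 20 extra slope parameters in lmf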

should give you the omnibus F test.
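
If you want to try this without the real data, here is a self-contained
sketch on simulated data. The data frame "toy", the slopes and the noise
level are assumptions made up for illustration; one lot is given a steeper
decay so the test has something to find.

set.seed(3)                                             # arbitrary seed
toy <- expand.grid(Days = c(30, 120, 210, 300, 400, 580, 760),
                   Lot  = factor(1:21))                 # 21 hypothetical lots
lot.slope    <- rep(-0.0002, 21)                        # common decay rate (made up)
lot.slope[7] <- -0.0004                                 # lot 7 decays faster
toy$CHG <- 2 + lot.slope[as.integer(toy$Lot)] * toy$Days +
           rnorm(nrow(toy), sd = 0.02)
full    <- lm(CHG ~ Days * Lot, data = toy)             # separate slope per lot
reduced <- lm(CHG ~ Days + Lot, data = toy)             # one common slope
omni <- anova(reduced, full)
omni$Df[2]          # numerator df: 21 slopes - 1 common slope = 20
omni$F[2]           # omnibus F statistic
omni$`Pr(>F)`[2]    # its p-value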

Check data plots, residual plots and other diagnostics for any pathological issues in the model fits.
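
A minimal sketch of such checks, again on made-up data ("toy" and all its
values are assumptions, and the cutoff of 3 is just a common rule of thumb):

set.seed(4)                                             # arbitrary seed
toy <- expand.grid(Days = c(30, 120, 210, 300, 400, 580, 760),
                   Lot  = factor(paste0("L", 1:5)))     # 5 hypothetical lots
toy$CHG <- 2 - 0.0002 * toy$Days + rnorm(nrow(toy), sd = 0.02)
fit <- lm(CHG ~ Days * Lot, data = toy)
op <- par(mfrow = c(1, 2))
plot(fit, which = 1)              # residuals vs fitted: look for curvature or funnels
plot(fit, which = 2)              # normal Q-Q plot of standardized residuals
par(op)
toy[abs(rstandard(fit)) > 3, ]    # rows with unusually large standardized residuals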

Steven McKinney, Ph.D.

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3

________________________________________
From: r-sig-mixed-models-bounces at r-project.org [r-sig-mixed-models-bounces at r-project.org] On Behalf Of Prew, Paul [Paul.Prew at ecolab.com]
Sent: January 7, 2011 1:27 PM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Data snooping in repeated measures study?

Hello, I've been asked to perform an analysis that I'm not sure how to frame properly, or even if it can be performed validly.

There are 21 independent batches ("Lots") of a chemical that were measured repeatedly over a number of months.  The response measured was the concentration of the active ingredient "CHG", with interest in how the CHG decays over time.  One Lot had a slope much lower than the other Lots.  Is it possible to test the slope of this Lot for statistical significance, with the null hypothesis that the slope is no different than the overall slope for this chemical?  Or would any test constructed just be data snooping, invalidating any inference?  Can anyone suggest a valid approach using lme?

> str(Conc.dat)
'data.frame':   121 obs. of  3 variables:
$ Lot : Factor w/ 21 levels "L012391","L012471",..: 16 16 16 16 16 16 16 16 10 10 ...
$ Days: int  30 121 217 307 399 583 765 766 78 176 ...
$ CHG.: num  2.06 2.01 1.97 1.94 1.88 …

All of the Lots were measured over 2 years time, but have different numbers of intermediate measurements.

Thank you, Paul

Paul Prew   ▪  Statistician
651-795-5942   ▪   fax 651-204-7504
Ecolab Research Center   ▪  Mail Stop ESC-F4412-A
655 Lone Oak Drive   ▪   Eagan, MN 55121-1560
