[R-sig-ME] logistic growth, vexing choice of which timepoints to include
Jenny Bryan
jenny at stat.ubc.ca
Fri Feb 6 22:54:47 CET 2009

Hello. I'm looking for advice on how to make a seemingly unavoidable
subjective choice in an analysis I'm doing, using the logistic growth
model. I'm using nlme, but that has nothing to do with my question,
so I hope it's not too inappropriate for me to post this here.
Reading the list archive suggests that it's not too hard to tempt this
group into philosophical discussions :-)

I have growth data that can be reasonably modelled with a four-
parameter logistic curve. The experimental unit is a well in a
microtitre plate and I get light absorbance readings over time that
reflect cell density. There are many wells on a plate, e.g. 96 or
384, and experiments often span many plates. Systematic differences
between the wells can be, for example, specific genetic mutations
carried by the cells and/or different chemicals added to the growth
medium. I am mostly interested in performing inference on the fixed
effects, i.e. how the genetic perturbations, the chemicals, or their
interactions, modify key growth parameters, especially the one
inversely related to the underlying exponential growth rate we'd see
in the absence of resource constraints (phi_4 in Pinheiro & Bates p.
517).
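
For reference, here is a minimal Python sketch of the four-parameter logistic curve. It assumes phi_4 is the scale parameter in the form phi_1 + (phi_2 - phi_1) / (1 + exp((phi_3 - x) / phi_4)), which is the form nlme's SSfpl uses. The function names and parameter values are hypothetical, chosen only to show why 1/phi_4 behaves like the early exponential growth rate.

```python
import math


def fpl(x, phi1, phi2, phi3, phi4):
    """Four-parameter logistic curve.

    phi1: lower asymptote, phi2: upper asymptote,
    phi3: inflection time, phi4: scale (larger means slower growth).
    """
    return phi1 + (phi2 - phi1) / (1.0 + math.exp((phi3 - x) / phi4))


def early_log_slope(x, phi1, phi2, phi3, phi4, h=1e-3):
    """Central-difference slope of log(y - phi1) at time x.

    Well before the inflection point the curve is close to exponential,
    so this slope approaches 1 / phi4.
    """
    lo = math.log(fpl(x - h, phi1, phi2, phi3, phi4) - phi1)
    hi = math.log(fpl(x + h, phi1, phi2, phi3, phi4) - phi1)
    return (hi - lo) / (2.0 * h)


# Hypothetical parameter values, for illustration only.
phi1, phi2, phi3, phi4 = 0.0, 1.2, 20.0, 1.5

midpoint = fpl(phi3, phi1, phi2, phi3, phi4)          # halfway between asymptotes
slope = early_log_slope(5.0, phi1, phi2, phi3, phi4)  # close to 1 / phi4
```

At the inflection time the curve sits halfway between the two asymptotes, and far to the left of it the log-scale slope is essentially 1/phi_4, which is the sense in which phi_4 is inversely related to the unconstrained growth rate.
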
Problem: The number of cells inoculated into the wells at the start
is quite small -- well below the detection threshold for the light
absorbance readings. Therefore, each timecourse begins with a loooong
string of zeros, before the classic sigmoidal shape kicks in. And, of
course, the timing of this happy event is both ill-defined and very
variable across the wells.

For figure-making purposes, I removed some early timepoints that were
uniformly zero for all wells. Which made me wonder: why couldn't
(shouldn't?) I do the same prior to model fitting? When I fit the
logistic growth model with and without these early timepoints, I get
essentially the same estimated fixed effects and, even, estimated
variances for the random effects. But there *is* a substantial
difference in the estimate of residual variance, which then obviously
has a noticeable effect on the inference for the fixed effects and,
especially, the one I care about. Including all the timepoints drives
the residual variance down, as you might expect. But that almost
seems misleading or artificial ... other collaborators I work with
don't even start taking OD readings until the first 12 hours have
passed, which makes their initial strings of zeros quite short,
which ... gives them less statistical significance for the same
observed effect size?!?
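
The arithmetic behind this can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the nlme fit. It assumes an ordinary nonlinear least-squares setting with a single homoscedastic residual variance RSS / (n - p). It also assumes the leading zeros are fit perfectly (zero residual) and carry essentially no information about the growth parameters. All numbers are made up.

```python
import math


def residual_variance(rss, n_obs, n_params):
    """Usual residual variance estimate: RSS / (n - p)."""
    return rss / (n_obs - n_params)


def se_ratio(rss, n_informative, n_zeros, n_params=4):
    """Ratio of a standard error with the zeros included to the same
    standard error with them trimmed.

    Assumption: the zero readings add nothing to the residual sum of
    squares and nothing to the curvature information for the parameter
    of interest, so only the variance estimate changes.
    """
    s2_trimmed = residual_variance(rss, n_informative, n_params)
    s2_padded = residual_variance(rss, n_informative + n_zeros, n_params)
    return math.sqrt(s2_padded / s2_trimmed)


# Hypothetical well: 40 informative timepoints, 60 leading zeros.
ratio = se_ratio(rss=0.05, n_informative=40, n_zeros=60)
```

Under these assumptions, padding 40 informative timepoints with 60 zeros shrinks the standard error to roughly 61% of its trimmed value, even though the estimated effect is unchanged.
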
Does anyone have a comment or advice?

Thanks in advance for reading this,
Jenny

Jennifer Bryan
Department of Statistics and
the Michael Smith Laboratories
University of British Columbia