[R-sig-ME] logistic growth, vexing choice of which timepoints to include

Fri Feb 6 22:54:47 CET 2009

Hello.  I'm looking for advice on how to make a seemingly unavoidable  
subjective choice in an analysis I'm doing, using the logistic growth  
model.  I'm using nlme, but that has nothing to do with my question,  
so I hope it's not too inappropriate for me to post this here.   
Reading the list archive suggests that it's not too hard to tempt this  
group into philosophical discussions :-)

I have growth data that can be reasonably modelled with a
four-parameter logistic curve.  The experimental unit is a well in a
microtitre plate and I get light absorbance readings over time that  
reflect cell density.  There are many wells on a plate, e.g. 96 or  
384, and experiments often span many plates.  Systematic differences  
between the wells can be, for example, specific genetic mutations  
carried by the cells and/or different chemicals added to the growth  
medium.  I am mostly interested in performing inference on the fixed  
effects, i.e. how the genetic perturbations, the chemicals, or their  
interactions, modify key growth parameters, especially the one  
inversely related to the underlying exponential growth rate we'd see  
in the absence of resource constraints (phi_4 in Pinheiro & Bates p.  
517).
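For reference, here is the four-parameter logistic in its usual parameterization. This assumes the form used by R's SSfpl self-starting model, whose arguments A, B, xmid and scal play the roles of phi_1 to phi_4. The early-time approximation below is a standard expansion. It shows why phi_4 is inversely related to the exponential growth rate: well below the midpoint, the curve rises exponentially at rate 1/phi_4.

```latex
y(t) = \phi_1 + \frac{\phi_2 - \phi_1}{1 + \exp\{(\phi_3 - t)/\phi_4\}}

% For t well below the midpoint (t \ll \phi_3) the exponential term dominates:
y(t) - \phi_1 \approx (\phi_2 - \phi_1)\,\exp\{(t - \phi_3)/\phi_4\}
```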

Problem:  The number of cells inoculated into the wells at the start  
is quite small -- well below the detection threshold for the light
absorbance readings.  Therefore, each timecourse begins with a loooong  
string of zeros, before the classic sigmoidal shape kicks in.  And, of  
course, the timing of this happy event is both ill-defined and very  
variable across the wells.
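The following is a minimal sketch of how the length of that run of zeros relates to the curve parameters. It assumes the four-parameter logistic form phi1 + (phi2 - phi1) / (1 + exp((phi3 - t) / phi4)) and a fixed detection threshold. The helper names `fpl` and `detection_time`, the threshold of 0.05, and all parameter values are hypothetical, chosen only for illustration. Solving fpl(t) = threshold for t gives the first detectable time. Shifting phi_3 with phi_4 held fixed lengthens the zeros without changing the growth rate.

```python
import math


def fpl(t, phi1, phi2, phi3, phi4):
    """Four-parameter logistic: phi1 + (phi2 - phi1) / (1 + exp((phi3 - t) / phi4))."""
    return phi1 + (phi2 - phi1) / (1.0 + math.exp((phi3 - t) / phi4))


def detection_time(threshold, phi1, phi2, phi3, phi4):
    """Time at which the curve reaches `threshold` (hypothetical detection limit).

    Obtained by solving fpl(t) = threshold for t; needs phi1 < threshold < phi2.
    """
    if not (phi1 < threshold < phi2):
        raise ValueError("threshold must lie strictly between the asymptotes")
    return phi3 + phi4 * math.log((threshold - phi1) / (phi2 - threshold))


# Hypothetical wells: same growth scale phi4 = 2, different midpoints phi3.
for phi3 in (20.0, 30.0):
    t_star = detection_time(0.05, 0.0, 1.0, phi3, 2.0)
    print(f"midpoint {phi3:.0f} h -> first detectable reading at about {t_star:.1f} h")
```

With these made-up values the two wells first become detectable at about 14.1 h and 24.1 h. The later well simply contributes ten more hours of zeros.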

For figure-making purposes, I removed some early timepoints that were  
uniformly zero for all wells.  Which made me wonder: why couldn't  
(shouldn't?) I do the same prior to model fitting?  When I fit the  
logistic growth model with and without these early timepoints, I get  
essentially the same estimated fixed effects and, even, estimated  
variances for the random effects.  But there *is* a substantial  
difference in the estimate of residual variance, which then obviously  
has a noticeable effect on the inference for the fixed effects and,  
especially, the one I care about.  Including all the timepoints drives  
the residual variance down, as you might expect.  But that almost  
seems misleading or artificial ... other collaborators I work with  
don't even start taking OD readings until the first 12 hours have  
passed, which makes their initial strings of zeros quite short,  
which ... gives them less statistical significance for the same  
observed effect size?!?
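Here is a back-of-the-envelope sketch of why the zeros pull the residual variance down. It assumes a simple nonlinear least-squares fit in which sigma^2 is estimated as RSS / (n - p). Leading readings that the fitted lower asymptote reproduces almost exactly increase n but add almost nothing to RSS. The numbers are hypothetical, not taken from any real plate, and the helper names are made up. nlme's (RE)ML estimate of the residual variance differs in detail, but it moves in the same direction when near-zero residuals are appended.

```python
import math


def residual_variance(rss, n, p):
    """Least-squares residual variance estimate RSS / (n - p)."""
    if n <= p:
        raise ValueError("need more observations than parameters")
    return rss / (n - p)


def add_zero_timepoints(rss, n, p, k, rss_added=0.0):
    """Compare sigma^2 before and after appending k timepoints.

    rss_added is the extra residual sum of squares the k points contribute;
    for readings of zero lying on the fitted lower asymptote it is close to 0.
    Returns (sigma2_before, sigma2_after, ratio of the sigma estimates).
    """
    before = residual_variance(rss, n, p)
    after = residual_variance(rss + rss_added, n + k, p)
    return before, after, math.sqrt(after / before)


# Hypothetical single well: 20 informative readings, 4 curve parameters,
# RSS = 1.6, plus 20 leading zeros that the fitted curve matches almost exactly.
before, after, sd_ratio = add_zero_timepoints(rss=1.6, n=20, p=4, k=20)
print(f"sigma^2: {before:.4f} -> {after:.4f}; sigma shrinks by a factor of {sd_ratio:.3f}")
```

Here sigma^2 falls from 0.1000 to 0.0444, and sigma-hat shrinks to about two thirds of its former value. Standard errors scale with sigma-hat only to the extent that the extra points carry little information about the other parameters. In the mixed model, the random-effect variances also enter the fixed-effect standard errors, so the reduction there is less direct.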

Does anyone have a comment or advice?

Thanks in advance for reading this,
Jenny

Jennifer Bryan
Department of Statistics and
   the Michael Smith Laboratories
University of British Columbia