[R-sig-ME] logistic growth, vexing choice of which timepoints to include

Steven McKinney smckinney at bccrc.ca
Sat Feb 7 00:18:53 CET 2009

Hi Jenny,

[Caveat: Comments from an applied statistician, not
a world-heavyweight likelihood theorist]

In the logistic world a zero value maps to
minus infinity.  It seems to me that only
the 'last' zero value contains any information
relevant to the likelihood (equivalently, only
the 'first' one value, plus infinity in the
logistic realm, contains any information
relevant to the likelihood).  Perhaps the
likelihood has not been coded to take this
into account, so the remaining zero values
are being folded into the likelihood as
spurious contributions, artificially
deflating the variance estimates.
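
To make the boundary behaviour concrete, here is a minimal Python sketch
(illustration only). It assumes the readings have been rescaled to [0, 1],
and `logit` is written out here rather than taken from a library.  Every
zero maps to -inf and every one to +inf, so a run of zeros becomes a run of
identical values on the logit scale.

```python
import math
from typing import List

def logit(p: float) -> float:
    """Log-odds log(p / (1 - p)), with the boundary values mapped to -inf and +inf."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return math.log(p / (1.0 - p))

# Hypothetical readings for one well, rescaled to [0, 1].
readings: List[float] = [0.0, 0.0, 0.0, 0.02, 0.5, 0.98, 1.0, 1.0]

# The three leading zeros all become -inf and the two trailing ones +inf.
print([logit(p) for p in readings])
```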

I would exclude or set to NA all but the last
(or even all) of the zero values for any well,
and all but the first (or even all) of the one
values.
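
A minimal sketch of that rule, in Python for illustration only.
`trim_boundary_runs` is a hypothetical helper, and `None` stands in for R's
NA.  It assumes the readings are rescaled so the floor is exactly 0 and the
ceiling exactly 1, and it detects 'zero' and 'one' by exact equality.  Real
plateau readings would need a tolerance.

```python
from typing import List, Optional

def trim_boundary_runs(values: List[float], lo: float = 0.0, hi: float = 1.0,
                       keep_edge: bool = True) -> List[Optional[float]]:
    """Set redundant boundary readings to None (a stand-in for R's NA).

    Leading readings equal to `lo` keep only the last of the run, and trailing
    readings equal to `hi` keep only the first of the run.  With
    keep_edge=False the whole of both runs is set to None.
    """
    out: List[Optional[float]] = list(values)
    n = len(values)

    # i = length of the leading run of `lo` values (indices 0 .. i-1).
    i = 0
    while i < n and values[i] == lo:
        i += 1
    n_drop_lead = max(i - 1, 0) if keep_edge else i
    for k in range(n_drop_lead):
        out[k] = None

    # j = start of the trailing run of `hi` values (indices j .. n-1),
    # never reaching back into the leading run.
    j = n
    while j > i and values[j - 1] == hi:
        j -= 1
    first_drop_trail = j + 1 if keep_edge else j
    for k in range(first_drop_trail, n):
        out[k] = None
    return out

well = [0.0, 0.0, 0.0, 0.1, 0.6, 1.0, 1.0, 1.0]
print(trim_boundary_runs(well))                   # [None, None, 0.0, 0.1, 0.6, 1.0, None, None]
print(trim_boundary_runs(well, keep_edge=False))  # [None, None, None, 0.1, 0.6, None, None, None]
```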

The zero values are really below the detection
limit of the sensor involved, so they should
theoretically be handled as censored data, but
that's another level of complexity for the analysis.
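
Readings below a detection limit are usually modelled as left-censored.  The
value exists but is only known to lie below the limit.  Here is a minimal
Python sketch of such a likelihood, for illustration only.  It assumes
independent Gaussian errors with a common sd `sigma`, a known detection
limit `limit` (hypothetical name), and that every reading at or below the
limit was recorded as zero.  Each censored reading contributes
log P(Y <= limit) = log Phi((limit - mu) / sigma) instead of a density term.

```python
import math
from typing import Sequence

def norm_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def norm_logcdf(x: float, mu: float, sigma: float) -> float:
    """Log of P(Y <= x) for Y ~ N(mu, sigma^2), via Phi(z) = erfc(-z / sqrt(2)) / 2."""
    z = (x - mu) / sigma
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return math.log(p) if p > 0.0 else -math.inf

def censored_loglik(y: Sequence[float], mu: Sequence[float],
                    sigma: float, limit: float) -> float:
    """Gaussian log-likelihood with left-censoring at a detection limit.

    y: observed readings (values at or below `limit` are treated as censored);
    mu: fitted means from the growth curve at the same timepoints.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi <= limit:
            # Only known to lie below the limit: use the probability mass there.
            total += norm_logcdf(limit, mi, sigma)
        else:
            # Observed value: ordinary density contribution.
            total += norm_logpdf(yi, mi, sigma)
    return total

# Hypothetical: a fitted mean well below the limit makes a zero reading nearly free.
print(censored_loglik([0.0], [0.0], sigma=0.01, limit=0.05))  # very close to 0
```

Under this treatment, early timepoints whose fitted mean sits well below the
limit add almost nothing to the log-likelihood.  They also contribute no
residuals that could pull the estimate of sigma down.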

Steven McKinney

Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org on behalf of Jenny Bryan
Sent: Fri 2/6/2009 1:54 PM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] logistic growth, vexing choice of which timepoints to include
Hello.  I'm looking for advice on how to make a seemingly unavoidable  
subjective choice in an analysis I'm doing, using the logistic growth  
model.  I'm using nlme, but that has nothing to do with my question,  
so I hope it's not too inappropriate for me to post this here.   
Reading the list archive suggests that it's not too hard to tempt this  
group into philosophical discussions :-)

I have growth data that can be reasonably modelled with a four- 
parameter logistic curve.  The experimental unit is a well in a  
microtitre plate and I get light absorbance readings over time that  
reflect cell density.  There are many wells on a plate, e.g. 96 or  
384, and experiments often span many plates.  Systematic differences  
between the wells can be, for example, specific genetic mutations  
carried by the cells and/or different chemicals added to the growth  
medium.  I am mostly interested in performing inference on the fixed  
effects, i.e. how the genetic perturbations, the chemicals, or their  
interactions, modify key growth parameters, especially the one  
inversely related to the underlying exponential growth rate we'd see  
in the absence of resource constraints (phi_4 in Pinheiro & Bates p.  

Problem:  The number of cells inoculated into the wells at the start  
is quite small -- well below the detection threshold for the light  
absorbance readings.  Therefore, each timecourse begins with a loooong  
string of zeros, before the classic sigmoidal shape kicks in.  And, of  
course, the timing of this happy event is both ill-defined and very  
variable across the wells.
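
The four-parameter logistic curve mentioned above can be sketched in Python
for illustration.  It assumes the form
phi1 + (phi2 - phi1) / (1 + exp((phi3 - x) / phi4)), which is the form of
R's SSfpl (arguments A, B, xmid, scal).  We take this to match the phi_1 to
phi_4 of Pinheiro & Bates.  The parameter values below are hypothetical.
The sketch shows that with a small starting density, the curve sits barely
above phi1 for a long time.  It also shows why phi4 is inversely related to
the early exponential growth rate.

```python
import math

def fpl(x: float, phi1: float, phi2: float, phi3: float, phi4: float) -> float:
    """Four-parameter logistic: phi1 + (phi2 - phi1) / (1 + exp((phi3 - x) / phi4)).

    phi1 = left asymptote, phi2 = right asymptote,
    phi3 = time of the inflection point, phi4 = scale (same units as x).
    """
    return phi1 + (phi2 - phi1) / (1.0 + math.exp((phi3 - x) / phi4))

def log_growth_rate(x: float, phi1: float, phi2: float, phi3: float, phi4: float,
                    h: float = 1e-3) -> float:
    """Central-difference estimate of d/dx log(f(x) - phi1), the exponential growth rate."""
    up = math.log(fpl(x + h, phi1, phi2, phi3, phi4) - phi1)
    down = math.log(fpl(x - h, phi1, phi2, phi3, phi4) - phi1)
    return (up - down) / (2.0 * h)

# Hypothetical parameter values (hours, OD units), chosen only for illustration.
phi = dict(phi1=0.0, phi2=1.2, phi3=30.0, phi4=2.5)

print(fpl(0.0, **phi))              # roughly 7e-06: barely above phi1 at time 0
print(log_growth_rate(0.0, **phi))  # close to 1 / phi4 = 0.4
```

Well before the inflection point,
f(x) - phi1 ~ (phi2 - phi1) * exp((x - phi3) / phi4).
The early exponential growth rate is therefore 1 / phi4, so a smaller phi4
means faster growth.  The slope at the inflection point is
(phi2 - phi1) / (4 * phi4).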

For figure-making purposes, I removed some early timepoints that were  
uniformly zero for all wells.  Which made me wonder: why couldn't  
(shouldn't?) I do the same prior to model fitting?  When I fit the  
logistic growth model with and without these early timepoints, I get  
essentially the same estimated fixed effects and, even, estimated  
variances for the random effects.  But there *is* a substantial  
difference in the estimate of residual variance, which then obviously  
has a noticeable effect on the inference for the fixed effects and,  
especially, the one I care about.  Including all the timepoints drives  
the residual variance down, as you might expect.  But that almost  
seems misleading or artificial ... other collaborators I work with  
don't even start taking OD readings until the first 12 hours have  
passed, which makes their initial strings of zeros quite short,  
which ... gives them less statistical significance for the same  
observed effect size?!?
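
The effect on the residual variance can be reproduced with simple
arithmetic.  This is a minimal Python sketch for illustration only.  It uses
the ordinary nonlinear-least-squares estimate RSS / (n - p), not nlme's
ML/REML estimator, and the residuals are made-up numbers.  Early timepoints
where both the reading and the fitted curve are essentially zero add
observations to n without adding to the residual sum of squares.

```python
from typing import Sequence

def residual_variance(residuals: Sequence[float], n_params: int) -> float:
    """Residual variance estimate RSS / (n - p), as in ordinary nonlinear least squares."""
    n = len(residuals)
    if n <= n_params:
        raise ValueError("need more observations than parameters")
    rss = sum(r * r for r in residuals)
    return rss / (n - n_params)

# Hypothetical residuals from the informative (rising) part of one curve.
informative = [0.05, -0.04, 0.06, -0.05, 0.03, -0.02, 0.04, -0.03]
p = 4  # parameters of the four-parameter logistic

# The same fit with 20 early "zero" timepoints added: the fitted curve is
# essentially zero there too, so each adds a residual of (about) zero.
with_zeros = [0.0] * 20 + informative

v_without = residual_variance(informative, p)
v_with = residual_variance(with_zeros, p)
print(v_without, v_with, v_without / v_with)  # ratio is (28 - 4) / (8 - 4) = 6, up to rounding
```

The residual sum of squares is identical in both cases; only the divisor
changes.  So the more near-zero readings a protocol records before growth
becomes detectable, the smaller the estimated residual variance.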

Does anyone have a comment or advice?

Thanks in advance for reading this,

Jennifer Bryan
Department of Statistics and
   the Michael Smith Laboratories
University of British Columbia

R-sig-mixed-models at r-project.org mailing list
