[R] repeated values, nlme, correlation structures

Spencer Graves spencer.graves at pdf.com
Sun Nov 20 18:02:50 CET 2005


	  Have you considered Monte Carlo?  Take your best model (and perhaps 
some plausible alternatives), and simulate data like what you have, but 
retaining the simulated chick identities.  Then analyze the simulated 
data both with and without the chick identities, averaging over the 
nestboxes, as you've done.  This is easy to do in R.


	  At a general, conceptual level, nlme and most other "parametric" 
statistical procedures use maximum likelihood.  The likelihood is the 
probability (density) of what we observe, considered as a function of 
the unknown parameters.  With mixed models, we use a marginal 
likelihood, integrating out the individual parameters for all the 
nestboxes and chicks, leaving the "fixed effect" parameters and the 
(co)variance parameters of the random effects.  This leads to a 
generalized least-squares problem, with the (co)variance parameters 
embedded in some way in the residual covariance matrix.

	  This converts the problem to one of understanding and modeling the 
covariance structure of the residuals.  If I've lost the identity of the 
chicks but I've got a good model for the covariance structure of the 
residuals, I think the answer using nestbox averages should be fairly 
close to the answer I'd get if I thought really hard and developed a 
likelihood more accurately suited to the problem.  This will be less 
true with a nonlinear model, and even less true if the number of chicks 
who die before the end of the experiment.  To answer these questions, 
I'd use Monte Carlo, as I suggested above.

	  Best Wishes,
	  Spencer Graves
Patrick Giraudoux wrote:

> Spencer Graves a écrit :
>>       You are concerned that, "using the mean of each age category as 
>> variable leads to a loss of information regarding the variance on the 
>> weight at each age and nestbox."  What information do you think you lose?
> The variance  around the mean weight of each age category. This 
> variation is a priori not considered in the model when using the mean 
> only, and not each value used to compute the mean..
>>       In particular, have you studied the residuals from your fit?  I 
>> would guess that the you probably have heterscedasticity with the 
>> variance of the residuals probably increasing with the age.  Plots of 
>> the absolute residuals might help identify this.  
> Yes, of course. At this stage using a  Continuous AR(1) as Correlation 
> Structure, reduces considerably heteroscedasticity up to quasi-normal.
>> Also, is the number of blue tits in each age constant, or does it 
>> change, e.g., as some of the chicks die?
> Yes, unfortunately, it may happen eventually.
>>       To try to assess how much information I lost (especially if some 
>> of the chicks died), I might plot the weights in each nest box and 
>> connect the dots manually, attempting to assign chick identity to the 
>> individual numbers.  I might do it two different ways, one best fit, 
>> and another "worst plausible".  Then I might try to fit models to 
>> these two "augmented data sets" as if I had the true chick identity.  
>> Then comparing these fits with the one you already have should help 
>> you evaluate what information you lost by using the averages AND give 
>> you a reasonable shot at recovering that information.  If the results 
>> were promising, I might generate more than two sets of assignments, 
>> involving other people in that task.
> OK, should not be that difficult (actually the data were given with 
> pseudo-ID numbers on each chicks and I started with this... until I 
> learned they were corresponding to nothing). I suppose one could go as 
> far as possible with the "worst possible" with random assignements and 
> permutations, and thus comparing the fits.
> Many thanks for the hint. I was really wondering what may mean no answer 
> on the list... Problem not clear enough, trivial solution or real 
> trouble for statisticians with such data? Quite  scaring to a 
> biologist...  Now, I am fixed.
>> If the results were promising, I might generate more than two sets of 
>> assignments, involving other people in that task. 
> Of course if some capable mixed-effect models specialist is interested 
> in having a look to the data set, I can send it off list.
> Many thanks again, Spencer, I can stick on the track, now...
> Best regards,
> Patrick
>>       Bon Chance
>>       Spencer Graves
>> Patrick Giraudoux wrote:
>>> Dear listers,
>>> My request of last week seems not to have drawn someone's attention. 
>>> Suppose it was not clear enough.
>>> I am coping with an observational study where people's aim was to fit 
>>> growth curve for a population of young blue tits. For logistic 
>>> reasons, people have not been capable to number each individual, but 
>>> they have a method to assess their age. Thus, nestboxes were visited 
>>> occasionnally, youngs aged and weighted.
>>> This makes a multilevel data set, with two classification factors:
>>> - the nestbox (youngs shared the same parents and general feeding 
>>> conditions)
>>> - age in each nestbox (animals from the same nestbox have been 
>>> weighed along time, which likely leads to time correlation)
>>> Life would have been heaven if individuals were numbered, and thus 
>>> nlme correlation structure implemented in the package be used easy. 
>>> As mentioned above, this could not be the case. In a first approach, 
>>> I actually used the mean weight of the youngs weighed at each age in 
>>> nest boxes for the variable "age", and could get a nice fit with 
>>> "nestbox" as random variable and corCAR1(form=~age|nestbox) as 
>>> covariation structure.
>>> modm0c<-nlme(pds~Asym/(1+exp((xmid-age)/scal)),
>>>     fixed=list(Asym~1,xmid~1,scal~1),
>>>     random=Asym+xmid~1|nestbox,data=croispulm,
>>>     start=list(fixed=c(10,5,2.2)),
>>>     method="ML",
>>>     corr=corCAR1(form=~age|nestbox)
>>>     )
>>> Assuming that I did not commited some error in setting model 
>>> parameters (?), this way of doing is not fully satisfying, since 
>>> using the mean of each age category as variable  leads to a  loss of 
>>> information regarding the variance on the weight at each age and 
>>> nestbox.
>>> My question is: is there a way to handle repeated values per group 
>>> (here several youngs in an age category in each nestbox) in such a case?
>>> I would really appreciate an answer, even negative...
>>> Kind regards,
>>> Patrick
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide! 
>>> http://www.R-project.org/posting-guide.html

Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves at pdf.com
www.pdf.com <http://www.pdf.com>
Tel:  408-938-4420
Fax: 408-280-7915

More information about the R-help mailing list