[R] repeated values, nlme, correlation structures

Patrick Giraudoux patrick.giraudoux at univ-fcomte.fr
Sun Nov 20 18:45:58 CET 2005

Looks fine... and at least accessible to my current understanding and 
capacity.  I wonder if this kind of problem/method would not make a pure 
Bayesian very excited (I know one quite obsessional about it)... and 
propose an alternate approach from there (though beyond my own skill)...

Thanks a lot again. Will do like that ASAP (means w.e.).



Department of Environmental Biology
EA3184 usc INRA
University of Franche-Comte
25030 Besancon Cedex

tel. +33 381 665 745
fax +33 381 665 797

Spencer Graves a écrit :

>       Have you considered Monte Carlo?  Take your best model (and 
> perhaps some plausible alternatives), and simulate data like what you 
> have, but retaining the simulated chick identities.  Then analyze the 
> simulated data both with and without the chick identities, averaging 
> over the nestboxes, as you've done.  This is easy to do in R.
>       At a general, conceptual level, nlme and most other "parametric" 
> statistical procedures use maximum likelihood.  The likelihood is the 
> probability (density) of what we observe, considered as a function of 
> the unknown parameters.  With mixed models, we use a marginal 
> likelihood, integrating out the individual parameters for all the 
> nestboxes and chicks, leaving the "fixed effect" parameters and the 
> (co)variance parameters of the random effects.  This leads to a 
> generalized least-squares problem, with the (co)variance parameters 
> embedded in some way in the residual covariance matrix.
>       This converts the problem to one of understanding and modeling 
> the covariance structure of the residuals.  If I've lost the identity 
> of the chicks but I've got a good model for the covariance structure 
> of the residuals, I think the answer using nestbox averages should be 
> fairly close to the answer I'd get if I thought really hard and 
> developed a likelihood more accurately suited to the problem.  This 
> will be less true with a nonlinear model, and even less true if the 
> number of chicks who die before the end of the experiment.  To answer 
> these questions, I'd use Monte Carlo, as I suggested above.
>       Best Wishes,
>       Spencer Graves
> Patrick Giraudoux wrote:
>> Spencer Graves a écrit :
>>>       You are concerned that, "using the mean of each age category 
>>> as variable leads to a loss of information regarding the variance on 
>>> the weight at each age and nestbox."  What information do you think 
>>> you lose?
>> The variance  around the mean weight of each age category. This 
>> variation is a priori not considered in the model when using the mean 
>> only, and not each value used to compute the mean..
>>>       In particular, have you studied the residuals from your fit?  
>>> I would guess that the you probably have heterscedasticity with the 
>>> variance of the residuals probably increasing with the age.  Plots 
>>> of the absolute residuals might help identify this.  
>> Yes, of course. At this stage using a  Continuous AR(1) as 
>> Correlation Structure, reduces considerably heteroscedasticity up to 
>> quasi-normal.
>>> Also, is the number of blue tits in each age constant, or does it 
>>> change, e.g., as some of the chicks die?
>> Yes, unfortunately, it may happen eventually.
>>>       To try to assess how much information I lost (especially if 
>>> some of the chicks died), I might plot the weights in each nest box 
>>> and connect the dots manually, attempting to assign chick identity 
>>> to the individual numbers.  I might do it two different ways, one 
>>> best fit, and another "worst plausible".  Then I might try to fit 
>>> models to these two "augmented data sets" as if I had the true chick 
>>> identity.  Then comparing these fits with the one you already have 
>>> should help you evaluate what information you lost by using the 
>>> averages AND give you a reasonable shot at recovering that 
>>> information.  If the results were promising, I might generate more 
>>> than two sets of assignments, involving other people in that task.
>> OK, should not be that difficult (actually the data were given with 
>> pseudo-ID numbers on each chicks and I started with this... until I 
>> learned they were corresponding to nothing). I suppose one could go 
>> as far as possible with the "worst possible" with random assignements 
>> and permutations, and thus comparing the fits.
>> Many thanks for the hint. I was really wondering what may mean no 
>> answer on the list... Problem not clear enough, trivial solution or 
>> real trouble for statisticians with such data? Quite  scaring to a 
>> biologist...  Now, I am fixed.
>>> If the results were promising, I might generate more than two sets 
>>> of assignments, involving other people in that task. 
>> Of course if some capable mixed-effect models specialist is 
>> interested in having a look to the data set, I can send it off list.
>> Many thanks again, Spencer, I can stick on the track, now...
>> Best regards,
>> Patrick
>>>       Bon Chance
>>>       Spencer Graves
>>> Patrick Giraudoux wrote:
>>>> Dear listers,
>>>> My request of last week seems not to have drawn someone's 
>>>> attention. Suppose it was not clear enough.
>>>> I am coping with an observational study where people's aim was to 
>>>> fit growth curve for a population of young blue tits. For logistic 
>>>> reasons, people have not been capable to number each individual, 
>>>> but they have a method to assess their age. Thus, nestboxes were 
>>>> visited occasionnally, youngs aged and weighted.
>>>> This makes a multilevel data set, with two classification factors:
>>>> - the nestbox (youngs shared the same parents and general feeding 
>>>> conditions)
>>>> - age in each nestbox (animals from the same nestbox have been 
>>>> weighed along time, which likely leads to time correlation)
>>>> Life would have been heaven if individuals were numbered, and thus 
>>>> nlme correlation structure implemented in the package be used easy. 
>>>> As mentioned above, this could not be the case. In a first 
>>>> approach, I actually used the mean weight of the youngs weighed at 
>>>> each age in nest boxes for the variable "age", and could get a nice 
>>>> fit with "nestbox" as random variable and 
>>>> corCAR1(form=~age|nestbox) as covariation structure.
>>>> modm0c<-nlme(pds~Asym/(1+exp((xmid-age)/scal)),
>>>>     fixed=list(Asym~1,xmid~1,scal~1),
>>>>     random=Asym+xmid~1|nestbox,data=croispulm,
>>>>     start=list(fixed=c(10,5,2.2)),
>>>>     method="ML",
>>>>     corr=corCAR1(form=~age|nestbox)
>>>>     )
>>>> Assuming that I did not commited some error in setting model 
>>>> parameters (?), this way of doing is not fully satisfying, since 
>>>> using the mean of each age category as variable  leads to a  loss 
>>>> of information regarding the variance on the weight at each age and 
>>>> nestbox.
>>>> My question is: is there a way to handle repeated values per group 
>>>> (here several youngs in an age category in each nestbox) in such a 
>>>> case?
>>>> I would really appreciate an answer, even negative...
>>>> Kind regards,
>>>> Patrick
>>>> ______________________________________________
>>>> R-help at stat.math.ethz.ch mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide! 
>>>> http://www.R-project.org/posting-guide.html

More information about the R-help mailing list