[R] repeated values, nlme, correlation structures

Patrick Giraudoux patrick.giraudoux at univ-fcomte.fr
Sun Nov 20 18:45:58 CET 2005

```Looks fine... and at least accessible to my current understanding and
capacity.  I wonder if this kind of problem/method would not make a pure
Bayesian very excited (I know one quite obsessional about it)... and
propose an alternate approach from there (though beyond my own skill)...

Thanks a lot again. Will do like that ASAP (means w.e.).

Patrick

--

Department of Environmental Biology
EA3184 usc INRA
University of Franche-Comte
25030 Besancon Cedex
(France)

tel. +33 381 665 745
fax +33 381 665 797
http://lbe.univ-fcomte.fr

Spencer Graves a écrit :

> ANOTHER CONCRETE SUGGESTION:
>
>       Have you considered Monte Carlo?  Take your best model (and
> perhaps some plausible alternatives), and simulate data like what you
> have, but retaining the simulated chick identities.  Then analyze the
> simulated data both with and without the chick identities, averaging
> over the nestboxes, as you've done.  This is easy to do in R.
>
> PHILOSOPHY:
>
>       At a general, conceptual level, nlme and most other "parametric"
> statistical procedures use maximum likelihood.  The likelihood is the
> probability (density) of what we observe, considered as a function of
> the unknown parameters.  With mixed models, we use a marginal
> likelihood, integrating out the individual parameters for all the
> nestboxes and chicks, leaving the "fixed effect" parameters and the
> (co)variance parameters of the random effects.  This leads to a
> generalized least-squares problem, with the (co)variance parameters
> embedded in some way in the residual covariance matrix.
>
>       This converts the problem to one of understanding and modeling
> the covariance structure of the residuals.  If I've lost the identity
> of the chicks but I've got a good model for the covariance structure
> of the residuals, I think the answer using nestbox averages should be
> fairly close to the answer I'd get if I thought really hard and
> developed a likelihood more accurately suited to the problem.  This
> will be less true with a nonlinear model, and even less true if the
> number of chicks who die before the end of the experiment.  To answer
> these questions, I'd use Monte Carlo, as I suggested above.
>
>       Best Wishes,
>       Spencer Graves
>
> Patrick Giraudoux wrote:
>
>> Spencer Graves a écrit :
>>
>>>       You are concerned that, "using the mean of each age category
>>> as variable leads to a loss of information regarding the variance on
>>> the weight at each age and nestbox."  What information do you think
>>> you lose?
>>
>>
>>
>> The variance  around the mean weight of each age category. This
>> variation is a priori not considered in the model when using the mean
>> only, and not each value used to compute the mean..
>>
>>>
>>>       In particular, have you studied the residuals from your fit?
>>> I would guess that the you probably have heterscedasticity with the
>>> variance of the residuals probably increasing with the age.  Plots
>>> of the absolute residuals might help identify this.
>>
>>
>>
>> Yes, of course. At this stage using a  Continuous AR(1) as
>> Correlation Structure, reduces considerably heteroscedasticity up to
>> quasi-normal.
>>
>>> Also, is the number of blue tits in each age constant, or does it
>>> change, e.g., as some of the chicks die?
>>
>>
>>
>> Yes, unfortunately, it may happen eventually.
>>
>>>
>>>       To try to assess how much information I lost (especially if
>>> some of the chicks died), I might plot the weights in each nest box
>>> and connect the dots manually, attempting to assign chick identity
>>> to the individual numbers.  I might do it two different ways, one
>>> best fit, and another "worst plausible".  Then I might try to fit
>>> models to these two "augmented data sets" as if I had the true chick
>>> identity.  Then comparing these fits with the one you already have
>>> should help you evaluate what information you lost by using the
>>> averages AND give you a reasonable shot at recovering that
>>> information.  If the results were promising, I might generate more
>>> than two sets of assignments, involving other people in that task.
>>
>>
>>
>> OK, should not be that difficult (actually the data were given with
>> pseudo-ID numbers on each chicks and I started with this... until I
>> learned they were corresponding to nothing). I suppose one could go
>> as far as possible with the "worst possible" with random assignements
>> and permutations, and thus comparing the fits.
>>
>> Many thanks for the hint. I was really wondering what may mean no
>> answer on the list... Problem not clear enough, trivial solution or
>> real trouble for statisticians with such data? Quite  scaring to a
>> biologist...  Now, I am fixed.
>>
>>> If the results were promising, I might generate more than two sets
>>> of assignments, involving other people in that task.
>>
>>
>>
>> Of course if some capable mixed-effect models specialist is
>> interested in having a look to the data set, I can send it off list.
>>
>> Many thanks again, Spencer, I can stick on the track, now...
>>
>> Best regards,
>>
>> Patrick
>>
>>
>>>       Bon Chance
>>>       Spencer Graves
>>>
>>> Patrick Giraudoux wrote:
>>>
>>>> Dear listers,
>>>>
>>>> My request of last week seems not to have drawn someone's
>>>> attention. Suppose it was not clear enough.
>>>>
>>>> I am coping with an observational study where people's aim was to
>>>> fit growth curve for a population of young blue tits. For logistic
>>>> reasons, people have not been capable to number each individual,
>>>> but they have a method to assess their age. Thus, nestboxes were
>>>> visited occasionnally, youngs aged and weighted.
>>>>
>>>> This makes a multilevel data set, with two classification factors:
>>>>
>>>> - the nestbox (youngs shared the same parents and general feeding
>>>> conditions)
>>>> - age in each nestbox (animals from the same nestbox have been
>>>> weighed along time, which likely leads to time correlation)
>>>>
>>>> Life would have been heaven if individuals were numbered, and thus
>>>> nlme correlation structure implemented in the package be used easy.
>>>> As mentioned above, this could not be the case. In a first
>>>> approach, I actually used the mean weight of the youngs weighed at
>>>> each age in nest boxes for the variable "age", and could get a nice
>>>> fit with "nestbox" as random variable and
>>>> corCAR1(form=~age|nestbox) as covariation structure.
>>>>
>>>> modm0c<-nlme(pds~Asym/(1+exp((xmid-age)/scal)),
>>>>     fixed=list(Asym~1,xmid~1,scal~1),
>>>>     random=Asym+xmid~1|nestbox,data=croispulm,
>>>>     start=list(fixed=c(10,5,2.2)),
>>>>     method="ML",
>>>>     corr=corCAR1(form=~age|nestbox)
>>>>     )
>>>>
>>>> Assuming that I did not commited some error in setting model
>>>> parameters (?), this way of doing is not fully satisfying, since
>>>> using the mean of each age category as variable  leads to a  loss
>>>> of information regarding the variance on the weight at each age and
>>>> nestbox.
>>>>
>>>> My question is: is there a way to handle repeated values per group
>>>> (here several youngs in an age category in each nestbox) in such a
>>>> case?
>>>>
>>>> I would really appreciate an answer, even negative...
>>>>
>>>> Kind regards,
>>>>
>>>> Patrick
>>>>
>>>> ______________________________________________
>>>> R-help at stat.math.ethz.ch mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help