[R-sig-ME] r-sig mixed models mailing list response

Mon Jan 21 15:24:52 CET 2013

On 13-01-20 10:22 PM, Belinda Burns wrote:
> Dear Ben,
> 

[cc'ing to r-sig-mixed-models]

> Thank you for your response to my question to the mixed models mailing
> list (see below). I wasn't sure if I could just reply directly to the
> mailing list request email so I thought I'd email you directly.
> Your comment on the heteroscedasticity makes some sense to me - however,
> a log transformation (of proportions + 1) does little to improve the
> distribution of the data.

   OK.

> I have thought long and hard about the data
> and they would seem to me best modelled using the negative binomial glmm
> but now I am getting further into realms for which I have very little
> support and I really like to know what I'm doing with my data!

  It might be a good idea to look at the books by Zuur and co-authors,
which are fairly ecologist-friendly. While I don't agree with everything
in them, they generally exhibit common sense and give good general
descriptions.

> If I transform the proportions into "count" data (#seconds spent
> grooming/ten minutes - hence with a ceiling of 600s) and model using a
> mixed effects negative binomial with zero inflation = false, the
> dispersion parameter is 0.205, indicating that it is underdispersed (?).

  Transformation of this sort doesn't make sense.  NB models
must be run on *count* data.  It sounds like your data act more
like a Beta distribution (used to model proportions), since
"number of seconds" isn't really that likely to act like a count.

> If I run the same model with zeroInflation=TRUE, then the dispersion
> parameter is 1.02...[see code and some output below] but I am unsure how
> exactly this model is dealing with the zero-inflation and what the
> Zero-inflation estimate means. Either way the residuals are not normally
> distributed. I have also thought about using a hurdle-style model (or
> rather, running separate analyses on a binary response [groomed mate or
> didn't] and truncated "count" response [if they groomed, how long did
> they spend grooming], since I am unsure if I can include random
> variables in the hurdle function in pscl). I sense that I may be trying
> to analyse variation that is just not there, with too few observations
> per individual, particularly if most of the time they are not doing the
> behaviour in question. If I try to run a zi-poisson glmm, I get an error
> message: " The function maximizer failed (couldn't find STD file). In
> addition: Warning message:running command 'C:\Windows\system32\cmd.exe
> /c "C:/.......glmmadmb.exe" -maxfn 500 -maxph 5' had status 1".
> 
> 
> glmmadmb(formula = Count_Grooms_Mate ~ Age_years_mean + Sex +
>     FamComp_Offspring + Species + Repro_phase_original + Sex *
>     Repro_phase_original + (1 | Group_no/Individual_no), data = adult3,
>     family = "nbinom", zeroInflation = TRUE)
> Number of observations: total=416, Group_no=13, Group_no:Individual_no=26
> Random effect variance(s):
> Group=Group_no
>              Variance    StdDev
> (Intercept) 2.994e-09 5.472e-05
> Group=Group_no:Individual_no
>              Variance   StdDev
> (Intercept) 2.673e-09 5.17e-05

Note that the random effects are extremely small, suggesting that
the amount of noise at the observation level (within the individual
counts) is big enough to swamp any observable effects of Group or
Individual within group.

  I would consider trying this with family="beta" and zero-inflation,
although admittedly that's a combination I have never tested.

Make sure to plot your data and make sure that the estimates
make sense!

> Negative binomial dispersion parameter: 1.0235 (std. err.: 0.15743)
> Zero-inflation: 0.48117  (std. err.:  0.03323 )
> 
> Log-likelihood: -1089.92
> 
> 
> Thank you for reading,
> 
> Belinda Burns
> 
> -------------------------------------------------------------------------------------------------------------------------------------------------
> 
> Belinda Burns <10517197 at ...> writes:
> 
>> Dear all,
>>
>> I hope this is the correct place for my question, if not, my apologies! I
>> am analysing several behaviour variables obtained by observing captive
>> gibbons. The raw values are in the form proportion of ten minutes spent
>> doing the behaviour, and most of the behaviours are zero-inflated and
>> negatively skewed.
>>
>> At the moment I am interested in modelling the proportion of time that
>> adult gibbons spend grooming their mates, such that the models take the
>> form:
>>
>> Grooms_mate~Age+Species*Sex+
> Family_composition+Repro_phase*Sex
>>
>> where species is a factor with 3 levels, family composition is a binary
>> variable (they either have offspring or not) and repro_phase is the
>> reproductive phase of the female (4 levels).
> 
>> Ideally I should be including individual and group as random effects
>> (individuals are nested within groups) and so I would like to use a
>> mixed model approach; however, diagnostic plots of residuals vs
>> fitted values show heteroscedasticity (increasing spread with
>> increasing fitted values) and plots of residuals vs predictors
>> suggests that one species is less variable than the other two and
>> gibbons with offspring are more variable than those without. The
>> inclusion of a species*family_composition weighted variance function
>> (using the weights= varIdent(form~1|Species*Family_composition) in a
>> gls model) seems to improve the homogeneity of the residuals...
> 
>> I therefore have two questions (among a million others!): Can I
>> include the two random effects in gls, or, vice versa, a varIdent
>> structure in lmer? (the only contact I know doing mixed modelling in
>> R uses lmer with MCMC estimation of p-values and so I am most
>> comfortable using that to include the random effects) How do I write
>> individual and group in as random effects considering individual is
>> nested in group?
> 
> lmer does not handle "R-side" effects (heteroscedasticity/varStruct/etc.)
> at present.  You should be able to use random=~1|group/individual
> in lme to account for individuals nested in groups.  However,
> heteroscedasticity is also a common feature of lognormal data: could
> you get away with some transformation of the form
> log(small_number+proportion)
> (realizing that picking small_number is a bit of a can of worms)?
> Or plogis(small_number+proportion)? (Should be roughly equivalent if
> the proportions are typically small.)
> 
>   Ben Bolker
> 
> 
> -- 
> Belinda Lee Burns, BSc(Hons)
> PhD student
> Email: burnsb02 at student.uwa.edu.au <mailto:burnsb02 at student.uwa.edu.au>