[R-sig-ME] How to supply random intercepts for new data, blme

Thu Dec 14 05:20:22 CET 2023

...Thinking some more about this, it occurs to me that one solution would
be to add a predictor consisting of the number of records obtained for each
individual, fitting this as a fixed effect. This seems better than trying
to model a random effect of individual ID, without providing that
information. (It is also more convenient, as the distribution of those
records is not remotely Gaussian in my case.) More generally though, I
would be interested to know whether it is possible, and good practice, to
supply random effects to a model that have been determined by some external
process, such as another model or based on a priori expectations.
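
To make the fixed-effect idea concrete, here is a minimal sketch using
simulated data and base R's glm() as a stand-in for the BiMM forest / blme
model. All variable names (n_records, habitat, ext_int) are hypothetical, and
the log transform of the record count is just one possible choice. The last
step shows an offset as one conceivable way to pass in intercepts determined
elsewhere; that is an assumption on my part, not something I have confirmed
works with blme.

```r
# Simulated stand-in data; every name below is hypothetical.
set.seed(1)
n_id   <- 20                                    # tagged individuals
n_abs  <- 50                                    # pseudoabsences per individual
n_pres <- sample(5:200, n_id, replace = TRUE)   # presences (records) per individual

dat <- do.call(rbind, lapply(seq_len(n_id), function(i) {
  data.frame(
    id        = i,
    present   = rep(c(1, 0), times = c(n_pres[i], n_abs)),
    n_records = n_pres[i]
  )
}))
# Presences sit at higher habitat values on average.
dat$habitat <- rnorm(nrow(dat), mean = dat$present)

# Tracking effort enters as a fixed effect instead of a random intercept per ID.
fit <- glm(present ~ habitat + log(n_records), family = binomial, data = dat)

# New individuals only need a known record count, not a fitted intercept.
new_dat <- data.frame(habitat = c(0, 1), n_records = c(10, 150))
p_new <- predict(fit, newdata = new_dat, type = "response")

# Externally determined intercepts could enter as an offset, i.e. a term
# whose coefficient is fixed at 1 rather than estimated.
dat$ext_int <- log(dat$n_records / n_abs)
fit_off <- glm(present ~ habitat + offset(ext_int), family = binomial, data = dat)
```

With the offset version, predictions for new individuals would need the same
ext_int column supplied in newdata.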

Best,
Fiona

*Dr Fiona Scarff*
*Murdoch University*

On Tue, Dec 12, 2023 at 2:45 PM Fiona Scarff <fiona.scarff.4 using gmail.com>
wrote:

> Hello folks,
>
> I have a mixed model that predicts the presence or absence of a species,
> given a set of predictors describing habitat. It is based on tracking data
> from tagged individual animals, together with pseudoabsences sampled from
> the surrounding area. The model assumes all individuals have the same
> habitat preferences (i.e. the slope coefficients are represented as fixed
> effects). By specifying random intercepts per individual, the model
> accommodates the fact that some individuals were tracked for much longer
> than others, so that the number of presences relative to pseudoabsences is
> much higher for some individuals.
>
> As part of a cross-validation exercise to assess model accuracy, I would
> like to make predictions on new data, involving different individuals from
> other geographic areas. As I already know how long these individuals were
> tracked for, I would like to supply that information (in the form of fitted
> random intercepts from a 'global' model fitted to all individuals) to the
> model for that particular cross-validation fold. Otherwise, it seems like
> the estimate of model accuracy will be unnecessarily pessimistic, since the
> predictions for each new individual won't take into account whether they
> were extensively tracked or not.
>
> I am using an implementation of binary random forests with mixed effects
> due to Speiser and colleagues (see refs below). The random effects part of
> the model is produced using blme.
>
> My questions are:
> 1) Is this a bad idea? It involves taking intercepts from one model and
> inserting them into another.
> 2) If it is a bad idea, is there an alternative?
> 3) If it is a good idea, how is it best executed using a model derived
> from blme? I could find all the parts of the model object containing
> individual-specific terms and replace them (so far I've counted eight
> vectors in the model object that contain either the individual IDs, or
> values fitted to those IDs), but is there a more elegant solution?
>
> A different approach could be to draw, for each individual, a number of
> pseudoabsences equal to the number of presences recorded for that animal.
> That would fix the proportion of presences at 0.5 (a 1:1 ratio of presences
> to pseudoabsences) for all individuals. I avoided this in the first
> instance because it would mean fitting models with few points for some
> individuals: as few as 5 or 10 presences, so n = presences + absences =
> 10–20.
>
> Many thanks for any help you can offer, Fiona
>
> References
> Speiser, J. L., et al. (2019). "BiMM forest: A random forest method for
> modeling clustered and longitudinal binary outcomes." Chemometrics and
> Intelligent Laboratory Systems *185*: 122-134.
>
> Speiser, J. L. (2021). "A random forest method with feature selection for
> developing medical prediction models with clustered and longitudinal
> data." Journal of Biomedical Informatics *117*: 103763.
>
>
> *Fiona Scarff*
> *Murdoch University*
>
>
