[R-sig-ME] ar1 structure too big, alternate formulation?

Thu Mar 10 15:15:50 CET 2022

   Since you have large clusters, MASS::glmmPQL could work (unless your 
abundance is very low or very high, i.e. the proportions present/absent 
are close to 0 or 1).
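
   A minimal sketch of what the glmmPQL route might look like, fitted to 
simulated stand-in data rather than the real data set. Everything in the 
simulation (sizes, coefficients, the integer `hour` column) is an 
assumption made for illustration; only the variable names `Presence`, 
`Temp_10m` and `ID_Wreck` come from the original post. The AR1 term goes 
in through nlme's corAR1(), which needs an integer time index rather than 
a factor.

```r
library(MASS)  # glmmPQL()
library(nlme)  # corAR1()

# Simulated stand-in data: 20 fish-by-wreck series of 50 hours each.
# All sizes and parameter values below are hypothetical.
set.seed(101)
n_id <- 20
n_t  <- 50
sim <- expand.grid(hour = seq_len(n_t), ID_Wreck = factor(seq_len(n_id)))
sim$Temp_10m <- rnorm(nrow(sim))

# One latent AR1 series per ID_Wreck (rows are ordered by ID_Wreck, then hour)
eta <- unlist(lapply(seq_len(n_id), function(i) {
  as.numeric(arima.sim(list(ar = 0.7), n = n_t))
}))
sim$Presence <- rbinom(nrow(sim), 1, plogis(-0.5 + 0.3 * sim$Temp_10m + eta))

# PQL fit: random intercept per ID_Wreck plus AR1 correlation within ID_Wreck
fit_pql <- glmmPQL(Presence ~ Temp_10m,
                   random = ~ 1 | ID_Wreck,
                   correlation = corAR1(form = ~ hour | ID_Wreck),
                   family = binomial, data = sim, verbose = FALSE)

# Estimated AR1 parameter on its natural (-1, 1) scale
rho_hat <- coef(fit_pql$modelStruct$corStruct, unconstrained = FALSE)
```

   Note that here the AR1 correlation applies to the residuals of the 
working (linearized) response, which is not identical to the latent AR1 
random effect that glmmTMB's ar1() fits.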
   I don't have a reference for using presence/absence in the previous 
hour as a fixed effect; if you do that, you're assuming perfect 
detectability.
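
   For concreteness, a rough sketch of how the lagged predictor could be 
built, on a tiny made-up data set. The helper `lag1()` and the columns 
`hour` and `Presence_lag1` are hypothetical names, not anything from the 
original code. The lag is taken within each `ID_Wreck`, and is set to NA 
wherever the previous hour is missing, so that a gap in the series is not 
treated as a one-hour step.

```r
# Tiny made-up data set; series "a" has a gap (hour 4 is missing)
toy <- data.frame(
  ID_Wreck = factor(rep(c("a", "b"), each = 4)),
  hour     = c(1, 2, 3, 5,  1, 2, 3, 4),
  Presence = c(1, 0, 1, 1,  0, 0, 1, 1)
)
toy <- toy[order(toy$ID_Wreck, toy$hour), ]

# Previous-hour value within one series; NA at the start and after any gap
lag1 <- function(y, h) {
  prev <- c(NA, head(y, -1))
  prev[c(TRUE, diff(h) != 1)] <- NA
  prev
}

# Apply the lag separately within each ID_Wreck
toy$Presence_lag1 <- unsplit(
  lapply(split(toy, toy$ID_Wreck), function(d) lag1(d$Presence, d$hour)),
  toy$ID_Wreck
)

# Syntax illustration only: rows with NA lags are dropped by default
fit_lag <- glm(Presence ~ Presence_lag1, family = binomial, data = toy)
```

   With the real data one would presumably add the environmental 
covariates and a random intercept for `ID_Wreck`. The lagged term uses the 
observed presence, which is where the perfect-detectability assumption 
comes in.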
   I have thought about some ways that the AR1 implementation in glmmTMB 
could be improved, but they're not quick and easy to implement.

   cheers
    Ben Bolker

On 3/10/22 9:08 AM, Mollie Brooks wrote:
> Hi,
> 
> I’m working on a data set where individually tagged cod were observed on ship wrecks (present or absent). We aggregated everything to an hourly scale. We include an AR1 random effect to account for the idea that being present at a particular wreck in one hour makes an individual fish more likely to be present in the next hour. The variable `ID_Wreck` is the combination of fish tag ID and wreck where they were observed; we dropped combinations that were never observed.
> 
> dat = na.omit(transform(dPresence1,
> 	time = factor(DateTimeCEST_Hourly),
> 	ID_Wreck = factor(paste(Tag_ID, Wreck, sep="_")))[,c("Presence","Speed_10m","Direction_10m","SeaLevel", "Temp_10m","Diel","time","ID_Wreck")])
> 
> There are 506 levels of `ID_Wreck`, each with up to 3550 hourly observations (min 214, median 3233).
> 
> I think I want to fit a model like
> 
> M0 <- glmmTMB(Presence ~ environ + ar1(time +0| ID_Wreck), family=binomial, data=dat)
> 
> where environ is some combination of fixed effects.
> 
> Even without any fixed effects, the AR1 random effect seems to cause a problem with the size of the structure.
> 
> Error in h(simpleError(msg, call)) :
>    error in evaluating the argument 'x' in selecting a method for function 't': missing value where TRUE/FALSE needed
> In addition: Warning message:
> In attributes(.Data) <- c(attributes(.Data), attrib) :
>    NAs introduced by coercion to integer range
> 
> From what I’ve read, the last part is related to trying to allocate a data frame that is bigger than R can handle (I’m allocating 100GB on a cluster).
> https://stackoverflow.com/questions/55183203/how-to-create-data-frame-for-super-large-vectors
> 
> Is there a way to reformulate this problem so that the structures are small enough for R to handle?
> 
> What if I instead include a fixed effect of presence/absence in the previous hour? Is there a reference for how that is related to the random effect model (I know it’s easy, but references help)?
> 
> Cheers,
> Mollie
> 
> 

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics