[R-sig-ME] GAMM- Missing/uncertain group id's in random effect

Chris Evans chr|@ho|d @end|ng |rom p@yctc@org
Tue Jul 4 00:23:31 CEST 2023


Formally, this is way, way above my IQ grade but I wonder if permuting 
across the possible matches would allow a sort of sensitivity/robustness 
exploration.  It sounds as if the correct boat IDs are unknown across a 
range of options.  Being very low IQ about this, say there are 5 records 
with boat name "Sally" and they have a range of lengths, say 18m, 17m, 
19m, 18m, 20m; then I might write an algorithm to give me something 
like this:

run, simID, name, length

1, Sally1, Sally, 18
1, Sally1, Sally, 17
1, Sally1, Sally, 19
1, Sally1, Sally, 18
1, Sally1, Sally, 20
2, Sally21, Sally, 18
2, Sally22, Sally, 17
2, Sally23, Sally, 19
2, Sally24, Sally, 18
2, Sally25, Sally, 20
3, Sally31, Sally, 18
3, Sally31, Sally, 17
3, Sally32, Sally, 19
3, Sally33, Sally, 18
3, Sally34, Sally, 20

...

Then I would look at the model results iterating over the run variable 
and using simID as the unique boat identifier.
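
A minimal sketch of how such a table could be generated, in Python and 
using only the standard library.  It is an illustration, not a tested 
recipe: the five "Sally" records are the made-up example above, the 
function names are hypothetical, and the simID labels use underscores 
(e.g. Sally_2_3) rather than the bare concatenation shown above so that 
run and group numbers cannot run together.  It enumerates every way of 
grouping the records into pseudo-boats, one run per grouping.

```python
from typing import Iterator, List, Tuple

# Made-up example data from the post: five records named "Sally".
records = [("Sally", 18), ("Sally", 17), ("Sally", 19),
           ("Sally", 18), ("Sally", 20)]

def partitions(n: int) -> Iterator[List[int]]:
    """Yield every grouping of n records as a list of group labels.

    labels[i] is the group of record i. A new label may exceed the
    largest label used so far by at most one, so each distinct
    grouping is produced exactly once.
    """
    def extend(labels: List[int], max_label: int) -> Iterator[List[int]]:
        if len(labels) == n:
            yield list(labels)
            return
        for g in range(max_label + 2):
            labels.append(g)
            yield from extend(labels, max(max_label, g))
            labels.pop()
    yield from extend([], -1)

def candidate_runs(recs) -> List[Tuple[int, str, str, int]]:
    """Return rows of (run, simID, name, length), one run per grouping."""
    rows = []
    for run, labels in enumerate(partitions(len(recs)), start=1):
        for (name, length), g in zip(recs, labels):
            rows.append((run, f"{name}_{run}_{g + 1}", name, length))
    return rows

rows = candidate_runs(records)
n_runs = rows[-1][0]  # number of distinct groupings generated
```

Each run could then be fed to the model in turn, with simID as the 
grouping factor, and the estimates of interest compared across runs.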

Clearly if the combinations possible get very large there are all sorts 
of challenges but I wonder if there is literature on trying something 
like this?  My guess is that if this is combined with using some savvy 
of the sort Ben is suggesting then the numbers of possible ways of 
mapping the records to "pseudoBoats" might get fairly small and you 
might be able to show that the impacts on your findings would be small.
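
To illustrate how a rule might shrink the number of mappings, here is a 
rough sketch, again in Python with only the standard library.  The rule 
itself is purely my assumption for the sake of the example (records may 
share a pseudo-boat only if their lengths differ by no more than some 
tolerance); real heuristics would presumably also use space and time 
information.  The function names and the tolerance values are 
hypothetical.

```python
from typing import Iterator, List

# Lengths (m) of the five made-up "Sally" records.
lengths = [18, 17, 19, 18, 20]

def partitions(n: int) -> Iterator[List[int]]:
    """Yield every grouping of n records as a list of group labels."""
    def extend(labels: List[int], max_label: int) -> Iterator[List[int]]:
        if len(labels) == n:
            yield list(labels)
            return
        for g in range(max_label + 2):
            labels.append(g)
            yield from extend(labels, max(max_label, g))
            labels.pop()
    yield from extend([], -1)

def plausible(labels: List[int], values: List[float], tol: float) -> bool:
    """Assumed rule: within each group, max length - min length <= tol."""
    groups = {}
    for g, v in zip(labels, values):
        groups.setdefault(g, []).append(v)
    return all(max(vs) - min(vs) <= tol for vs in groups.values())

def plausible_partitions(values: List[float], tol: float) -> List[List[int]]:
    """Keep only the groupings that satisfy the tolerance rule."""
    return [p for p in partitions(len(values)) if plausible(p, values, tol)]

all_count = sum(1 for _ in partitions(len(lengths)))
kept_tol1 = plausible_partitions(lengths, tol=1)  # within 1 m
kept_tol0 = plausible_partitions(lengths, tol=0)  # identical lengths only
print(all_count, len(kept_tol1), len(kept_tol0))
```

The tighter the rule, the fewer runs remain to be modelled, which is the 
point: only the groupings that survive the rule need to go into the 
sensitivity analysis.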

Mad or just perhaps possible?

Chris

P.S. I think this is the first time I've dared offer a suggestion here 
but I have learned an incredible amount from the list and am in awe of 
the skills _and_ the generosity of so many people who do answer things 
here: thanks.


On 03/07/2023 20:49, Ben Bolker wrote:
>    Unfortunately, I don't think there's an easy way to deal with this. 
> You *could* build a fancy Bayesian engine that would try to 
> estimate/impute uncertain boat IDs concurrently with the statistical 
> analysis (I think people *may* have done this for IDing individual 
> animals from camera trap or sighting data, but I don't remember).
>
>   However, my recommendation would be to try to come up with some 
> reasonable, objective heuristics for lumping IDs together. Can you use 
> information about co-occurrence of observations in space and time (or 
> not) to come up with a set of rules? (Basically, think about how you 
> would try to pick out suspected duplicates by eye, then try to 
> implement those rules in code.) For small, noisy data sets, if you 
> can't make reasonable guesses about duplicates, it's unlikely that a 
> computer will be able to do better.
>
>  Misclassification in this way (either incorrectly lumping or 
> splitting) might not make a huge difference to your results, as it 
> will affect the correlation of observations, not the observed 
> gear/effort/habitat/CPUE relationships directly ...
>
>   good luck,
>    Ben Bolker
>
>
> On 2023-06-30 12:23 p.m., Meaghan Rupprecht wrote:
>> I am currently modelling fish catch in the Amazon River in response 
>> to variables such as habitat, effort, gear type, and spatial 
>> locations. The model we have selected to accomplish this task is a 
>> GAMM, and we are currently using the mgcv and brms packages in R. We 
>> are facing an issue with random effects in our model, and I was 
>> hoping to get some insight about possible solutions. I've tried to 
>> find solutions online without much luck.
>>
>> For each record of fish catch, there is information recorded such as 
>> boat name and boat length. We aimed at using this information to 
>> generate a unique boat id, which would be treated as a group id for a 
>> random effect variable in our model. A problem arises in boat names 
>> because they may not be unique, but boat lengths are somewhat 
>> unreliable information and could have varying responses. This creates 
>> some records where the boat names may be the same with slightly 
>> varying lengths, resulting in multiple id's being generated for what 
>> might actually be the same boat (i.e., boat 1 with length of 17m and 
>> boat 1 with length of 17.2m). This greatly complicates our attempts 
>> at identifying unique boats and generates uncertainty in our 
>> classifications.
>> Is there a method for dealing with uncertainty or missing group id's 
>> in random effects? I'd be happy to elaborate or provide additional 
>> information if anything above was unclear.
>> Thanks for your time.
>>
>>
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
-- 
Chris Evans (he/him)
Visiting Professor, UDLA, Quito, Ecuador & Honorary Professor, 
University of Roehampton, London, UK.
Work web site: https://www.psyctc.org/psyctc/
CORE site: http://www.coresystemtrust.org.uk/
Personal site: https://www.psyctc.org/pelerinage2016/
Emeetings (Thursdays): 
https://www.psyctc.org/psyctc/booking-meetings-with-me/
(Beware: French time, generally an hour ahead of UK)
<https://ombook.psyctc.org/book>


