[R-sig-ME] Efficient mixed logistic reg w 500k individuals

sree datta @reedt@8 @end|ng |rom gm@||@com
Sat Dec 26 07:14:42 CET 2020


With such a large dataset, I would recommend exploring interactions among
variables using ensemble methods such as Random Forests and Extreme
Gradient Boosting (since you have a binary dependent variable).
These models also help correct against bias, since with such a large N you
may end up finding many spurious and unstable relationships (in both
main effects and interaction effects).
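
To make the large-N concern concrete, here is a minimal sketch on simulated
data (not the original poster's). The sample size and the true slope of 0.02
are assumptions chosen purely for illustration. It shows how a practically
negligible effect can still come out as highly "significant" in a plain
logistic regression once N is large:

```r
# Simulated illustration only: all values below are assumed, not from the thread.
set.seed(42)
n <- 1e6                       # assumed sample size
x <- rnorm(n)                  # one standardized predictor

# True log-odds slope of 0.02, i.e. an odds ratio of about 1.02 per SD of x.
y <- rbinom(n, 1, plogis(0.02 * x))

fit <- glm(y ~ x, family = binomial)

# Pull out the slope estimate and its Wald p-value.
est <- coef(summary(fit))["x", ]
print(est[c("Estimate", "Pr(>|z|)")])

# The estimated odds ratio stays close to 1 even though the p-value is tiny.
odds_ratio <- exp(est[["Estimate"]])
print(odds_ratio)
```

With data of this size it is probably more informative to look at effect
sizes and out-of-sample performance than at p-values alone.
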
In terms of processing efficiency, have you tried using the *parallel* package
in R? In addition, I would suggest the *foreach* and *doParallel* packages
to improve processing speed. For a more detailed description of
parallelism in R, see this article, which gives a good summary of the
relevant packages:
https://www.jigsawacademy.com/handling-big-data-using-r/
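
As a rough sketch of what using the *parallel* package can look like, the
example below fits a plain logistic regression on several random chunks of a
simulated dataset, one chunk per worker task. The data, the chunk count, the
worker count, and the helper name fit_chunk are all assumptions made for
illustration. Note that this parallelizes independent fits on subsets; it does
not speed up a single mixed-model fit.

```r
library(parallel)

# Simulated stand-in data (assumed; not the original poster's data).
set.seed(1)
n <- 20000
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.3 * dat$x))

# Split the row indices into 4 random, equally sized chunks.
chunks <- split(seq_len(n), sample(rep(1:4, length.out = n)))

# Hypothetical helper: fit a logistic regression on one chunk of rows
# and return its coefficients.
fit_chunk <- function(idx, data) {
  coef(glm(y ~ x, family = binomial, data = data[idx, ]))
}

# Start 2 local worker processes; always shut them down afterwards.
cl <- makeCluster(2)
coefs <- tryCatch(
  parLapply(cl, chunks, fit_chunk, data = dat),
  finally = stopCluster(cl)
)

# One row of coefficients per chunk, then a simple average across chunks.
coef_mat <- do.call(rbind, coefs)
print(coef_mat)
print(colMeans(coef_mat))
```

The same pattern can be written with *foreach* and %dopar% after registering
a *doParallel* backend. Whether it helps depends on how much of the work can
be split into independent pieces.
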



On Wed, Dec 23, 2020 at 8:20 PM Mitchell Maltenfort <mmalten using gmail.com>
wrote:

> Here’s a fun one for you (I hope)
>
> I’m mucking about with a logistic regression that may have about 30 million
> records for half a million individuals.
>
> Yes, I have a large-RAM machine (64 GB), and I’ve used nAGQ = 0 and other
> recommendations from
>
> http://angrystatistician.blogspot.com/2015/10/mixed-models-in-r-bigger-faster-stronger.html?m=1
>  which should be reasonable for the large data.
>
> It works but I’d still be interested in tweaks to improve speed or
> accuracy.  Any ideas?
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
