[R-sig-ME] How to use all the cores while running glmer on a piecewise exponential survival with

Tue Aug 28 16:08:49 CEST 2018

I'm late to the party on this one, but I've been playing around with
this issue by (ab)using brms::brm_multiple():

library("tidyverse")
library("brms")

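# cache compiled Stan models on disk (the rstan option is auto_write)
rstan::rstan_options(auto_write=TRUE)
# run the chains of a given model on up to two cores
options(mc.cores=2)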

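# dat is the full data set; assign each row at random to one of 100
# slices, drawing the slice labels separately within each level of G
dat.split <- dat %>%
    select(A,B,C,G) %>%
    group_by(G) %>%
    mutate(slice=sample.int(100,n(),replace=TRUE)) %>%
    ungroup()

# turn the data frame into a list of data frames, one per slice
dat.split <- split(dat.split, dat.split$slice)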

dat.model.split <- brm_multiple(log10(A) ~ 1 + scale(B) * scale(C) + (1|G),
                               algorithm="sampling",
                               prior=set_prior("normal(0,2)",class="b"),
                               save_all_pars=TRUE,
                               sample_prior = TRUE,
                               family=student(),
                               chains=2,
                               iter=2e3,
                               data=dat.split)

This fits the same model to each subset and then pools the posterior
draws from all of the fits into a single model object. Maybe a better
Bayesian than me can point out any flaws or potential pitfalls in this
approach.

If you use a slightly less automated approach for the split and combine,
you can run multiple chains for multiple models in parallel. For the
automated split-combine with brm_multiple, each model is run in
sequence, although the chains within a given model are run in parallel
if mc.cores > 1.
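
A minimal sketch of what the less automated version might look like,
assuming the same columns A, B, C and G as above. The helper names
split_by_slice() and fit_slices() are made up for illustration, and the
worker counts are arbitrary. parallel::mclapply() relies on forking, so
it will not run in parallel on Windows, and each worker compiles its own
copy of the Stan model, which adds overhead.

```r
## Sketch only, under the assumptions stated above.

# Assign each row to one of n_slices random slices, drawing the labels
# separately within each level of the grouping column, and return a
# list of per-slice data frames.
split_by_slice <- function(dat, n_slices, group = "G") {
    slice <- ave(seq_len(nrow(dat)), dat[[group]],
                 FUN = function(i) sample.int(n_slices, length(i),
                                              replace = TRUE))
    split(dat, slice)
}

# Fit the same model to every slice, several slices at a time, and pool
# the posterior draws. n_workers models run concurrently; within each
# model the chains use cores_per_model cores.
fit_slices <- function(slices, n_workers = 2, cores_per_model = 1,
                       chains = 2, iter = 2000) {
    fits <- parallel::mclapply(slices, function(d) {
        brms::brm(log10(A) ~ 1 + scale(B) * scale(C) + (1 | G),
                  data = d,
                  family = brms::student(),
                  prior = brms::set_prior("normal(0,2)", class = "b"),
                  chains = chains, iter = iter,
                  cores = cores_per_model)
    }, mc.cores = n_workers)
    # check_data = FALSE because every fit was run on different data
    brms::combine_models(mlist = fits, check_data = FALSE)
}
```

Usage would be along the lines of
fit_slices(split_by_slice(dat, 100), n_workers = 4); nothing from brms
is touched until fit_slices() is actually called.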

Best,
Phillip

On 08/23/2018 10:08 PM, Ben Bolker wrote:
> 
>   Harold, what do you think of my suggestion (partition problem into
> multiple conditionally independent subsets, evaluate separate deviances
> on workers, run top-level optimization on a central 'master' processor)?
>  Am I missing something (except that some problems can't easily be
> partitioned that way)?
> 
>   FWIW I think Doug Bates has pointed out in the past that for simple
> (e.g. nested, not crossed) designs, the whole problem can be
> reformulated in a more efficient way (of course I can't dig up that
> e-mail ...).  lme4's strength is that it can handle the complex cases,
> and so far no-one has had the time/energy/interest/capability of
> implementing any of Doug's "special case" strategies, at least in lme4
> -- may be done elsewhere in R, or in Doug's MixedModels.jl ...
> 
>   cheers
>    Ben Bolker
> 
> 
> On 2018-08-23 03:32 PM, Doran, Harold wrote:
>> Running the model on multiple cores won’t work because lmer isn’t written that way. One idea I’ve toyed with is to start with a small-ish sample and get results. Plug those in as starting values for your next run, which uses a larger sample but takes fewer steps because you’re closer to the max. Repeat until the difference in the parameter estimates from the prior run is less than some tolerance.
>>
>>
>> From: Adam Mills-Campisi <adammillscampisi using gmail.com>
>> Sent: Thursday, August 23, 2018 3:25 PM
>> To: Doran, Harold <HDoran using air.org>
>> Cc: r-sig-mixed-models using r-project.org
>> Subject: Re: [R-sig-ME] How to use all the cores while running glmer on a piecewise exponential survival with
>>
>> That's the plan; the real question is how big the samples should be. The faster we can estimate the model, the bigger the sample can be. If I could run the model on multiple cores, that would significantly increase the sample size.
>>
>> On Thu, Aug 23, 2018 at 12:23 PM Doran, Harold <HDoran using air.org> wrote:
>> One idea, though, is you can take samples from your very large data set and estimate models on the samples very quickly.
>>
>> -----Original Message-----
>> From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> On Behalf Of Adam Mills-Campisi
>> Sent: Thursday, August 23, 2018 3:18 PM
>> To: r-sig-mixed-models using r-project.org
>> Subject: [R-sig-ME] How to use all the cores while running glmer on a piecewise exponential survival with
>>
>> I am estimating a piecewise exponential, mixed-effects, survival model with recurrent events. Each individual in the dataset gets an individual intercept (we're using a PWP approach). Our full dataset has 10 million individuals, with 180 million events. I am not sure that there is any framework which can accommodate data of that size, so we are going to sample. Our final sample size largely depends on how quickly we can estimate the model, which brings me to my question: Is there a way to multi-thread/core the model? I tried to find some kind of instruction on the web and the best lead I could find was a reference to this listserv.
>> Any help would be greatly appreciated.
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
> 
>