[R-sig-ME] How can I make R using more than 1 core (8 available) on a Ubuntu Rstudio server ?

Nicolas Bédère n.bedere at gmail.com
Fri Jan 19 08:37:28 CET 2018


Dear Mr Bates, Bolker and Harold,

Thanks for your quick and enlightening answers!
I will have a look at the different solutions you proposed (Julia and
glmmTMB) while waiting for you to rewrite your marvelous package from
scratch to break through this limit! :-)

Cheers


2018-01-19 0:52 GMT+01:00 Doran, Harold <HDoran at air.org>:

> A while back, I did run lmer using a very large model in Microsoft R vs R
> and the timing was indeed faster for the same model on the same computer.
> Not by any meaningful order of magnitude that would be life changing, but
> faster nonetheless.
>
>
>
> From: Douglas Bates <bates at stat.wisc.edu>
> Date: Thursday, January 18, 2018 at 3:30 PM
> To: AIR <hdoran at air.org>
> Cc: Nicolas Bédère <n.bedere at gmail.com>, "r-sig-mixed-models at r-project.org"
> <r-sig-mixed-models at r-project.org>
>
> Subject: Re: [R-sig-ME] How can I make R using more than 1 core (8
> available) on a Ubuntu Rstudio server ?
>
> On Thu, Jan 18, 2018 at 2:16 PM Doran, Harold <HDoran at air.org> wrote:
>
>> @DB, I thought you were retired :)
>
>
> I am retired.  I'm just not very good at it and keep coming in to the
> office to work on various projects.
>
>> But, to the OP, lme4 functions already take advantage of many
>> computational methods that make fitting these models to large data sets
>> faster than (virtually) all other packages for estimating mixed linear
>> models.
>>
>
> The MixedModels package in Julia will usually perform at least as well as
> lme4 and sometimes much better.  Of course, using it entails learning a bit
> of Julia.  I would point out that with the RCall and RData packages for
> Julia it is fairly straightforward to pass the data back and forth between
> R and Julia.
>
>> The packages you might come across for parallel processing won't
>> necessarily apply here. For example, the foreach package is fantastic, but
>> could not be applied to a glmer model.
>>
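A minimal sketch of where tools like foreach or the parallel package do help: they spread independent tasks over cores, they do not speed up one model fit. Here glm stands in for glmer so that the example runs with base R only; the simulated data, the grouping variable g and the helper fit_one are hypothetical, not anything from the original question.

```r
library(parallel)  # ships with base R

# Hypothetical stand-in data; in practice this would be the real dataset.
set.seed(1)
n <- 2000
dat <- data.frame(x = rnorm(n), g = sample(letters[1:4], n, replace = TRUE))
dat$y <- rbinom(n, 1, plogis(0.5 * dat$x))

# One independent task per subset: a logistic regression fitted to each group.
fit_one <- function(d) coef(glm(y ~ x, family = binomial, data = d))
pieces <- split(dat, dat$g)

# mclapply forks worker processes, which only works on Unix-alikes
# (such as an Ubuntu server); fall back to a single worker elsewhere.
n_workers <- if (.Platform$OS.type == "unix") 2L else 1L

par_coefs <- mclapply(pieces, fit_one, mc.cores = n_workers)
seq_coefs <- lapply(pieces, fit_one)

# Same estimates either way; only the scheduling of whole fits differs.
print(all.equal(par_coefs, seq_coefs))
```

Each worker still fits its model on a single core, so this pays off for several separate models or subsets (or bootstrap replicates), not for one large glmer call.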
>> Although, Doug, I do recall coming across some work I think in the
>> Microsoft R distribution that did some parallel computing for matrix
>> problems by default. I'm saying this by memory and cannot recall specifics.
>>
>
> The Microsoft R distribution (and, before that, Revolution R) uses the MKL
> BLAS that I mentioned.  Thanks for the reminder.  It may be worthwhile
> trying with lme4.  Those benchmarks are somewhat disingenuous because they
> only benchmark some linear algebra operations, which is what MKL does very
> well.  Interestingly, the most important operation for statisticians -
> obtaining least squares solutions - is not accelerated in the standard R
> solution.
>
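A rough way to check this on a given machine, sketched with base R only: report which BLAS and LAPACK the session is linked against (the BLAS and LAPACK fields of sessionInfo() exist in recent R versions), then time a matrix cross-product, which goes through the BLAS, against lm.fit, which to my knowledge relies on a LINPACK-derived QR routine. The matrix sizes are arbitrary assumptions, and the timings will vary by machine.

```r
# Which linear algebra libraries is this session using?
si <- sessionInfo()
cat("BLAS:  ", si$BLAS, "\n")
cat("LAPACK:", si$LAPACK, "\n")

# Arbitrary test problem: 2000 observations, 200 predictors.
set.seed(1)
X <- matrix(rnorm(2000 * 200), nrow = 2000, ncol = 200)
y <- rnorm(2000)

# crossprod() is a BLAS operation, so an optimized BLAS can speed it up.
t_cross <- system.time(XtX <- crossprod(X))[["elapsed"]]

# lm.fit() solves the least squares problem via QR decomposition.
t_ls <- system.time(fit <- lm.fit(X, y))[["elapsed"]]

cat("crossprod:", t_cross, "s;  lm.fit:", t_ls, "s\n")
```

Running the same script under the reference BLAS and under an optimized one (MKL, OpenBLAS) would show how much of the gain reaches each operation.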
>
>> With that said, I'm not certain parallel processing is the right thing to
>> do with problems of this sort. Iteration t+1 depends on iteration t, and
>> when parts of the solution live on different processors, combining them
>> back together is not always faster and can actually be more expensive
>> and slower.
>>
>
> Parallelizing model fitting code is very tricky.
>
> -----Original Message-----
>
>> From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org]
>> On Behalf Of Douglas Bates
>> Sent: Thursday, January 18, 2018 3:07 PM
>> To: Nicolas Bédère <n.bedere at gmail.com>
>> Cc: R SIG Mixed Models <r-sig-mixed-models at r-project.org>
>> Subject: Re: [R-sig-ME] How can I make R using more than 1 core (8
>> available) on a Ubuntu Rstudio server ?
>>
>> The procedure is fairly simple - just rewrite the lme4 package from
>> scratch. :-)
>>
>> On Thu, Jan 18, 2018 at 2:03 PM Nicolas Bédère <n.bedere at gmail.com>
>> wrote:
>>
>> > I want to run the *glmer* procedure on a “large” dataset (250,000
>> > observations). The model includes 5 fixed effects, 2 interaction terms
>> > and 3 random effects. It takes more than 15 min to run on my laptop
>> > (recent Intel Core i7, RAM = 4 GB). Thus, the IT department of the
>> > university I am working at set up an RStudio server running on
>> > Ubuntu. My problem is that 8 cores are available on this server,
>> > but when I run the *glmer* procedure only 1 of them is used and
>> > it takes more than 1 h to get the results. How can I solve that
>> > problem and improve time efficiency? I found on Google that I may have
>> > to use a parallel procedure, but (i) I am not familiar at all with those
>> > computing procedures and they look a bit complicated, and (ii) the code
>> > I picked works with other functions in other packages such as
>> > *kmeans{stats}* (
>> > https://stackoverflow.com/questions/29998718/how-can-i-make-r-use-more-cpu-and-memory
>> > )
>> > but with neither *lmer* nor *glmer*.
>> >
>> >
>> >
>> > Can you please help with a simple procedure to tackle the problem?
>> >
>> >
>> > Many thanks!
>> >
>> >         [[alternative HTML version deleted]]
>> >
>> > _______________________________________________
>> > R-sig-mixed-models at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>>
>
