[R-sig-ME] Fitting LMMs using the MixedModels package for Julia

Tue Feb 23 19:06:35 CET 2016

My statement that fitting this model using lmer on the same machine took
about 40 minutes was hearsay and quite inaccurate.  When I checked, it took
less than 3 minutes.

> system.time(m1 <- lmer(Y ~ 1 + (1|G) + (1|H), ml1m, REML=FALSE))
   user  system elapsed
341.216 225.748 162.243
> m1
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: Y ~ 1 + (1 | G) + (1 | H)
   Data: ml1m
     AIC      BIC   logLik deviance df.resid
 2663980  2664027 -1331986  2663972  1000205
Random effects:
 Groups   Name        Std.Dev.
 G        (Intercept) 0.3603
 H        (Intercept) 0.6073
 Residual             0.9022
Number of obs: 1000209, groups:  G, 6040; H, 3706
Fixed Effects:
(Intercept)
      3.339

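For a rough comparison of the two fits, the elapsed times and the
standard-deviation estimates printed in this thread can be checked directly.
The following is only a minimal sketch in current Julia syntax; the variable
names are illustrative, and the numbers are copied from the lmer output above
and the Julia output in the quoted message below.

```julia
# Elapsed wall-clock times, copied from the outputs in this thread
lmer_elapsed  = 162.243      # seconds, "elapsed" column of system.time
julia_elapsed = 24.956579    # seconds, reported by @time

# Ratio of elapsed times (lmer relative to MixedModels)
speedup = lmer_elapsed / julia_elapsed
println("elapsed-time ratio: ", round(speedup, digits = 1))

# Standard deviations for G, H and Residual from each fit
lmer_sd  = [0.3603, 0.6073, 0.9022]
julia_sd = [0.36034996, 0.60728654, 0.90216887]

# The two fits agree to the four decimals that lmer prints
agree = all(abs.(julia_sd .- lmer_sd) .< 1e-4)
println("estimates agree to 1e-4: ", agree)
```

On these figures the ratio is roughly 6.5, not the one to two orders of
magnitude that the 40-minute figure would have implied.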
On Tue, Feb 23, 2016 at 11:41 AM Douglas Bates <bates at stat.wisc.edu> wrote:

> As many readers of this list are aware, most of my development effort for
> the last few years has been in the Julia language, in particular the
> MixedModels package for Julia.  There are several aspects of the Julia
> language that allow for writing faster code than in R, especially for
> iterative fitting of models to large data sets.  The downside for the user
> of switching to another language is, well, switching to another language.
>
> Some users have taken the plunge and used Julia because the model fits in
> R using lme4 were taking a long time, as in many hours.  They have seen one
> to two orders of magnitude differences in speed which, when things are
> taking that long, is worth the pain of switching.  If the fit in R is only
> taking a few seconds then it is not worthwhile learning a new language just
> to make that faster.
>
> I think that performing an lmer-like fit in Julia is now sufficiently
> straightforward that it will be worthwhile for others to try doing so.  We
> have developed a Julia package called RCall which allows a Julia user to
> run an R process from within Julia.  In particular, RCall makes it easy to
> create a copy of an R data.frame as a Julia DataFrame object and use that
> to fit a linear mixed model.
>
> The steps are reasonably straightforward.  I will illustrate with data on
> ratings of about 3700 movies by about 6000 users.  Of course, not every
> user rates every movie.  There are about 1,000,000 ratings in the data set.
>
> These data are available as the MovieLens 1M Dataset at
> http://grouplens.org/datasets/movielens/  and as the ml1m data set in the
> Timings R package available at https://github.com/Stat990-033/Timings.
> That package is not on CRAN because the data sets are too large.  You must
> install it in R with
>
> install.packages("devtools")
> devtools::install_github("Stat990-033/Timings")
>
> Downloads of Julia itself are at http://julialang.org/downloads/.  After
> installing Julia, start it and add the packages
>
> Pkg.add("RCall")
> Pkg.add("MixedModels")
>
> The actual model fit is performed as
>
> julia> using DataFrames, MixedModels, RCall
>
> julia> ml1m = rcopy("Timings::ml1m");
>
> julia> @time m1 = fit!(lmm(Y ~ 1 + (1|G) + (1|H), ml1m))
>  24.956579 seconds (39.95 M allocations: 1.398 GB, 1.06% gc time)
> Linear mixed model fit by maximum likelihood
>  logLik: -1331986.005811, deviance: 2663972.011622, AIC: 2663980.011622,
> BIC: 2664027.274500
>
> Variance components:
>            Variance   Std.Dev.
>  G        0.12985210 0.36034996
>  H        0.36879694 0.60728654
>  Residual 0.81390867 0.90216887
>  Number of obs: 1000209; levels of grouping factors: 6040, 3706
>
>   Fixed-effects parameters:
>              Estimate Std.Error z value
> (Intercept)   3.33902 0.0114624 291.302
>
>
> The "using" directive is similar to the "library" or "require" functions
> in R. The named Julia packages are, in R terminology, loaded and attached.
>
> The "Timings::ml1m" expression is an R expression.  It accesses the ml1m
> object in the Timings package, loading the package first, if necessary.
> The call to the Julia function lmm is similar to lmer but only creates the
> model.  The call to fit! is what causes the model to be fit.
>
> As you can see, this fit takes about 25 seconds.  A similar fit using lmer
> takes about 40 minutes on the same machine.
>
> I would be happy to answer questions about the MixedModels package but I
> don't think this forum would be appropriate.  It is a forum for questions
> about fitting mixed-effects models with R.  For the time being I would
> suggest asking questions on the Google group called julia-stats
> https://groups.google.com/forum/#!forum/julia-stats to which I am sending
> a copy of this message.
>
>
