[R-sig-ME] Fitting LMMs using the MixedModels package for Julia
Douglas Bates
bates at stat.wisc.edu
Tue Feb 23 19:06:35 CET 2016
My statement that fitting this model using lmer on the same machine took
about 40 minutes was hearsay and quite inaccurate. When I checked it took
less than 3 minutes
> system.time(m1 <- lmer(Y ~ 1 + (1|G) + (1|H), ml1m, REML=FALSE))
user system elapsed
341.216 225.748 162.243
> m1
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: Y ~ 1 + (1 | G) + (1 | H)
Data: ml1m
AIC BIC logLik deviance df.resid
2663980 2664027 -1331986 2663972 1000205
Random effects:
Groups Name Std.Dev.
G (Intercept) 0.3603
H (Intercept) 0.6073
Residual 0.9022
Number of obs: 1000209, groups: G, 6040; H, 3706
Fixed Effects:
(Intercept)
3.339
On Tue, Feb 23, 2016 at 11:41 AM Douglas Bates <bates at stat.wisc.edu> wrote:
> As many readers of this list are aware, most of my development effort for
> the last few years has been in the Julia language, in particular the
> MixedModels package for Julia. There are several aspects of the Julia
> language that allow for writing faster code than in R, especially for
> iterative fitting of models to large data sets. The downside for the user
> of switching to another language is, well, switching to another language.
>
> Some users have taken the plunge and used Julia because the model fits in
> R using lme4 were taking a long time, as in many hours. They have seen one
> to two orders of magnitude differences in speed which, when things are
> taking that long, is worth the pain of switching. If the fit in R is only
> taking a few seconds then it is not worthwhile learning a new language just
> to make that faster.
>
> I think that performing an lmer-like fit in Julia is now sufficiently
> straightforward that it will be worthwhile for others to try doing so. We
> have developed a Julia package called RCall which allows a Julia user to
> run an R process from within Julia. In particular, RCall makes it easy to
> create a copy of an R data.frame as a Julia DataFrame object and use that
> to fit a linear mixed model.
>
> The steps are reasonably straightforward. I will illustrate with data on
> ratings of about 3700 movies by about 6000 users. Of course, not every
> user rates every movie. There are about 1,000,000 ratings in the data set.
>
> These data are available as the MovieLens 1M Dataset at
> http://grouplens.org/datasets/movielens/ and as the ml1m data set in the
> Timings R package available at https://github.com/Stat990-033/Timings.
> That package is not on CRAN because the data sets are too large. You must
> install it in R with
>
> install.packages("devtools")
> devtools::Install_github("Stat990-033/Timings")
>
> Downloads of Julia itself are at http://julialang.org/downloads/ After
> installing Julia start it and add the packages
>
> Pkg.add("RCall")
> Pkg.add("MixedModels")
>
> The actual model fit is performed as
>
> julia> using DataFrames, MixedModels, RCall
>
> julia> ml1m = rcopy("Timings::ml1m");
>
> julia> @time m1 = fit!(lmm(Y ~ 1 + (1|G) + (1|H), ml1m))
> 24.956579 seconds (39.95 M allocations: 1.398 GB, 1.06% gc time)
> Linear mixed model fit by maximum likelihood
> logLik: -1331986.005811, deviance: 2663972.011622, AIC: 2663980.011622,
> BIC: 2664027.274500
>
> Variance components:
> Variance Std.Dev.
> G 0.12985210 0.36034996
> H 0.36879694 0.60728654
> Residual 0.81390867 0.90216887
> Number of obs: 1000209; levels of grouping factors: 6040, 3706
>
> Fixed-effects parameters:
> Estimate Std.Error z value
> (Intercept) 3.33902 0.0114624 291.302
>
>
> The "using" directive is similar to the "library" or "require" functions
> in R. The named Julia packages are, in R terminology, loaded and attached.
>
> The "Timings::ml1m" expression is an R expression. It accesses the ml1m
> object in the Timings package, loading the package first, if necessary.
> The call to the Julia function lmm is similar to lmer but only creates the
> model. The call to fit! is what causes the model to be fit.
>
> As you can see, this fit takes about 25 seconds. A similar fit using lmer
> takes about 40 minutes on the same machine.
>
> I would be happy to answer questions about the MixedModels package but I
> don't think this forum would be appropriate. It is a forum for questions
> about fitting mixed-effects models with R. For the time being I would
> suggest asking questions on the Google group called julia-stats
> https://groups.google.com/forum/#!forum/julia-stats to which I am sending
> a copy of this message.
>
>
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list