[R-sig-ME] Fitting LMMs using the MixedModels package for Julia
Douglas Bates
bates at stat.wisc.edu
Tue Feb 23 18:41:40 CET 2016
As many readers of this list are aware, most of my development effort for
the last few years has been in the Julia language, in particular the
MixedModels package for Julia. There are several aspects of the Julia
language that allow for writing faster code than in R, especially for
iterative fitting of models to large data sets. The downside for the user
of switching to another language is, well, switching to another language.
Some users have taken the plunge and used Julia because the model fits in R
using lme4 were taking a long time, as in many hours. They have seen one
to two orders of magnitude differences in speed which, when things are
taking that long, is worth the pain of switching. If the fit in R is only
taking a few seconds then it is not worthwhile learning a new language just
to make that faster.
I think that performing an lmer-like fit in Julia is now sufficiently
straightforward that it will be worthwhile for others to try doing so. We
have developed a Julia package called RCall which allows a Julia user to
run an R process from within Julia. In particular, RCall makes it easy to
create a copy of an R data.frame as a Julia DataFrame object and use that
to fit a linear mixed model.
The steps are reasonably straightforward. I will illustrate with data on
ratings of about 3700 movies by about 6000 users. Of course, not every
user rates every movie. There are about 1,000,000 ratings in the data set.
These data are available as the MovieLens 1M Dataset at
http://grouplens.org/datasets/movielens/ and as the ml1m data set in the
Timings R package available at https://github.com/Stat990-033/Timings.
That package is not on CRAN because the data sets are too large. You must
install it in R with
install.packages("devtools")
devtools::Install_github("Stat990-033/Timings")
Downloads of Julia itself are at http://julialang.org/downloads/ After
installing Julia start it and add the packages
Pkg.add("RCall")
Pkg.add("MixedModels")
The actual model fit is performed as
julia> using DataFrames, MixedModels, RCall
julia> ml1m = rcopy("Timings::ml1m");
julia> @time m1 = fit!(lmm(Y ~ 1 + (1|G) + (1|H), ml1m))
24.956579 seconds (39.95 M allocations: 1.398 GB, 1.06% gc time)
Linear mixed model fit by maximum likelihood
logLik: -1331986.005811, deviance: 2663972.011622, AIC: 2663980.011622,
BIC: 2664027.274500
Variance components:
Variance Std.Dev.
G 0.12985210 0.36034996
H 0.36879694 0.60728654
Residual 0.81390867 0.90216887
Number of obs: 1000209; levels of grouping factors: 6040, 3706
Fixed-effects parameters:
Estimate Std.Error z value
(Intercept) 3.33902 0.0114624 291.302
The "using" directive is similar to the "library" or "require" functions in
R. The named Julia packages are, in R terminology, loaded and attached.
The "Timings::ml1m" expression is an R expression. It accesses the ml1m
object in the Timings package, loading the package first, if necessary.
The call to the Julia function lmm is similar to lmer but only creates the
model. The call to fit! is what causes the model to be fit.
As you can see, this fit takes about 25 seconds. A similar fit using lmer
takes about 40 minutes on the same machine.
I would be happy to answer questions about the MixedModels package but I
don't think this forum would be appropriate. It is a forum for questions
about fitting mixed-effects models with R. For the time being I would
suggest asking questions on the Google group called julia-stats
https://groups.google.com/forum/#!forum/julia-stats to which I am sending a
copy of this message.
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list