[R-sig-ME] Links to that "slow" lmer example I asked about
Douglas Bates
bates at stat.wisc.edu
Wed Oct 7 00:03:19 CEST 2009
I enclose a timing using the currently released lme4 on the same
machine. You are right - it is considerably slower. Good thing I'm
working on the next version. Now I want you all to remember this when
you find out that I've all the internal structures - again! There's a
reason.
On Tue, Oct 6, 2009 at 4:44 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
> Thanks for providing the data, Paui. As I said in a private message,
> this helps a lot to be able to discuss a concrete example and I thank
> your colleague for allowing you to provide these.
>
> My initial timing is using the development version of the lme4
> package, the so-called lme4a. I separated the creation of the
> structures from the actual optimization. Setup takes about 3 seconds,
> optimization with nlminb about 7 seconds and optimization with bobyqa
> about 3.5 seconds.
>
> This was done on my desktop computer - a dual-core 2.0 GHz AMD64 with
> 4GB of memory.
>
> On Tue, Oct 6, 2009 at 2:57 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
>> Dear Everybody:
>>
>> I asked a couple of weeks ago about the puzzle that my colleague's
>> linear mixed model can be estimated in HLM6 in 3 seconds, while lmer
>> requires about 50 seconds. I wondered if that was expected/known.
>>
>> The discussion seemed to end with the conclusion "if you expect us to
>> evaluate that, give us the working example." Due to a death in my
>> family, I was delayed in responding, but here are the links to the
>> example data and code.
>>
>> The data frame is saved with R's write function
>>
>> http://pj.freefaculty.org/R/MixedModel/myframe.Rdata
>>
>> I believe that is workable on all platforms. That's about 26,000 rows.
>> Variable V33 represents the groups (in this case, country). The
>> variables are generically named V1-V33.
>>
>> The small simple test program to load the data and estimate the model:
>>
>> http://pj.freefaculty.org/R/MixedModel/replicateMM.R
>>
>> The output I get, which has system.time wrapped around the use of lmer:
>>
>> http://pj.freefaculty.org/R/MixedModel/replicateMM.Rout
>>
>> I get
>> ## user system elapsed
>> ##55.448 0.216 55.756
>>
>> Please remember I am not saying that lmer should work faster. I
>> understand it is capable of estimating models that other programs
>> cannot. I'm only trying to explain to a user why this one linear
>> model takes more time in lmer than in HLM6.
>>
>> pj
>> --
>> Paul E. Johnson
>> Professor, Political Science
>> 1541 Lilac Lane, Room 504
>> University of Kansas
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
-------------- next part --------------
R version 2.9.2 (2009-08-24)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> ## crude benchmark of speed on numerical linear algebra
> mm <- matrix(rnorm(1000 * 1000), nc = 1000)
> system.time(solve(mm, rnorm(1000)))
user system elapsed
0.644 0.028 0.715
> system.time(solve(mm, rnorm(1000)))
user system elapsed
0.408 0.024 0.461
> system.time(solve(mm, rnorm(1000)))
user system elapsed
0.300 0.012 0.318
> rm(mm)
>
>
> load(url("http://pj.freefaculty.org/R/MixedModel/myframe.Rdata"))
> attr(myframe, "terms") <- NULL
> attr(myframe, "na.action") <- NULL
>
> str(myframe)
'data.frame': 23898 obs. of 33 variables:
$ V1 : num 5 5 5 3 7 9 9 4 7 9 ...
$ V2 : num 4 4 4 4 5 2 4 3 4 4 ...
$ V3 : num 28335 28335 28335 28335 28335 ...
$ V4 : num 4.78 4.78 4.78 4.78 4.78 ...
$ V5 : num 2 0 0 1 0 0 0 0 0 0 ...
$ V6 : num 5 5 3 2 4 2 3 2 4 4 ...
$ V7 : num 3 1 1 2 1 2 1 2 1 1 ...
$ V8 : num 2 2 2 4 1 1 1 4 2 1 ...
$ V9 : num 2 1 2 2 2 2 1 4 2 1 ...
$ V10: num 1 0 0 0 1 0 1 1 0 1 ...
$ V11: num 0 0 1 1 0 1 0 0 0 0 ...
$ V12: num 0 1 0 0 0 0 0 0 0 0 ...
$ V13: num 0 0 1 0 0 1 0 0 0 0 ...
$ V14: num 0 0 0 0 0 0 0 0 0 0 ...
$ V15: num 0 0 0 1 0 0 1 0 0 0 ...
$ V16: num 0 0 0 0 0 0 0 0 0 0 ...
$ V17: num 0 0 0 0 0 0 0 0 0 0 ...
$ V18: num 1 1 1 1 1 1 1 2 2 1 ...
$ V19: num 0 0 0 0 0 0 1 0 0 0 ...
$ V20: num 1 0 1 1 1 1 0 0 1 0 ...
$ V21: num 0 1 0 0 0 0 0 0 0 0 ...
$ V22: num 0 0 0 0 0 0 0 0 0 0 ...
$ V23: num 0 0 1 1 1 0 0 0 0 0 ...
$ V24: num 0 0 0 0 0 0 1 0 0 1 ...
$ V25: num 0 1 0 0 0 1 0 0 1 0 ...
$ V26: num 1 0 1 0 1 1 -1 0 1 1 ...
$ V27: num 0 0 0 0 2 0 1 1 0 1 ...
$ V28: num 2 1 3 2 2 3 2 1 1 2 ...
$ V29: num 0 0 0 0 0 0 0 0 0 0 ...
$ V30: num 0 0 0 0 8 0 3 2 0 4 ...
$ V31: num 0 0 0 0 10 0 4 3 0 4 ...
$ V32: num 0 0 0 0 2 0 -1 0 0 1 ...
$ V33: Factor w/ 25 levels "Austria","Belgium",..: 2 2 2 2 2 2 2 2 2 2 ...
> summary(myframe)
V1 V2 V3 V4
Min. :3.000 Min. :2.000 Min. :10270 Min. :1.260
1st Qu.:5.000 1st Qu.:4.000 1st Qu.:16357 1st Qu.:2.000
Median :7.000 Median :4.000 Median :26750 Median :3.930
Mean :6.315 Mean :4.089 Mean :23385 Mean :3.758
3rd Qu.:8.000 3rd Qu.:5.000 3rd Qu.:27756 3rd Qu.:5.680
Max. :9.000 Max. :6.000 Max. :62298 Max. :6.530
V5 V6 V7 V8
Min. :0.0000 Min. :2.000 Min. :1.000 Min. :1.000
1st Qu.:0.0000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000
Median :0.0000 Median :4.000 Median :1.000 Median :1.000
Mean :0.4353 Mean :3.562 Mean :1.545 Mean :1.708
3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:2.000 3rd Qu.:2.000
Max. :2.0000 Max. :6.000 Max. :3.000 Max. :5.000
V9 V10 V11 V12
Min. :1.00 Min. :0.0000 Min. :0.0000 Min. :0.00000
1st Qu.:2.00 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000
Median :2.00 Median :0.0000 Median :0.0000 Median :0.00000
Mean :2.61 Mean :0.4377 Mean :0.3108 Mean :0.01151
3rd Qu.:4.00 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.00000
Max. :6.00 Max. :1.0000 Max. :1.0000 Max. :1.00000
V13 V14 V15 V16
Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.00000
1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000
Median :0.00000 Median :0.0000 Median :0.0000 Median :0.00000
Mean :0.07164 Mean :0.1809 Mean :0.1493 Mean :0.05415
3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000
Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :1.00000
V17 V18 V19 V20
Min. :0.0000 Min. :1.000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median :2.000 Median :0.0000 Median :0.0000
Mean :0.1783 Mean :1.556 Mean :0.1577 Mean :0.3507
3rd Qu.:0.0000 3rd Qu.:2.000 3rd Qu.:0.0000 3rd Qu.:1.0000
Max. :1.0000 Max. :2.000 Max. :1.0000 Max. :1.0000
V21 V22 V23 V24
Min. :0.0000 Min. :0.00000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median :0.00000 Median :0.0000 Median :0.0000
Mean :0.2678 Mean :0.07595 Mean :0.2111 Mean :0.2652
3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:1.0000
Max. :1.0000 Max. :1.00000 Max. :1.0000 Max. :1.0000
V25 V26 V27 V28
Min. :0.0000 Min. :-3.00000 Min. :0.0000 Min. :1.000
1st Qu.:0.0000 1st Qu.: 0.00000 1st Qu.:0.0000 1st Qu.:1.000
Median :0.0000 Median : 0.00000 Median :1.0000 Median :2.000
Mean :0.2466 Mean : 0.08398 Mean :0.9125 Mean :1.899
3rd Qu.:0.0000 3rd Qu.: 0.00000 3rd Qu.:1.0000 3rd Qu.:2.000
Max. :1.0000 Max. : 3.00000 Max. :2.0000 Max. :3.000
V29 V30 V31 V32
Min. :0.0000 Min. : 0.000 Min. : 0.000 Min. :-6.00000
1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00000
Median :0.0000 Median : 3.000 Median : 4.000 Median : 0.00000
Mean :0.3912 Mean : 3.155 Mean : 3.679 Mean : 0.03741
3rd Qu.:0.0000 3rd Qu.: 4.000 3rd Qu.: 5.000 3rd Qu.: 0.00000
Max. :4.0000 Max. :12.000 Max. :12.000 Max. : 6.00000
V33
Slovakia : 1245
CzechRepublic: 1069
WGermany : 1033
Denmark : 1027
Spain : 1019
France : 1018
(Other) :17487
Warning message:
closing unused connection 3 (gzcon(http://pj.freefaculty.org/R/MixedModel/myframe.Rdata))
> ## several variables are binary and probably should be recoded as factors
> sapply(myframe, function(v) length(unique(v)))
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
7 5 24 24 3 5 3 5 6 2 2 2 2 2 2 2 2 2 2 2
V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33
2 2 2 2 2 7 3 3 4 9 9 11 25
> myframeA <-
+ do.call(data.frame, lapply(myframe,
+ function(v)
+ if(length(unique(v)) == 2) factor(v) else v))
> str(myframeA)
'data.frame': 23898 obs. of 33 variables:
$ V1 : num 5 5 5 3 7 9 9 4 7 9 ...
$ V2 : num 4 4 4 4 5 2 4 3 4 4 ...
$ V3 : num 28335 28335 28335 28335 28335 ...
$ V4 : num 4.78 4.78 4.78 4.78 4.78 ...
$ V5 : num 2 0 0 1 0 0 0 0 0 0 ...
$ V6 : num 5 5 3 2 4 2 3 2 4 4 ...
$ V7 : num 3 1 1 2 1 2 1 2 1 1 ...
$ V8 : num 2 2 2 4 1 1 1 4 2 1 ...
$ V9 : num 2 1 2 2 2 2 1 4 2 1 ...
$ V10: Factor w/ 2 levels "0","1": 2 1 1 1 2 1 2 2 1 2 ...
$ V11: Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 1 1 1 ...
$ V12: Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
$ V13: Factor w/ 2 levels "0","1": 1 1 2 1 1 2 1 1 1 1 ...
$ V14: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ V15: Factor w/ 2 levels "0","1": 1 1 1 2 1 1 2 1 1 1 ...
$ V16: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ V17: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ V18: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 2 2 1 ...
$ V19: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 1 1 1 ...
$ V20: Factor w/ 2 levels "0","1": 2 1 2 2 2 2 1 1 2 1 ...
$ V21: Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
$ V22: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ V23: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 1 1 1 1 ...
$ V24: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 1 1 2 ...
$ V25: Factor w/ 2 levels "0","1": 1 2 1 1 1 2 1 1 2 1 ...
$ V26: num 1 0 1 0 1 1 -1 0 1 1 ...
$ V27: num 0 0 0 0 2 0 1 1 0 1 ...
$ V28: num 2 1 3 2 2 3 2 1 1 2 ...
$ V29: num 0 0 0 0 0 0 0 0 0 0 ...
$ V30: num 0 0 0 0 8 0 3 2 0 4 ...
$ V31: num 0 0 0 0 10 0 4 3 0 4 ...
$ V32: num 0 0 0 0 2 0 -1 0 0 1 ...
$ V33: Factor w/ 25 levels "Austria","Belgium",..: 2 2 2 2 2 2 2 2 2 2 ...
> summary(myframeA)
V1 V2 V3 V4
Min. :3.000 Min. :2.000 Min. :10270 Min. :1.260
1st Qu.:5.000 1st Qu.:4.000 1st Qu.:16357 1st Qu.:2.000
Median :7.000 Median :4.000 Median :26750 Median :3.930
Mean :6.315 Mean :4.089 Mean :23385 Mean :3.758
3rd Qu.:8.000 3rd Qu.:5.000 3rd Qu.:27756 3rd Qu.:5.680
Max. :9.000 Max. :6.000 Max. :62298 Max. :6.530
V5 V6 V7 V8
Min. :0.0000 Min. :2.000 Min. :1.000 Min. :1.000
1st Qu.:0.0000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000
Median :0.0000 Median :4.000 Median :1.000 Median :1.000
Mean :0.4353 Mean :3.562 Mean :1.545 Mean :1.708
3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:2.000 3rd Qu.:2.000
Max. :2.0000 Max. :6.000 Max. :3.000 Max. :5.000
V9 V10 V11 V12 V13 V14 V15
Min. :1.00 0:13438 0:16470 0:23623 0:22186 0:19575 0:20330
1st Qu.:2.00 1:10460 1: 7428 1: 275 1: 1712 1: 4323 1: 3568
Median :2.00
Mean :2.61
3rd Qu.:4.00
Max. :6.00
V16 V17 V18 V19 V20 V21 V22
0:22604 0:19636 1:10600 0:20129 0:15517 0:17497 0:22083
1: 1294 1: 4262 2:13298 1: 3769 1: 8381 1: 6401 1: 1815
V23 V24 V25 V26 V27
0:18853 0:17560 0:18004 Min. :-3.00000 Min. :0.0000
1: 5045 1: 6338 1: 5894 1st Qu.: 0.00000 1st Qu.:0.0000
Median : 0.00000 Median :1.0000
Mean : 0.08398 Mean :0.9125
3rd Qu.: 0.00000 3rd Qu.:1.0000
Max. : 3.00000 Max. :2.0000
V28 V29 V30 V31
Min. :1.000 Min. :0.0000 Min. : 0.000 Min. : 0.000
1st Qu.:1.000 1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.: 0.000
Median :2.000 Median :0.0000 Median : 3.000 Median : 4.000
Mean :1.899 Mean :0.3912 Mean : 3.155 Mean : 3.679
3rd Qu.:2.000 3rd Qu.:0.0000 3rd Qu.: 4.000 3rd Qu.: 5.000
Max. :3.000 Max. :4.0000 Max. :12.000 Max. :12.000
V32 V33
Min. :-6.00000 Slovakia : 1245
1st Qu.: 0.00000 CzechRepublic: 1069
Median : 0.00000 WGermany : 1033
Mean : 0.03741 Denmark : 1027
3rd Qu.: 0.00000 Spain : 1019
Max. : 6.00000 France : 1018
(Other) :17487
>
> if (require(lme4a, quietly = TRUE)) {
+ ## Separate the setup time from the actual optimization time
+ cat ("Setup time\n\n")
+ print(system.time(fm1 <- lmer (V1 ~ V2*V3 + V4*V5 + V6*V5 + V7 + V8 +
+ V9 + V10 + V11 + V12 + V13 + V14 + V15 +
+ V16 + V17 + V18 + V19 + V20 + V21 + V22 +
+ V23 + V24 + V25 + V26 + V27 + V28 + V29 +
+ V30 + V31 + V32 + (1 | V33) +
+ (0 + V6 | V33) + (0 + V2 | V33) +
+ (0 + V5 | V33) + (0 + V26 | V33),
+ data=myframe, doFit = FALSE)
+ ))
+ cat("\nOptimization timings with nlminb\n\n")
+ ## optimize with nlminb
+ print(system.time(nlminb(c(1,1,1,1,1), fm1 at setPars, lower = c(0,0,0,0,0),
+ control = list(trace = 1))))
+ ## replicate the timing
+ print(system.time(nlminb(c(1,1,1,1,1), fm1 at setPars, lower = c(0,0,0,0,0))))
+ print(system.time(nlminb(c(1,1,1,1,1), fm1 at setPars, lower = c(0,0,0,0,0))))
+ if (require(minqa)) {
+ ## optimize with bobyqa
+ cat("\nOptimization timings with bobyqa\n\n")
+ print(system.time(bobyqa(c(1,1,1,1,1), fm1 at setPars,
+ lower = c(0,0,0,0,0),
+ control = list(iprint = 2))))
+ print(system.time(bobyqa(c(1,1,1,1,1), fm1 at setPars,
+ lower = c(0,0,0,0,0))))
+ print(system.time(bobyqa(c(1,1,1,1,1), fm1 at setPars,
+ lower = c(0,0,0,0,0))))
+ }
+ print(fm1, corr = FALSE)
+ } else {
+ if (require(lme4)) {
+ print(system.time(fm1 <- lmer (V1 ~ V2*V3 + V4*V5 + V6*V5 + V7 + V8 +
+ V9 + V10 + V11 + V12 + V13 + V14 + V15 +
+ V16 + V17 + V18 + V19 + V20 + V21 + V22 +
+ V23 + V24 + V25 + V26 + V27 + V28 + V29 +
+ V30 + V31 + V32 + (1 | V33) +
+ (0 + V6 | V33) + (0 + V2 | V33) +
+ (0 + V5 | V33) + (0 + V26 | V33),
+ myframe)))
+ print(system.time(fm1 <- lmer (V1 ~ V2*V3 + V4*V5 + V6*V5 + V7 + V8 +
+ V9 + V10 + V11 + V12 + V13 + V14 + V15 +
+ V16 + V17 + V18 + V19 + V20 + V21 + V22 +
+ V23 + V24 + V25 + V26 + V27 + V28 + V29 +
+ V30 + V31 + V32 + (1 | V33) +
+ (0 + V6 | V33) + (0 + V2 | V33) +
+ (0 + V5 | V33) + (0 + V26 | V33),
+ myframe)))
+ print(system.time(fm1 <- lmer (V1 ~ V2*V3 + V4*V5 + V6*V5 + V7 + V8 +
+ V9 + V10 + V11 + V12 + V13 + V14 + V15 +
+ V16 + V17 + V18 + V19 + V20 + V21 + V22 +
+ V23 + V24 + V25 + V26 + V27 + V28 + V29 +
+ V30 + V31 + V32 + (1 | V33) +
+ (0 + V6 | V33) + (0 + V2 | V33) +
+ (0 + V5 | V33) + (0 + V26 | V33),
+ myframe)))
+ }
+ }
Loading required package: lme4
Loading required package: Matrix
Loading required package: lattice
Attaching package: 'Matrix'
The following object(s) are masked from package:stats :
contr.helmert,
contr.poly,
contr.SAS,
contr.sum,
contr.treatment,
xtabs
The following object(s) are masked from package:base :
rcond
user system elapsed
48.699 0.168 49.904
user system elapsed
47.963 0.148 49.180
user system elapsed
45.062 0.040 46.258
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called 'lme4a'
> sessionInfo()
R version 2.9.2 (2009-08-24)
x86_64-pc-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lme4_0.999375-32 Matrix_0.999375-31 lattice_0.17-25
loaded via a namespace (and not attached):
[1] grid_2.9.2
>
> proc.time()
user system elapsed
158.413 0.720 163.360
More information about the R-sig-mixed-models
mailing list