[R-sig-ME] Question regarding large data.frame in LMER?

Fri Dec 11 10:54:04 CET 2020

Hi Jad,

I've found that for some models on a data set of similar size to yours, glmmTMB was much faster than lme4. In my case the model was binomial, but it might be worth a try. It’s very easy to use — basically the same syntax as lme4.
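
As a rough sketch of what the switch looks like (simulated data, not yours; the variable names, effect sizes and the single random intercept are all made up for illustration), the same formula can be passed to both packages. Note that lmer() uses REML by default while glmmTMB() uses ML, so REML = FALSE is set below to keep the two fits comparable. Timings on toy data this small say nothing about a 1.4-million-row fit.

```r
# Simulated stand-in for the real data; every name and value here is hypothetical.
set.seed(1)
n_country <- 10
n_per     <- 200
df_sim <- data.frame(
  country = factor(rep(seq_len(n_country), each = n_per)),
  age     = runif(n_country * n_per, 18, 80)
)
df_sim$age_z <- as.numeric(scale(df_sim$age))  # centred and scaled age
country_eff  <- rnorm(n_country, sd = 5)       # one random intercept per country
df_sim$health <- 70 - 3 * df_sim$age_z - 1 * df_sim$age_z^2 +
  country_eff[as.integer(df_sim$country)] + rnorm(nrow(df_sim), sd = 10)

# The same formula object is used for both packages.
form <- health ~ age_z + I(age_z^2) + (1 | country)

# Only run the fits if both packages are installed.
if (requireNamespace("lme4", quietly = TRUE) &&
    requireNamespace("glmmTMB", quietly = TRUE)) {
  t_lme4 <- system.time(
    fit_lme4 <- lme4::lmer(form, data = df_sim, REML = FALSE)
  )
  t_tmb <- system.time(
    fit_tmb <- glmmTMB::glmmTMB(form, data = df_sim, family = gaussian())
  )
  # Elapsed time in seconds for each fit
  print(c(lme4 = t_lme4[["elapsed"]], glmmTMB = t_tmb[["elapsed"]]))
  # Fixed-effect estimates side by side
  print(cbind(lme4    = lme4::fixef(fit_lme4),
              glmmTMB = glmmTMB::fixef(fit_tmb)$cond))
}
```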

Best wishes,
Paul

> On 11 Dec 2020, at 02:33, Vinicius Maia <vinicius.a.maia77 using gmail.com> wrote:
> 
> I agree with the comments above about scaling and centering the continuous
> predictors and using poly instead of ^2.
> 
> How many levels do you have in country_year? It seems you have only two
> levels (1 and 2) in this variable.
> If you have only two levels in country_year, it is not a good idea to treat
> this variable as random; you need more levels to estimate random slopes and
> intercepts.
> If that is the case, treating country_year as fixed may solve your problem.
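
A quick way to check this, sketched on toy data that only mimics the posted str() output (the countries, sample sizes and effect sizes are invented): count the distinct values of each grouping variable and, if country_year really has just two, enter it as a fixed factor. The lmer() call is deliberately simplified and is not the original model.

```r
# Toy data: country_year takes only the values 1 and 2, as in the str() output.
# All values are made up for illustration.
set.seed(2)
df_sim <- data.frame(
  country      = factor(rep(paste0("C", 1:6), each = 40)),
  country_year = rep(c(1L, 2L), times = 120),
  age          = round(runif(240, 18, 80))
)
df_sim$health <- 80 - 0.3 * df_sim$age + 5 * (df_sim$country_year == 2) +
  rnorm(240, sd = 10)

# How many distinct levels does each grouping variable have?
n_levels <- sapply(df_sim[c("country", "country_year")],
                   function(x) length(unique(x)))
print(n_levels)

# With only two levels, treat country_year as a fixed factor
# rather than as a grouping factor for random effects.
df_sim$country_year_f <- factor(df_sim$country_year)
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_fixed <- lme4::lmer(health ~ age + country_year_f + (1 | country),
                          data = df_sim)
  print(lme4::fixef(fit_fixed))
}
```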
> 
> Best,
> 
> Vinícius
> 
> On Thu, 10 Dec 2020 at 19:13, João Veríssimo <jl.verissimo using gmail.com>
> wrote:
> 
>> Not sure if these are solutions, but I'd try:
>> 
>> a) centering/scaling Age
>> 
>> and/or
>> 
>> b) using poly(Age, 2), rather than I(age^2)
>> (i.e., an orthogonal polynomial)
>> 
>> Maybe related to "badly scaled parameters"?
>> https://github.com/lme4/lme4/issues/173
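
To see why a) and b) might help, here is a small base-R sketch with made-up ages (nothing below comes from the original data). Raw age and age^2 are nearly collinear and live on very different scales, whereas centring/scaling or poly() largely removes that.

```r
set.seed(3)
age <- round(runif(1000, 18, 80))  # hypothetical ages

# Raw age and age^2: almost perfectly correlated, and age^2 runs into the thousands
cor_raw <- cor(age, age^2)
print(range(age^2))

# a) centre and scale age before squaring
age_z      <- as.numeric(scale(age))
cor_scaled <- cor(age_z, age_z^2)

# b) orthogonal polynomial: the two columns are uncorrelated by construction
P        <- poly(age, 2)
cor_poly <- cor(P[, 1], P[, 2])

print(round(c(raw = cor_raw, scaled = cor_scaled, poly = cor_poly), 3))

# In a model formula, the terms  age + I(age^2)  would then become  poly(age, 2)
# (poly() does not accept missing values, so NAs in age need handling first).
```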
>> 
>> João
>> 
>> On Thu, 2020-12-10 at 12:11 +0000, Jad Moawad wrote:
>>> I am working with a large data.frame that contains around 1.4 million
>>> observations. Initially, when I was running my models, I worked
>>> on a sub-sample (10% of the full sample), because running one
>>> model on the original data can take a lot of time. Once I was sure
>>> that all variables were well harmonized and all regressions were
>>> running fine, I ran my models using the full sample. However, the
>>> regressions did not converge, and I received the following two errors
>>> from two different models:
>>> 
>>> Error in fun(xaa, ...) : Downdated VtV is not positive definite
>>> 
>>> Error in fun(xss, ...) : Downdated VtV is not positive definite
>>> 
>>> I use the lmer function to fit my models, and I include random slopes
>>> at the country and country_year levels. Below is the code that I
>>> use.
>>> 
>>> Model1 <- lmer(health ~ class + age + I(age^2) +
>>>                  class*macro_unemployment +
>>>                  (class + age + I(age^2) | country) +
>>>                  (class + age + I(age^2) | country_year) +
>>>                  (1 | id), data = df)
>>> 
>>> Model2 <- lmer(health ~ education + age + I(age^2) +
>>>                  education*macro_unemployment +
>>>                  (education + age + I(age^2) | country) +
>>>                  (education + age + I(age^2) | country_year) +
>>>                  (1 | id), data = df)
>>> 
>>> 
>>> Could someone please help me solve this issue?
>>> 
>>> Below is a glimpse (str) of my data and my sessionInfo():
>>> 
>>> tibble [1,370,264 × 8] (S3: grouped_df/tbl_df/tbl/data.frame)
>>> $ health            : num [1:1370264] 100 100 50 100 0 75 75 100 100
>>> 50 ...
>>> $ class             : Factor w/ 3 levels "Upper-middle class",..: 3
>>> 3 NA 3 3 3 3 1 1 3 ...
>>> $ education         : Factor w/ 3 levels "low","mid","high": 1 1 1 1
>>> 1 1 2 3 3 1 ...
>>> $ age               : num [1:1370264] 24 25 24 25 42 43 34 34 35 58
>>> ...
>>> $ macro_unemployment: num [1:1370264] 5.24 4.86 5.24 4.86 5.24 ...
>>> $ id                : int [1:1370264] 2 2 3 3 4 4 6 7 7 8 ...
>>> $ country_year      : int [1:1370264] 1 2 1 2 1 2 1 1 2 1 ...
>>> $ country           : Factor w/ 30 levels "Austria","Belgium",..: 1
>>> 1 1 1 1 1 1 1 1 1 ...
>>> - attr(*, "groups")= tibble [27 × 2] (S3: tbl_df/tbl/data.frame)
>>>  ..$ country: Factor w/ 30 levels "Austria","Belgium",..: 1 2 3 6 7
>>> 8 9 10 11 12 ...
>>>  ..$ .rows  : list<int> [1:27]
>>>  .. ..$ : int [1:47204] 1 2 3 4 5 6 7 8 9 10 ...
>>>  .. ..$ : int [1:41361] 47205 47206 47207 47208 47209 47210 47211
>>> 47212 47213 47214 ...
>>>  .. ..$ : int [1:42407] 88566 88567 88568 88569 88570 88571 88572
>>> 88573 88574 88575 ...
>>>  .. ..$ : int [1:48253] 130973 130974 130975 130976 130977 130978
>>> 130979 130980 130981 130982 ...
>>>  .. ..$ : int [1:31917] 179226 179227 179228 179229 179230 179231
>>> 179232 179233 179234 179235 ...
>>>  .. ..$ : int [1:44047] 211143 211144 211145 211146 211147 211148
>>> 211149 211150 211151 211152 ...
>>>  .. ..$ : int [1:62087] 255190 255191 255192 255193 255194 255195
>>> 255196 255197 255198 255199 ...
>>>  .. ..$ : int [1:94309] 317277 317278 317279 317280 317281 317282
>>> 317283 317284 317285 317286 ...
>>>  .. ..$ : int [1:37246] 411586 411587 411588 411589 411590 411591
>>> 411592 411593 411594 411595 ...
>>>  .. ..$ : int [1:77253] 448832 448833 448834 448835 448836 448837
>>> 448838 448839 448840 448841 ...
>>>  .. ..$ : int [1:16823] 526085 526086 526087 526088 526089 526090
>>> 526091 526092 526093 526094 ...
>>>  .. ..$ : int [1:24687] 542908 542909 542910 542911 542912 542913
>>> 542914 542915 542916 542917 ...
>>>  .. ..$ : int [1:116263] 567595 567596 567597 567598 567599 567600
>>> 567601 567602 567603 567604 ...
>>>  .. ..$ : int [1:43218] 683858 683859 683860 683861 683862 683863
>>> 683864 683865 683866 683867 ...
>>>  .. ..$ : int [1:28709] 727076 727077 727078 727079 727080 727081
>>> 727082 727083 727084 727085 ...
>>>  .. ..$ : int [1:27583] 755785 755786 755787 755788 755789 755790
>>> 755791 755792 755793 755794 ...
>>>  .. ..$ : int [1:77960] 783368 783369 783370 783371 783372 783373
>>> 783374 783375 783376 783377 ...
>>>  .. ..$ : int [1:36922] 861328 861329 861330 861331 861332 861333
>>> 861334 861335 861336 861337 ...
>>>  .. ..$ : int [1:93194] 898250 898251 898252 898253 898254 898255
>>> 898256 898257 898258 898259 ...
>>>  .. ..$ : int [1:9004] 991444 991445 991446 991447 991448 991449
>>> 991450 991451 991452 991453 ...
>>>  .. ..$ : int [1:40074] 1000448 1000449 1000450 1000451 1000452
>>> 1000453 1000454 1000455 1000456 1000457 ...
>>>  .. ..$ : int [1:29342] 1040522 1040523 1040524 1040525 1040526
>>> 1040527 1040528 1040529 1040530 1040531 ...
>>>  .. ..$ : int [1:85124] 1069864 1069865 1069866 1069867 1069868
>>> 1069869 1069870 1069871 1069872 1069873 ...
>>>  .. ..$ : int [1:92350] 1154988 1154989 1154990 1154991 1154992
>>> 1154993 1154994 1154995 1154996 1154997 ...
>>>  .. ..$ : int [1:50188] 1247338 1247339 1247340 1247341 1247342
>>> 1247343 1247344 1247345 1247346 1247347 ...
>>>  .. ..$ : int [1:7598] 1297526 1297527 1297528 1297529 1297530
>>> 1297531 1297532 1297533 1297534 1297535 ...
>>>  .. ..$ : int [1:65141] 1305124 1305125 1305126 1305127 1305128
>>> 1305129 1305130 1305131 1305132 1305133 ...
>>>  .. ..@ ptype: int(0)
>>>  ..- attr(*, ".drop")= logi TRUE
>>> 
>>> Session Info:
>>> 
>>> R version 4.0.2 (2020-06-22)
>>> Platform: x86_64-apple-darwin17.0 (64-bit)
>>> Running under: macOS Catalina 10.15.6
>>> 
>>> Matrix products: default
>>> BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
>>> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
>>> 
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>> 
>>> attached base packages:
>>> [1] stats     graphics  grDevices
>>> [4] utils     datasets  methods
>>> [7] base
>>> 
>>> other attached packages:
>>> [1] sessioninfo_1.1.1
>>> [2] sjlabelled_1.1.5
>>> [3] varhandle_2.0.5
>>> [4] labelled_2.7.0
>>> [5] dplyr_1.0.0
>>> [6] ggplot2_3.3.2
>>> [7] forcats_0.5.0
>>> [8] reprex_0.3.0
>>> [9] lmerTest_3.1-3
>>> [10] lme4_1.1-25
>>> [11] Matrix_1.2-18
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] Rcpp_1.0.4.6
>>> [2] compiler_4.0.2
>>> [3] pillar_1.4.4
>>> [4] nloptr_1.2.2.1
>>> [5] tools_4.0.2
>>> [6] digest_0.6.25
>>> [7] boot_1.3-25
>>> [8] statmod_1.4.34
>>> [9] lifecycle_0.2.0
>>> [10] tibble_3.0.1
>>> [11] nlme_3.1-148
>>> [12] gtable_0.3.0
>>> [13] lattice_0.20-41
>>> [14] pkgconfig_2.0.3
>>> [15] rlang_0.4.7
>>> [16] cli_2.0.2
>>> [17] rstudioapi_0.11
>>> [18] haven_2.3.1
>>> [19] withr_2.2.0
>>> [20] hms_0.5.3
>>> [21] generics_0.0.2
>>> [22] vctrs_0.3.1
>>> [23] fs_1.4.1
>>> [24] grid_4.0.2
>>> [25] tidyselect_1.1.0
>>> [26] glue_1.4.1
>>> [27] R6_2.4.1
>>> [28] fansi_0.4.1
>>> [29] minqa_1.2.4
>>> [30] farver_2.0.3
>>> [31] purrr_0.3.4
>>> [32] magrittr_1.5
>>> [33] scales_1.1.1
>>> [34] ellipsis_0.3.1
>>> [35] MASS_7.3-51.6
>>> [36] splines_4.0.2
>>> [37] insight_0.11.0
>>> [38] assertthat_0.2.1
>>> [39] colorspace_1.4-1
>>> [40] numDeriv_2016.8-1.1
>>> [41] labeling_0.3
>>> [42] utf8_1.1.4
>>> [43] munsell_0.5.0
>>> [44] crayon_1.3.4
>>> 
>>> 
>>> 
>>> 
>>> Sincerely,
>>> 
>>> 
>>> 
>>> Jad Moawad
>>> 
>>> 
>>> PhD candidate and teaching assistant
>>> University of Lausanne  - NCCR Lives
>>> Institut des Sciences Sociales
>>> Bâtiment Geopolis - 5621
>>> 1015 Lausanne
>>> Switzerland
>>> 
>>> 
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models