[R-sig-ME] Question regarding large data.frame in LMER?

João Veríssimo j|@ver|@@|mo @end|ng |rom gm@||@com
Thu Dec 10 23:13:29 CET 2020


Not sure if these are solutions, but I'd try:

a) centering/scaling age

and/or

b) using poly(age, 2) rather than I(age^2)
(i.e., an orthogonal polynomial)
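One way to see why (a) or (b) can help is to compare how strongly the linear and quadratic age terms are correlated under each coding. The sketch below uses simulated ages drawn uniformly between 18 and 90. That range is an assumption (the real age distribution is not shown in the thread), so the numbers are only illustrative.

```r
# Sketch: collinearity of linear and quadratic age terms under three codings.
# The age values are simulated (uniform 18-90), not taken from the poster's data.
set.seed(1)
age <- runif(1000, min = 18, max = 90)

# Raw coding, as in age + I(age^2): the two columns are almost collinear
raw_cor <- cor(age, age^2)

# Option (a): center and scale age first, then square it
age_c <- as.numeric(scale(age))
centered_cor <- cor(age_c, age_c^2)

# Option (b): orthogonal polynomial; the two columns are orthogonal by construction
P <- poly(age, 2)
poly_cor <- cor(P[, 1], P[, 2])

print(round(c(raw = raw_cor, centered = centered_cor, poly = poly_cor), 3))
```

With the raw coding the correlation is close to 1, which makes the linear and quadratic coefficients (and the matching random slopes) hard to separate numerically. Centering removes most of it when the age distribution is roughly symmetric, and poly() removes it entirely, up to rounding.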

Maybe related to "badly scaled parameters"?
https://github.com/lme4/lme4/issues/173
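The "badly scaled parameters" idea can be checked directly on the fixed-effects design matrix: if its column scales differ by orders of magnitude, the optimizer is working on a poorly scaled problem. The sketch below builds the fixed part of Model1 with base R's model.matrix() on simulated stand-in data and compares the ratio of the largest to the smallest column standard deviation before and after scaling age and macro_unemployment. The variable ranges and two of the three class labels are hypothetical; only "Upper-middle class" appears in the str() output.

```r
# Sketch: compare column scales of the fixed-effects design matrix.
set.seed(42)
n <- 5000
# Hypothetical stand-in data; ranges and the labels "class2"/"class3" are assumptions
d <- data.frame(
  age = runif(n, 18, 90),
  macro_unemployment = runif(n, 2, 25),
  class = factor(sample(c("Upper-middle class", "class2", "class3"),
                        n, replace = TRUE))
)

# Design matrix implied by the fixed part of Model1 (raw age and age^2)
X_raw <- model.matrix(~ class * macro_unemployment + age + I(age^2), data = d)

# Same fixed part after centering/scaling the continuous predictors
d$age_s   <- as.numeric(scale(d$age))
d$unemp_s <- as.numeric(scale(d$macro_unemployment))
X_scaled <- model.matrix(~ class * unemp_s + age_s + I(age_s^2), data = d)

# Ratio of largest to smallest column SD (intercept column excluded)
sd_ratio <- function(X) {
  s <- apply(X[, -1, drop = FALSE], 2, sd)
  max(s) / min(s)
}
print(round(c(raw = sd_ratio(X_raw), scaled = sd_ratio(X_scaled)), 1))
```

In this simulated case the raw ratio is in the thousands, driven almost entirely by the I(age^2) column, while the scaled design keeps all columns within a small factor of each other.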

João

On Thu, 2020-12-10 at 12:11 +0000, Jad Moawad wrote:
> I am working with a large data.frame that contains around 1.4 million
> observations. Initially, I ran my models on a sub-sample (10% of my
> full sample), because fitting a single model on the full data can
> take a long time. Once I was sure that all variables were well
> harmonized and all regressions ran fine, I fitted my models on the
> full sample. However, the models did not converge, and I received
> the following errors from two different models:
> 
> Error in fun(xaa, ...) : Downdated VtV is not positive definite
> 
> Error in fun(xss, ...) : Downdated VtV is not positive definite
> 
> I use the lmer function to fit my models, and I include random slopes
> at the country and country_year levels. Below is the code that I
> use.
> 
> Model1 <- lmer(health ~ class + age + I(age^2) +
>                  class*macro_unemployment +
>                  (class + age + I(age^2) | country) +
>                  (class + age + I(age^2) | country_year) +
>                  (1 | id), data = df)
> 
> Model2 <- lmer(health ~ education + age + I(age^2) +
>                  education*macro_unemployment +
>                  (education + age + I(age^2) | country) +
>                  (education + age + I(age^2) | country_year) +
>                  (1 | id), data = df)
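Applied to the two models above, suggestion (b) amounts to replacing age + I(age^2) by poly(age, 2) in both the fixed and the random parts. The sketch below builds both formulas from one hypothetical helper, make_formula(), so that Model1 and Model2 stay in sync; the equally hypothetical fit_model() just passes the result to lme4::lmer(). It has not been run on the poster's data, and whether it removes the "Downdated VtV" error is not established here.

```r
# Sketch: Model1/Model2 with poly(age, 2) in place of age + I(age^2).
# make_formula() and fit_model() are hypothetical helpers, not part of lme4.
make_formula <- function(focal) {
  re_terms <- paste(focal, "+ poly(age, 2)")   # random-slope part of each term
  as.formula(paste0(
    "health ~ ", focal, " * macro_unemployment + poly(age, 2)",
    " + (", re_terms, " | country)",
    " + (", re_terms, " | country_year)",
    " + (1 | id)"
  ), env = parent.frame())                     # formula lives in the caller's env
}

# Needs lme4 installed at call time; `data` should have the columns shown in str()
fit_model <- function(data, focal) {
  lme4::lmer(make_formula(focal), data = data)
}

f1 <- make_formula("class")       # counterpart of Model1
f2 <- make_formula("education")   # counterpart of Model2
print(f1)
print(f2)
```

Usage would be, for example, Model1 <- fit_model(df, "class"). Note that focal * macro_unemployment expands to the same main effects and interaction as the original class + ... + class*macro_unemployment.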
> 
> 
> Could someone please help me solve this issue?
> 
> Below are the str() output of my data and my sessionInfo():
> 
> tibble [1,370,264 × 8] (S3: grouped_df/tbl_df/tbl/data.frame)
>  $ health            : num [1:1370264] 100 100 50 100 0 75 75 100 100 50 ...
>  $ class             : Factor w/ 3 levels "Upper-middle class",..: 3 3 NA 3 3 3 3 1 1 3 ...
>  $ education         : Factor w/ 3 levels "low","mid","high": 1 1 1 1 1 1 2 3 3 1 ...
>  $ age               : num [1:1370264] 24 25 24 25 42 43 34 34 35 58 ...
>  $ macro_unemployment: num [1:1370264] 5.24 4.86 5.24 4.86 5.24 ...
>  $ id                : int [1:1370264] 2 2 3 3 4 4 6 7 7 8 ...
>  $ country_year      : int [1:1370264] 1 2 1 2 1 2 1 1 2 1 ...
>  $ country           : Factor w/ 30 levels "Austria","Belgium",..: 1 1 1 1 1 1 1 1 1 1 ...
>  - attr(*, "groups")= tibble [27 × 2] (S3: tbl_df/tbl/data.frame)
>   ..$ country: Factor w/ 30 levels "Austria","Belgium",..: 1 2 3 6 7 8 9 10 11 12 ...
>   ..$ .rows  : list<int> [1:27]
>   .. ..$ : int [1:47204] 1 2 3 4 5 6 7 8 9 10 ...
>   .. ..$ : int [1:41361] 47205 47206 47207 47208 47209 47210 47211 47212 47213 47214 ...
>   .. ..$ : int [1:42407] 88566 88567 88568 88569 88570 88571 88572 88573 88574 88575 ...
>   .. ..$ : int [1:48253] 130973 130974 130975 130976 130977 130978 130979 130980 130981 130982 ...
>   .. ..$ : int [1:31917] 179226 179227 179228 179229 179230 179231 179232 179233 179234 179235 ...
>   .. ..$ : int [1:44047] 211143 211144 211145 211146 211147 211148 211149 211150 211151 211152 ...
>   .. ..$ : int [1:62087] 255190 255191 255192 255193 255194 255195 255196 255197 255198 255199 ...
>   .. ..$ : int [1:94309] 317277 317278 317279 317280 317281 317282 317283 317284 317285 317286 ...
>   .. ..$ : int [1:37246] 411586 411587 411588 411589 411590 411591 411592 411593 411594 411595 ...
>   .. ..$ : int [1:77253] 448832 448833 448834 448835 448836 448837 448838 448839 448840 448841 ...
>   .. ..$ : int [1:16823] 526085 526086 526087 526088 526089 526090 526091 526092 526093 526094 ...
>   .. ..$ : int [1:24687] 542908 542909 542910 542911 542912 542913 542914 542915 542916 542917 ...
>   .. ..$ : int [1:116263] 567595 567596 567597 567598 567599 567600 567601 567602 567603 567604 ...
>   .. ..$ : int [1:43218] 683858 683859 683860 683861 683862 683863 683864 683865 683866 683867 ...
>   .. ..$ : int [1:28709] 727076 727077 727078 727079 727080 727081 727082 727083 727084 727085 ...
>   .. ..$ : int [1:27583] 755785 755786 755787 755788 755789 755790 755791 755792 755793 755794 ...
>   .. ..$ : int [1:77960] 783368 783369 783370 783371 783372 783373 783374 783375 783376 783377 ...
>   .. ..$ : int [1:36922] 861328 861329 861330 861331 861332 861333 861334 861335 861336 861337 ...
>   .. ..$ : int [1:93194] 898250 898251 898252 898253 898254 898255 898256 898257 898258 898259 ...
>   .. ..$ : int [1:9004] 991444 991445 991446 991447 991448 991449 991450 991451 991452 991453 ...
>   .. ..$ : int [1:40074] 1000448 1000449 1000450 1000451 1000452 1000453 1000454 1000455 1000456 1000457 ...
>   .. ..$ : int [1:29342] 1040522 1040523 1040524 1040525 1040526 1040527 1040528 1040529 1040530 1040531 ...
>   .. ..$ : int [1:85124] 1069864 1069865 1069866 1069867 1069868 1069869 1069870 1069871 1069872 1069873 ...
>   .. ..$ : int [1:92350] 1154988 1154989 1154990 1154991 1154992 1154993 1154994 1154995 1154996 1154997 ...
>   .. ..$ : int [1:50188] 1247338 1247339 1247340 1247341 1247342 1247343 1247344 1247345 1247346 1247347 ...
>   .. ..$ : int [1:7598] 1297526 1297527 1297528 1297529 1297530 1297531 1297532 1297533 1297534 1297535 ...
>   .. ..$ : int [1:65141] 1305124 1305125 1305126 1305127 1305128 1305129 1305130 1305131 1305132 1305133 ...
>   .. ..@ ptype: int(0)
>   ..- attr(*, ".drop")= logi TRUE
> 
> 
> Session Info:
> 
> R version 4.0.2 (2020-06-22)
> Platform: x86_64-apple-darwin17.0 (64-bit)
> Running under: macOS Catalina 10.15.6
> 
> Matrix products: default
> BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] stats     graphics  grDevices
> [4] utils     datasets  methods
> [7] base
> 
> other attached packages:
>  [1] sessioninfo_1.1.1
>  [2] sjlabelled_1.1.5
>  [3] varhandle_2.0.5
>  [4] labelled_2.7.0
>  [5] dplyr_1.0.0
>  [6] ggplot2_3.3.2
>  [7] forcats_0.5.0
>  [8] reprex_0.3.0
>  [9] lmerTest_3.1-3
> [10] lme4_1.1-25
> [11] Matrix_1.2-18
> 
> loaded via a namespace (and not attached):
>  [1] Rcpp_1.0.4.6
>  [2] compiler_4.0.2
>  [3] pillar_1.4.4
>  [4] nloptr_1.2.2.1
>  [5] tools_4.0.2
>  [6] digest_0.6.25
>  [7] boot_1.3-25
>  [8] statmod_1.4.34
>  [9] lifecycle_0.2.0
> [10] tibble_3.0.1
> [11] nlme_3.1-148
> [12] gtable_0.3.0
> [13] lattice_0.20-41
> [14] pkgconfig_2.0.3
> [15] rlang_0.4.7
> [16] cli_2.0.2
> [17] rstudioapi_0.11
> [18] haven_2.3.1
> [19] withr_2.2.0
> [20] hms_0.5.3
> [21] generics_0.0.2
> [22] vctrs_0.3.1
> [23] fs_1.4.1
> [24] grid_4.0.2
> [25] tidyselect_1.1.0
> [26] glue_1.4.1
> [27] R6_2.4.1
> [28] fansi_0.4.1
> [29] minqa_1.2.4
> [30] farver_2.0.3
> [31] purrr_0.3.4
> [32] magrittr_1.5
> [33] scales_1.1.1
> [34] ellipsis_0.3.1
> [35] MASS_7.3-51.6
> [36] splines_4.0.2
> [37] insight_0.11.0
> [38] assertthat_0.2.1
> [39] colorspace_1.4-1
> [40] numDeriv_2016.8-1.1
> [41] labeling_0.3
> [42] utf8_1.1.4
> [43] munsell_0.5.0
> [44] crayon_1.3.4
> 
> 
> 
> 
> Sincerely,
> 
> 
> 
> Jad Moawad
> 
> 
> PhD candidate and teaching assistant
> University of Lausanne  - NCCR Lives
> Institut des Sciences Sociales
> Bâtiment Geopolis - 5621
> 1015 Lausanne
> Switzerland
> 
> 
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


