[R-sig-ME] Question regarding large data.frame in LMER?

Jad Moawad j@d@mo@w@d @end|ng |rom un||@ch
Thu Dec 10 13:11:52 CET 2020


I am working with a large data.frame that contains around 1.4 million observations. Initially, I ran my models on a sub-sample (10% of the full sample), because fitting a single model on the original data can take a long time. Once I was sure that all variables were well harmonized and all regressions ran fine, I fitted my models on the full sample. However, the regressions did not converge, and I received the following two errors from two different models:

Error in fun(xaa, ...) : Downdated VtV is not positive definite

Error in fun(xss, ...) : Downdated VtV is not positive definite
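
For anyone who wants to reproduce the workflow described above, a 10% sub-sample can be drawn by respondent rather than by row, so that the repeated observations behind the (1|id) term stay together. The sketch below is only an illustration on a toy data frame: df_toy, ids_keep and df_sub are hypothetical names, and sampling by id is an assumption, not necessarily how the sub-sample mentioned above was produced.

```r
# Toy stand-in for the real data (hypothetical values):
# 100 respondents with 2 observations each.
set.seed(1)
df_toy <- data.frame(
  id     = rep(1:100, each = 2),
  health = sample(c(0, 25, 50, 75, 100), 200, replace = TRUE)
)

# Draw 10% of the respondents (not 10% of the rows) ...
all_ids  <- unique(df_toy$id)
ids_keep <- sample(all_ids, size = round(0.10 * length(all_ids)))

# ... and keep every row that belongs to a sampled respondent.
df_sub <- df_toy[df_toy$id %in% ids_keep, ]
```

Sampling whole respondents keeps the grouping structure of the sub-sample comparable to that of the full data, which makes a sub-sample fit a more informative dry run.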

I use the lmer function to fit my models, and I include random slopes at the country and country_year levels. Below is the code that I use.

Model1 <- lmer(health ~ class + age + I(age^2) + class * macro_unemployment +
               (class + age + I(age^2) | country) +
               (class + age + I(age^2) | country_year) +
               (1 | id), data = df)

Model2 <- lmer(health ~ education + age + I(age^2) + education * macro_unemployment +
               (education + age + I(age^2) | country) +
               (education + age + I(age^2) | country_year) +
               (1 | id), data = df)
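
Errors of this kind are often associated with numerical problems in the fixed-effects part of the model, such as predictors on very different scales (age and age^2 here) or a rank-deficient design matrix. The following is a minimal sketch of pre-fit checks on a toy data frame with the same column names. All values, the object names df_toy and X, the rescaled variable age_c, and the unused country level "Cyprus" are hypothetical, and whether any of these checks explains the errors above is an assumption that would need to be verified on the real data.

```r
# Toy stand-in with the same column names (hypothetical values).
set.seed(1)
n <- 200
df_toy <- data.frame(
  education = factor(rep(c("low", "mid", "high"), length.out = n),
                     levels = c("low", "mid", "high")),
  age = sample(18:80, n, replace = TRUE),
  macro_unemployment = runif(n, min = 2, max = 12),
  # "Cyprus" is a hypothetical level that never occurs in the rows,
  # mimicking a factor with more levels than observed groups.
  country = factor(rep(c("Austria", "Belgium"), length.out = n),
                   levels = c("Austria", "Belgium", "Cyprus"))
)
# Note: the real data is a grouped tibble; dplyr::ungroup() would
# remove the grouping before fitting. The toy data is a plain data.frame.

# 1. Drop factor levels that have no observations.
df_toy$country <- droplevels(df_toy$country)

# 2. Centre age and express it in decades, so that age_c and age_c^2
#    are on a scale comparable to the other predictors.
df_toy$age_c <- (df_toy$age - mean(df_toy$age)) / 10

# 3. Check that the fixed-effects design matrix has full column rank.
X <- model.matrix(~ education + age_c + I(age_c^2) +
                    education * macro_unemployment, data = df_toy)
qr(X)$rank == ncol(X)
```

If the rank check returns FALSE on the real data, or if the range of age^2 is orders of magnitude larger than that of the other columns, that would be a natural place to start before refitting.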


Could someone please help me solve this issue?

Below are a glimpse (str) of my data and my sessionInfo():

tibble [1,370,264 × 8] (S3: grouped_df/tbl_df/tbl/data.frame)
 $ health            : num [1:1370264] 100 100 50 100 0 75 75 100 100 50 ...
 $ class             : Factor w/ 3 levels "Upper-middle class",..: 3 3 NA 3 3 3 3 1 1 3 ...
 $ education         : Factor w/ 3 levels "low","mid","high": 1 1 1 1 1 1 2 3 3 1 ...
 $ age               : num [1:1370264] 24 25 24 25 42 43 34 34 35 58 ...
 $ macro_unemployment: num [1:1370264] 5.24 4.86 5.24 4.86 5.24 ...
 $ id                : int [1:1370264] 2 2 3 3 4 4 6 7 7 8 ...
 $ country_year      : int [1:1370264] 1 2 1 2 1 2 1 1 2 1 ...
 $ country           : Factor w/ 30 levels "Austria","Belgium",..: 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "groups")= tibble [27 × 2] (S3: tbl_df/tbl/data.frame)
  ..$ country: Factor w/ 30 levels "Austria","Belgium",..: 1 2 3 6 7 8 9 10 11 12 ...
  ..$ .rows  : list<int> [1:27]
  .. ..$ : int [1:47204] 1 2 3 4 5 6 7 8 9 10 ...
  .. ..$ : int [1:41361] 47205 47206 47207 47208 47209 47210 47211 47212 47213 47214 ...
  .. ..$ : int [1:42407] 88566 88567 88568 88569 88570 88571 88572 88573 88574 88575 ...
  .. ..$ : int [1:48253] 130973 130974 130975 130976 130977 130978 130979 130980 130981 130982 ...
  .. ..$ : int [1:31917] 179226 179227 179228 179229 179230 179231 179232 179233 179234 179235 ...
  .. ..$ : int [1:44047] 211143 211144 211145 211146 211147 211148 211149 211150 211151 211152 ...
  .. ..$ : int [1:62087] 255190 255191 255192 255193 255194 255195 255196 255197 255198 255199 ...
  .. ..$ : int [1:94309] 317277 317278 317279 317280 317281 317282 317283 317284 317285 317286 ...
  .. ..$ : int [1:37246] 411586 411587 411588 411589 411590 411591 411592 411593 411594 411595 ...
  .. ..$ : int [1:77253] 448832 448833 448834 448835 448836 448837 448838 448839 448840 448841 ...
  .. ..$ : int [1:16823] 526085 526086 526087 526088 526089 526090 526091 526092 526093 526094 ...
  .. ..$ : int [1:24687] 542908 542909 542910 542911 542912 542913 542914 542915 542916 542917 ...
  .. ..$ : int [1:116263] 567595 567596 567597 567598 567599 567600 567601 567602 567603 567604 ...
  .. ..$ : int [1:43218] 683858 683859 683860 683861 683862 683863 683864 683865 683866 683867 ...
  .. ..$ : int [1:28709] 727076 727077 727078 727079 727080 727081 727082 727083 727084 727085 ...
  .. ..$ : int [1:27583] 755785 755786 755787 755788 755789 755790 755791 755792 755793 755794 ...
  .. ..$ : int [1:77960] 783368 783369 783370 783371 783372 783373 783374 783375 783376 783377 ...
  .. ..$ : int [1:36922] 861328 861329 861330 861331 861332 861333 861334 861335 861336 861337 ...
  .. ..$ : int [1:93194] 898250 898251 898252 898253 898254 898255 898256 898257 898258 898259 ...
  .. ..$ : int [1:9004] 991444 991445 991446 991447 991448 991449 991450 991451 991452 991453 ...
  .. ..$ : int [1:40074] 1000448 1000449 1000450 1000451 1000452 1000453 1000454 1000455 1000456 1000457 ...
  .. ..$ : int [1:29342] 1040522 1040523 1040524 1040525 1040526 1040527 1040528 1040529 1040530 1040531 ...
  .. ..$ : int [1:85124] 1069864 1069865 1069866 1069867 1069868 1069869 1069870 1069871 1069872 1069873 ...
  .. ..$ : int [1:92350] 1154988 1154989 1154990 1154991 1154992 1154993 1154994 1154995 1154996 1154997 ...
  .. ..$ : int [1:50188] 1247338 1247339 1247340 1247341 1247342 1247343 1247344 1247345 1247346 1247347 ...
  .. ..$ : int [1:7598] 1297526 1297527 1297528 1297529 1297530 1297531 1297532 1297533 1297534 1297535 ...
  .. ..$ : int [1:65141] 1305124 1305125 1305126 1305127 1305128 1305129 1305130 1305131 1305132 1305133 ...
  .. ..@ ptype: int(0)
  ..- attr(*, ".drop")= logi TRUE


Session Info:

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices
[4] utils     datasets  methods
[7] base

other attached packages:
 [1] sessioninfo_1.1.1
 [2] sjlabelled_1.1.5
 [3] varhandle_2.0.5
 [4] labelled_2.7.0
 [5] dplyr_1.0.0
 [6] ggplot2_3.3.2
 [7] forcats_0.5.0
 [8] reprex_0.3.0
 [9] lmerTest_3.1-3
[10] lme4_1.1-25
[11] Matrix_1.2-18

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6
 [2] compiler_4.0.2
 [3] pillar_1.4.4
 [4] nloptr_1.2.2.1
 [5] tools_4.0.2
 [6] digest_0.6.25
 [7] boot_1.3-25
 [8] statmod_1.4.34
 [9] lifecycle_0.2.0
[10] tibble_3.0.1
[11] nlme_3.1-148
[12] gtable_0.3.0
[13] lattice_0.20-41
[14] pkgconfig_2.0.3
[15] rlang_0.4.7
[16] cli_2.0.2
[17] rstudioapi_0.11
[18] haven_2.3.1
[19] withr_2.2.0
[20] hms_0.5.3
[21] generics_0.0.2
[22] vctrs_0.3.1
[23] fs_1.4.1
[24] grid_4.0.2
[25] tidyselect_1.1.0
[26] glue_1.4.1
[27] R6_2.4.1
[28] fansi_0.4.1
[29] minqa_1.2.4
[30] farver_2.0.3
[31] purrr_0.3.4
[32] magrittr_1.5
[33] scales_1.1.1
[34] ellipsis_0.3.1
[35] MASS_7.3-51.6
[36] splines_4.0.2
[37] insight_0.11.0
[38] assertthat_0.2.1
[39] colorspace_1.4-1
[40] numDeriv_2016.8-1.1
[41] labeling_0.3
[42] utf8_1.1.4
[43] munsell_0.5.0
[44] crayon_1.3.4




Sincerely,



Jad Moawad


PhD candidate and teaching assistant
University of Lausanne  - NCCR Lives
Institut des Sciences Sociales
Bâtiment Geopolis - 5621
1015 Lausanne
Switzerland

