[R-sig-ME] Data frame size limits in MCMCglmm?

Sat Jan 19 05:28:33 CET 2013

Hi Stuart,

How many (if any) iterations completed before you got the seg fault?
Also, how much memory does your system have?

Currently at 4000 iterations, I have not reproduced the error so far
with this made up example (although I have practically finished an
algorithm to sort any array in O(log log n) while waiting to get this
far ~1 hour per thousand iterations on a 6 core 3.9GHZ machine).

require(MCMCglmm)
require(MASS)
set.seed(10)
rint <- rep(rnorm(14982, 0, 4), each = floor(2.44e6/14982))
ID <- factor(rep(1:14982,  each = floor(2.44e6/14982)))
X <- MASS::mvrnorm(length(rint), mu = c(0, 0, 0),
  Sigma = matrix(c(1, .3, .3, .3, 1, .3, .3, .3, 1), 3))
b <- matrix(c(1.2, -.5, 2))
ycont <- rnorm(length(ID), mean = 2 + rint + X %*% b)
yord <- cut(ycont, breaks = quantile(ycont, c(0, .2, .4, .6, .8, 1)),
  include.lowest=TRUE, ordered_result=TRUE)
testdat <- data.frame(yord, ID, x1 = X[,1], x2 = X[, 2], x3 = X[, 3])
## > gc()
##            used  (Mb) gc trigger  (Mb) max used  (Mb)
## Ncells  1117490  59.7    1710298  91.4  1368491  73.1
## Vcells 25878798 197.5   47836772 365.0 47753473 364.4
m <- MCMCglmm(yord ~ x1 + x2 + x3, random = ~ ID,
  family = "ordinal", data = testdat,
  prior = list(
    R = list(V = 1, fix = 1),
    G = list(
      G1 = list(V = 1, nu = 0))),
  nitt = 13000, thin = 10, burnin = 3000)

on a Win 8 pro x64 system with 32GB of memory and

R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] MASS_7.3-22        ggplot2_0.9.3      MCMCglmm_2.17
 [4] corpcor_1.6.4      ape_3.0-6          coda_0.16-1        Matrix_1.0-10
 [8] lattice_0.20-10    tensorA_0.36

On Fri, Jan 18, 2013 at 1:55 PM, Stuart Luppescu <slu at ccsr.uchicago.edu> wrote:
> Hello, I'm having problems running a simple ordinal outcome mixed
> effects model, and I'm thinking it may be because of the size of the
> dataset (or, it very may well be that I'm not specifying the model
> correctly). I must confess to insecurity about how to specify the
> priors. Here is the structure of the data frame (with columns not in
> this model omitted). Note that there are more than 2.4 million rows. Is
> that a problem?
>
>  str(all.subj)
> 'data.frame':   2438922 obs. of  112 variables:
>  $ gr10                  : num  0 0 0 0 0 0 1 0 1 0 ...
>  $ gr11                  : num  0 0 0 1 1 1 0 0 0 0 ...
>  $ gr12                  : num  1 1 1 0 0 0 0 1 0 0 ...
>  $ tid                   : Factor w/ 14982 levels "........","A.D46607",..: 2 2 2 2 2 2 2 2 2 2 ...
>  $ final.points          : Ord.factor w/ 5 levels "0"<"1"<"2"<"3"<..: 4 4 4 2 3 3 2 2 3 2 ...
>
> Here are two attempts and their results:
>
> glmm.uncond <- MCMCglmm(final.points ~ gr10 + gr11 + gr12,
>                          prior=list(R=list(V=1, fix=1), G=list(G1=list(V=1, nu=0))),
>                          random = ~tid ,
>                          family = "ordinal",
>                          nitt=100000,
>                          data = all.subj)
>
> Error: segfault from C stack overflow
>
>
>  glmm.uncond <- MCMCglmm(final.points ~ gr10 + gr11 + gr12,
>                          prior=list(R=list(V=1, nu=0), G=list(G1=list(V=1, nu=0))),
>                          random = ~tid ,
>                          family = "ordinal",
>                          nitt=100000,
>                          data = all.subj)
>
>
> Process R segmentation fault (core dumped) at Fri Jan 18 12:53:49 2013
>
> Here is my sessionInfo
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] foreign_0.8-52  MCMCglmm_2.17   corpcor_1.6.4   ape_3.0-6
> [5] coda_0.16-1     Matrix_1.0-10   lattice_0.20-13 tensorA_0.36
>
> loaded via a namespace (and not attached):
> [1] compiler_2.15.1 gee_4.13-18     grid_2.15.1     nlme_3.1-107
> [5] tools_2.15.1
>
> and memory info.
>
>  gc()
>             used   (Mb) gc trigger    (Mb)   max used    (Mb)
> Ncells   1196029   63.9    1835812    98.1    1710298    91.4
> Vcells 678385919 5175.7 1777423119 13560.7 2114537169 16132.7
>
> Any help will be appreciated.
>
>
> --
> Stuart Luppescu <slu at ccsr.uchicago.edu>
> University of Chicago
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/