[R-sig-ME] Data frame size limits in MCMCglmm?
Joshua Wiley
jwiley.psych at gmail.com
Sat Jan 19 05:28:33 CET 2013
Hi Stuart,
How many (if any) iterations completed before you got the seg fault?
Also, how much memory does your system have?
Currently at 4000 iterations, I have not reproduced the error so far
with this made up example (although I have practically finished an
algorithm to sort any array in O(log log n) while waiting to get this
far ~1 hour per thousand iterations on a 6 core 3.9GHZ machine).
require(MCMCglmm)
require(MASS)
set.seed(10)
rint <- rep(rnorm(14982, 0, 4), each = floor(2.44e6/14982))
ID <- factor(rep(1:14982, each = floor(2.44e6/14982)))
X <- MASS::mvrnorm(length(rint), mu = c(0, 0, 0),
Sigma = matrix(c(1, .3, .3, .3, 1, .3, .3, .3, 1), 3))
b <- matrix(c(1.2, -.5, 2))
ycont <- rnorm(length(ID), mean = 2 + rint + X %*% b)
yord <- cut(ycont, breaks = quantile(ycont, c(0, .2, .4, .6, .8, 1)),
include.lowest=TRUE, ordered_result=TRUE)
testdat <- data.frame(yord, ID, x1 = X[,1], x2 = X[, 2], x3 = X[, 3])
## > gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 1117490 59.7 1710298 91.4 1368491 73.1
## Vcells 25878798 197.5 47836772 365.0 47753473 364.4
m <- MCMCglmm(yord ~ x1 + x2 + x3, random = ~ ID,
family = "ordinal", data = testdat,
prior = list(
R = list(V = 1, fix = 1),
G = list(
G1 = list(V = 1, nu = 0))),
nitt = 13000, thin = 10, burnin = 3000)
on a Win 8 pro x64 system with 32GB of memory and
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] MASS_7.3-22 ggplot2_0.9.3 MCMCglmm_2.17
[4] corpcor_1.6.4 ape_3.0-6 coda_0.16-1 Matrix_1.0-10
[8] lattice_0.20-10 tensorA_0.36
On Fri, Jan 18, 2013 at 1:55 PM, Stuart Luppescu <slu at ccsr.uchicago.edu> wrote:
> Hello, I'm having problems running a simple ordinal outcome mixed
> effects model, and I'm thinking it may be because of the size of the
> dataset (or, it very may well be that I'm not specifying the model
> correctly). I must confess to insecurity about how to specify the
> priors. Here is the structure of the data frame (with columns not in
> this model omitted). Note that there are more than 2.4 million rows. Is
> that a problem?
>
> str(all.subj)
> 'data.frame': 2438922 obs. of 112 variables:
> $ gr10 : num 0 0 0 0 0 0 1 0 1 0 ...
> $ gr11 : num 0 0 0 1 1 1 0 0 0 0 ...
> $ gr12 : num 1 1 1 0 0 0 0 1 0 0 ...
> $ tid : Factor w/ 14982 levels "........","A.D46607",..: 2 2 2 2 2 2 2 2 2 2 ...
> $ final.points : Ord.factor w/ 5 levels "0"<"1"<"2"<"3"<..: 4 4 4 2 3 3 2 2 3 2 ...
>
> Here are two attempts and their results:
>
> glmm.uncond <- MCMCglmm(final.points ~ gr10 + gr11 + gr12,
> prior=list(R=list(V=1, fix=1), G=list(G1=list(V=1, nu=0))),
> random = ~tid ,
> family = "ordinal",
> nitt=100000,
> data = all.subj)
>
> Error: segfault from C stack overflow
>
>
> glmm.uncond <- MCMCglmm(final.points ~ gr10 + gr11 + gr12,
> prior=list(R=list(V=1, nu=0), G=list(G1=list(V=1, nu=0))),
> random = ~tid ,
> family = "ordinal",
> nitt=100000,
> data = all.subj)
>
>
> Process R segmentation fault (core dumped) at Fri Jan 18 12:53:49 2013
>
> Here is my sessionInfo
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] foreign_0.8-52 MCMCglmm_2.17 corpcor_1.6.4 ape_3.0-6
> [5] coda_0.16-1 Matrix_1.0-10 lattice_0.20-13 tensorA_0.36
>
> loaded via a namespace (and not attached):
> [1] compiler_2.15.1 gee_4.13-18 grid_2.15.1 nlme_3.1-107
> [5] tools_2.15.1
>
> and memory info.
>
> gc()
> used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 1196029 63.9 1835812 98.1 1710298 91.4
> Vcells 678385919 5175.7 1777423119 13560.7 2114537169 16132.7
>
> Any help will be appreciated.
>
>
> --
> Stuart Luppescu <slu at ccsr.uchicago.edu>
> University of Chicago
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
More information about the R-sig-mixed-models
mailing list