[R-sig-ME] Data frame size limits in MCMCglmm?
Jarrod Hadfield
j.hadfield at ed.ac.uk
Fri Jan 25 11:36:05 CET 2013
Hi Stuart,
2.4 million records is bigger than anything I've tried but in theory
it should run, or return an error if it can't allocate enough memory.
It definitely shouldn't be seg-faulting. If you could send a
reproducible example (preferably one where it fails quickly) I will
take a look into it.
Cheers,
Jarrod.
Quoting Joshua Wiley <jwiley.psych at gmail.com> on Fri, 18 Jan 2013
20:28:33 -0800:
> Hi Stuart,
>
> How many (if any) iterations completed before you got the seg fault?
> Also, how much memory does your system have?
>
> Currently at 4000 iterations, I have not reproduced the error so far
> with this made up example (although I have practically finished an
> algorithm to sort any array in O(log log n) while waiting to get this
> far ~1 hour per thousand iterations on a 6 core 3.9GHZ machine).
>
> require(MCMCglmm)
> require(MASS)
> set.seed(10)
> rint <- rep(rnorm(14982, 0, 4), each = floor(2.44e6/14982))
> ID <- factor(rep(1:14982, each = floor(2.44e6/14982)))
> X <- MASS::mvrnorm(length(rint), mu = c(0, 0, 0),
> Sigma = matrix(c(1, .3, .3, .3, 1, .3, .3, .3, 1), 3))
> b <- matrix(c(1.2, -.5, 2))
> ycont <- rnorm(length(ID), mean = 2 + rint + X %*% b)
> yord <- cut(ycont, breaks = quantile(ycont, c(0, .2, .4, .6, .8, 1)),
> include.lowest=TRUE, ordered_result=TRUE)
> testdat <- data.frame(yord, ID, x1 = X[,1], x2 = X[, 2], x3 = X[, 3])
> ## > gc()
> ## used (Mb) gc trigger (Mb) max used (Mb)
> ## Ncells 1117490 59.7 1710298 91.4 1368491 73.1
> ## Vcells 25878798 197.5 47836772 365.0 47753473 364.4
> m <- MCMCglmm(yord ~ x1 + x2 + x3, random = ~ ID,
> family = "ordinal", data = testdat,
> prior = list(
> R = list(V = 1, fix = 1),
> G = list(
> G1 = list(V = 1, nu = 0))),
> nitt = 13000, thin = 10, burnin = 3000)
>
> on a Win 8 pro x64 system with 32GB of memory and
>
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] MASS_7.3-22 ggplot2_0.9.3 MCMCglmm_2.17
> [4] corpcor_1.6.4 ape_3.0-6 coda_0.16-1 Matrix_1.0-10
> [8] lattice_0.20-10 tensorA_0.36
>
>
>
>
> On Fri, Jan 18, 2013 at 1:55 PM, Stuart Luppescu
> <slu at ccsr.uchicago.edu> wrote:
>> Hello, I'm having problems running a simple ordinal outcome mixed
>> effects model, and I'm thinking it may be because of the size of the
>> dataset (or, it very may well be that I'm not specifying the model
>> correctly). I must confess to insecurity about how to specify the
>> priors. Here is the structure of the data frame (with columns not in
>> this model omitted). Note that there are more than 2.4 million rows. Is
>> that a problem?
>>
>> str(all.subj)
>> 'data.frame': 2438922 obs. of 112 variables:
>> $ gr10 : num 0 0 0 0 0 0 1 0 1 0 ...
>> $ gr11 : num 0 0 0 1 1 1 0 0 0 0 ...
>> $ gr12 : num 1 1 1 0 0 0 0 1 0 0 ...
>> $ tid : Factor w/ 14982 levels
>> "........","A.D46607",..: 2 2 2 2 2 2 2 2 2 2 ...
>> $ final.points : Ord.factor w/ 5 levels
>> "0"<"1"<"2"<"3"<..: 4 4 4 2 3 3 2 2 3 2 ...
>>
>> Here are two attempts and their results:
>>
>> glmm.uncond <- MCMCglmm(final.points ~ gr10 + gr11 + gr12,
>> prior=list(R=list(V=1, fix=1),
>> G=list(G1=list(V=1, nu=0))),
>> random = ~tid ,
>> family = "ordinal",
>> nitt=100000,
>> data = all.subj)
>>
>> Error: segfault from C stack overflow
>>
>>
>> glmm.uncond <- MCMCglmm(final.points ~ gr10 + gr11 + gr12,
>> prior=list(R=list(V=1, nu=0),
>> G=list(G1=list(V=1, nu=0))),
>> random = ~tid ,
>> family = "ordinal",
>> nitt=100000,
>> data = all.subj)
>>
>>
>> Process R segmentation fault (core dumped) at Fri Jan 18 12:53:49 2013
>>
>> Here is my sessionInfo
>> R version 2.15.1 (2012-06-22)
>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] foreign_0.8-52 MCMCglmm_2.17 corpcor_1.6.4 ape_3.0-6
>> [5] coda_0.16-1 Matrix_1.0-10 lattice_0.20-13 tensorA_0.36
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_2.15.1 gee_4.13-18 grid_2.15.1 nlme_3.1-107
>> [5] tools_2.15.1
>>
>> and memory info.
>>
>> gc()
>> used (Mb) gc trigger (Mb) max used (Mb)
>> Ncells 1196029 63.9 1835812 98.1 1710298 91.4
>> Vcells 678385919 5175.7 1777423119 13560.7 2114537169 16132.7
>>
>> Any help will be appreciated.
>>
>>
>> --
>> Stuart Luppescu <slu at ccsr.uchicago.edu>
>> University of Chicago
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> Programmer Analyst II, Statistical Consulting Group
> University of California, Los Angeles
> https://joshuawiley.com/
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
More information about the R-sig-mixed-models
mailing list