[R-sig-ME] large dataset - lmer2 vector size specified is too large

Douglas Bates bates at stat.wisc.edu
Fri Mar 2 15:36:05 CET 2007


On 3/1/07, florian bw <florian.bw at gmail.com> wrote:
> Hi,
>
> I want to fit mRNA expression data to sex.
>
> I have the following values:
>   expr:     expression value (for gene/person)
>   affyID:   gene ID
>   cephID: person ID
>   sex
>
> with 224 genes and 195 persons, therefore 43,680 data points. With both
> the nlme and the lme4 packages I get errors. I tried it with R 2.4
> and 2.5, and the newest package versions.
>
> I have a 64-bit machine with 8GB RAM. Is the dataset simply too large? I
> already cut it down and would actually be glad if I could do the
> calculation with ~ 8000x250 data points.
>
> Thank you for your help.
>
> Florian Breitwieser
> UNSW Sydney
> Systems Biology
>
> ---------------------------------------------------
>
> > sessionInfo()
> R version 2.5.0 Under development (unstable) (2007-02-26 r40806)
> x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
> [7] "base"
>
> other attached packages:
>        lme4      Matrix     lattice        nlme
> "0.9975-13" "0.9975-11"   "0.14-16"    "3.1-79"
>
>
> -----------------------------------------------------
>
> > library(lme4)
> > sex.lme <- lmer2(expr ~ affyID*sex + affyID|cephID,data=ds.n)
> Error in vector("double", length) : vector size specified is too large
>

Note that, because | has lower precedence than + in an R formula, the
formula you wrote is equivalent to

expr ~ 1 + (affyID*sex|cephID)
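
The parsing can be checked in base R without fitting anything. The
sketch below only inspects the formula objects; the names f_posted and
f_paren are made up for illustration, and the parenthesized variant is
included purely for contrast.

```r
# How R parses the formula as posted versus a version with explicit
# parentheses around the random-effects term (base R only, no data needed).
f_posted <- expr ~ affyID*sex + affyID | cephID
f_paren  <- expr ~ affyID*sex + (affyID | cephID)

rhs_posted <- f_posted[[3]]   # right-hand side of the posted formula
rhs_paren  <- f_paren[[3]]    # right-hand side of the parenthesized one

# Top-level operator on each right-hand side
top_posted <- as.character(rhs_posted[[1]])   # "|": everything left of it is the random-effects expression
top_paren  <- as.character(rhs_paren[[1]])    # "+": fixed effects plus one random-effects term

# Within the posted formula, the expression left of "|" is
# affyID*sex + affyID; the repeated affyID adds nothing new.
labels_posted <- attr(terms(~ affyID*sex + affyID), "term.labels")
labels_plain  <- attr(terms(~ affyID*sex), "term.labels")

print(top_posted)
print(top_paren)
print(identical(labels_posted, labels_plain))
```

So, as posted, the whole of affyID*sex ends up on the left of the bar
and no fixed effects other than the implicit intercept remain.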

I think you meant

expr ~ affyID * sex + (affyID|cephID)

but even that formula means that you are estimating 448 fixed effects
(224 genes x 2 sexes), for which the dense model matrix alone takes
448 * 43680 * 8 bytes (about 150 MB).  In addition you are attempting
to estimate

(224 * (224 + 1))/2 = 25200

variance-covariance components from 43680 observations.
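
For reference, the counts above can be reproduced with a few lines of
R. This is only a back-of-the-envelope sketch: it assumes sex has two
levels and that the fixed-effects model matrix is stored densely as
doubles. The variable names and the small toy data frame are invented
for illustration and are not the poster's ds.n.

```r
n_genes   <- 224
n_persons <- 195
n_obs     <- n_genes * n_persons        # one expression value per gene/person

# affyID*sex with a two-level sex factor: intercept + 223 gene contrasts
# + 1 sex contrast + 223 interaction contrasts
n_fixed <- 2 * n_genes

# Dense model matrix of doubles (8 bytes each)
mm_bytes <- n_fixed * n_obs * 8
mm_MB    <- mm_bytes / 1e6

# Unstructured covariance matrix for a 224-dimensional random effect
n_varcov <- n_genes * (n_genes + 1) / 2

print(c(n_obs = n_obs, n_fixed = n_fixed, n_varcov = n_varcov))
print(mm_MB)

# Toy check of the column count on a small made-up design:
# 4 genes, 6 persons, sex alternating by person.
toy <- expand.grid(affyID = factor(1:4), cephID = factor(1:6))
toy$sex <- factor(ifelse(as.integer(toy$cephID) %% 2 == 0, "F", "M"))
toy_cols <- ncol(model.matrix(~ affyID * sex, data = toy))  # 2 * 4 columns
print(toy_cols)
```

The number of variance-covariance parameters grows quadratically in
the number of levels of affyID, which is why it quickly approaches the
number of observations here.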

I suggest that you reconsider the model specification.  The readers of
this list will be able to help with the interpretation of the model
specification if you want to discuss it.

>
> ------------------------------------------------------
>
> > library(nlme)
> > sex.lme <- lme(expr ~ affyID*sex,random=~affyID|cephID,data=ds.n)
> Error: cannot allocate vector of size 4.7 Gb
> > gc()
>           used (Mb) gc trigger   (Mb)   max used   (Mb)
> Ncells  829281 44.3    5714627  305.2   10890793  581.7
> Vcells 1881820 14.4 1034019068 7889.0 1103035467 8415.5
>
>
> Another time I got the following message:
> > sex.lme <- lme(expr ~ affyID*sex,random=~affyID|cephID,data=ds.n)
>
>  *** caught segfault ***
> address (nil), cause 'unknown'
>
> Traceback:
>  1: lme.formula(expr ~ affyID * sex, random = ~affyID | cephID, data = ds.n)
>  2: lme(expr ~ affyID * sex, random = ~affyID | cephID, data = ds.n)
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>



