[R] Is my data set too large
Peter Dalgaard
P.Dalgaard at biostat.ku.dk
Tue Dec 12 17:40:13 CET 2006
Aimin Yan wrote:
> I have a data set like this.
> I want to do glm, but I get this error:
>
> Error in model.matrix.default(mt, mf, contrasts) :
> cannot allocate vector of length 932889958
>
> I am wondering if my data set is too large or I did something wrong.
>
> Is there some limitation for data size for R?
>
> thanks,
>
> Aimin
>
>
> > p1982<- read.csv("p_1982_aa.csv")
> > names(p1982)
> [1] "p" "aa" "as" "ms" "cur" "sc"
> > str(p1982)
> 'data.frame': 465979 obs. of 6 variables:
> $ p : Factor w/ 1982 levels "154l_aa","1A0P_aa",..: 1 1 1 1 1 1 1 1 1 1 ...
> $ aa : Factor w/ 19 levels "ALA","ARG","ASN",..: 2 16 4 5 18 3 19 3 2 9 ...
> $ as : num 152.0 15.9 65.1 57.2 28.9 ...
> $ ms : num 108.8 28.3 59.2 49.9 31.8 ...
> $ cur: num -0.1020 0.2564 0.0312 -0.0550 0.0526 ...
> $ sc : num 92.10 103.67 7.27 72.98 96.12 ...
> > attach(p1982)
> > m<-glm(sc~p+aa+as+cur,data=p1982)
> Error in model.matrix.default(mt, mf, contrasts) :
> cannot allocate vector of length 932889958
>
Your "p" is a factor with 1982 levels, so the design matrix for your
model is roughly 500000 x 2000. That gives about 1 billion (US) entries
of 8 bytes each, so you need roughly 8 GB just to store the design
matrix. So either you don't want "p" in the model or you have indeed
exceeded your capacity.
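
As a rough check of that arithmetic, here is a small sketch in R. It
assumes the default treatment contrasts (one column per factor level
except the first, plus an intercept) and takes its numbers from the
str() output quoted above. The variable names are made up for
illustration and do not come from the original post.

```r
# Number of rows, from the str() output: 465979 observations.
n_obs <- 465979

# Columns of the design matrix for sc ~ p + aa + as + cur, assuming
# default treatment contrasts:
#   intercept + (1982 - 1) for p + (19 - 1) for aa + as + cur
n_cols <- 1 + (1982 - 1) + (19 - 1) + 1 + 1

# Total number of double-precision cells in the design matrix.
# Plain numeric (double) arithmetic is used to avoid integer overflow.
n_cells <- n_obs * n_cols

# Each double takes 8 bytes.
n_bytes <- n_cells * 8
size_gb <- n_bytes / 1e9

cat("columns:", n_cols, "\n")
cat("cells:  ", format(n_cells, scientific = FALSE), "\n")
cat("size:   ", round(size_gb, 2), "GB\n")
```

Under those assumptions the cell count equals the 932889958 reported
in the error message, which is about 7.5 GB for the design matrix
alone, before glm makes any working copies.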
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907