[R] Large data sets and memory management in R.
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Wed Jan 28 22:18:39 CET 2004
gerald.jean at dgag.ca writes:
> library(package = "statmod", pos = 2,
> lib.loc = "/home/jeg002/R-1.8.1/lib/R/R_LIBS")
>
> qc.B3.tweedie <- glm(formula = pp20B3 ~ ageveh + anpol +
> categveh + champion + cie + dossiera +
> faq13c + faq5a + kmaff + kmprom + nbvt +
> rabprof + sexeprin + newage,
> family = tweedie(var.power = 1.577,
> link.power = 0),
> etastart = log(rep(mean(qc.b3.sans.occ[,
> 'pp20B3']), nrow(qc.b3.sans.occ))),
> weights = unsb3t1,
> trace = T,
> data = qc.b3.sans.occ)
>
> After one iteration (45+ minutes) R is thrashing through over 10Gb of
> memory.
>
> Thanks for any insights,
Well, I don't know how much this helps; you are in somewhat uncharted
territory there. I suppose the dataset comes to 0.5-1GB all by itself?
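A rough check of its footprint, assuming the data frame really is called
qc.b3.sans.occ as in your call, would be something like

    ## size of the data frame itself, in gigabytes
    as.numeric(object.size(qc.b3.sans.occ)) / 1024^3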
One thing I note is that you have 60 variables but use only 15.
Perhaps it would help to drop the unused ones before the run?
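Only a sketch, with the variable names copied from your formula and
weights argument (qc.b3.small is just an illustrative name):

    ## keep only the columns that actually enter the fit
    vars <- c("pp20B3", "ageveh", "anpol", "categveh", "champion", "cie",
              "dossiera", "faq13c", "faq5a", "kmaff", "kmprom", "nbvt",
              "rabprof", "sexeprin", "newage", "unsb3t1")
    qc.b3.small <- qc.b3.sans.occ[, vars]

and then pass data = qc.b3.small to glm().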
How large does the design matrix get? If some of those variables have a
lot of levels, it could explain the phenomenon. Any chance that a
continuous variable got recorded as a factor?
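A sketch to check both points, with the formula copied from your call;
running model.matrix() on a small slice keeps the check itself cheap
(factor levels are kept when subsetting, so the column count matches the
full fit):

    ## how many columns the design matrix will have, checked on 1000 rows
    dim(model.matrix(~ ageveh + anpol + categveh + champion + cie + dossiera +
                       faq13c + faq5a + kmaff + kmprom + nbvt + rabprof +
                       sexeprin + newage,
                     data = qc.b3.sans.occ[1:1000, ]))

    ## levels per factor; a continuous variable stored as a factor
    ## shows up with an enormous level count
    sapply(qc.b3.sans.occ, function(x) if (is.factor(x)) nlevels(x) else NA)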
-p
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907