[R] biglm: how it handles large data set?
noclue_
tim.liu at netzero.net
Sun Oct 31 08:22:12 CET 2010
I am trying to figure out how 'biglm' can handle large data sets...
According to the R document - "biglm creates a linear model object that uses
only p^2 memory for p variables. It can be updated with more data using
update. This allows linear regression on data sets larger than memory."
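The p^2 memory claim comes from the fact that the fitted state never stores the data itself, only cross-product-sized summaries that each chunk of rows is folded into. A minimal sketch of that idea (in Python for illustration; function names are mine, and this uses the simpler normal-equations accumulator rather than biglm's actual QR machinery):

```python
import numpy as np

def init_state(p):
    # Only O(p^2) memory: a p x p cross-product matrix and a length-p vector.
    return {"xtx": np.zeros((p, p)), "xty": np.zeros(p)}

def update_state(state, X_chunk, y_chunk):
    # Each chunk adds its contribution to X'X and X'y;
    # the chunk itself can then be discarded.
    state["xtx"] += X_chunk.T @ X_chunk
    state["xty"] += X_chunk.T @ y_chunk
    return state

def coefficients(state):
    # Solve the normal equations (X'X) beta = X'y.
    return np.linalg.solve(state["xtx"], state["xty"])

# Stream the data in chunks; memory use never depends on the row count n.
rng = np.random.default_rng(0)
p = 3
beta_true = np.array([1.0, -2.0, 0.5])
state = init_state(p)
for _ in range(10):               # 10 chunks of 100 rows each
    X = rng.normal(size=(100, p))
    y = X @ beta_true             # noiseless, so beta is recovered exactly
    state = update_state(state, X, y)
beta = coefficients(state)
```

biglm itself avoids forming X'X (which can be numerically ill-conditioned) and instead updates a QR factorization incrementally, but the memory argument is the same: the state after any number of rows is O(p^2).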
After reading the source code below, I still could not figure out how
'update' implements the algorithm...
Thanks for any light shed upon this ...
> biglm::biglm
function (formula, data, weights = NULL, sandwich = FALSE)
{
    tt <- terms(formula)
    if (!is.null(weights)) {
        if (!inherits(weights, "formula"))
            stop("`weights' must be a formula")
        w <- model.frame(weights, data)[[1]]
    }
    else w <- NULL
    mf <- model.frame(tt, data)
    mm <- model.matrix(tt, mf)
    qr <- bigqr.init(NCOL(mm))
    qr <- update(qr, mm, model.response(mf), w)
    rval <- list(call = sys.call(), qr = qr, assign = attr(mm,
        "assign"), terms = tt, n = NROW(mm), names = colnames(mm),
        weights = weights)
    if (sandwich) {
        p <- ncol(mm)
        n <- nrow(mm)
        xyqr <- bigqr.init(p * (p + 1))
        xx <- matrix(nrow = n, ncol = p * (p + 1))
        xx[, 1:p] <- mm * model.response(mf)
        for (i in 1:p) xx[, p * i + (1:p)] <- mm * mm[, i]
        xyqr <- update(xyqr, xx, rep(0, n), w * w)
        rval$sandwich <- list(xy = xyqr)
    }
    rval$df.resid <- rval$n - length(qr$D)
    class(rval) <- "biglm"
    rval
}
<environment: namespace:biglm>
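The real work happens in the `update(qr, ...)` call, which dispatches to biglm's internal bigqr update. As I understand it, that code implements Alan Miller's AS274 algorithm: each incoming row is folded into a stored triangular factor by planar (Givens) rotations, so only the p x p factor and the rotated response ever live in memory. A rough reconstruction of the idea (Python for illustration; the function name `includ` is borrowed from AS274, and this is my sketch, not the package code):

```python
import numpy as np

def includ(R, qty, x, y):
    # Fold one observation (x, y) into the upper-triangular factor R and
    # the transformed response qty, using one Givens rotation per column.
    x = x.copy()
    p = len(x)
    for i in range(p):
        if x[i] == 0.0:
            continue
        # Rotation that zeroes x[i] against the diagonal entry R[i, i].
        d = np.hypot(R[i, i], x[i])
        c, s = R[i, i] / d, x[i] / d
        R[i, i] = d
        # Apply the same rotation to the rest of row i and to the row x.
        Ri = R[i, i + 1:].copy()
        R[i, i + 1:] = c * Ri + s * x[i + 1:]
        x[i + 1:] = -s * Ri + c * x[i + 1:]
        # ... and to the transformed response.
        qi = qty[i]
        qty[i] = c * qi + s * y
        y = -s * qi + c * y
    return R, qty

# Rows arrive one at a time; only R (p x p) and qty (length p) are kept.
rng = np.random.default_rng(1)
p = 3
beta_true = np.array([2.0, -1.0, 0.25])
R = np.zeros((p, p))
qty = np.zeros(p)
for _ in range(200):
    x = rng.normal(size=p)
    R, qty = includ(R, qty, x, x @ beta_true)
# The least-squares coefficients solve the triangular system R beta = qty.
beta = np.linalg.solve(R, qty)
```

This is why `bigqr.init(NCOL(mm))` sizes the state by the number of columns only, and why `update` can be called repeatedly on fresh chunks without ever revisiting old rows.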