[R] looping over lapply calls
Ramon Diaz-Uriarte
rdiazuri at students.wisc.edu
Sun Jun 4 08:40:13 CEST 2000
Dear Prof. Ripley,
Thank you very much for your detailed answer!!
> 3) In S engines the reasons are related to minimizing the number of pieces
> of memory in use as well as the size. Using blocks allows the memory to be
> cleaned up at the end of each block. R does not have delayed commitment to
> the same extent, and garbage collects memory only when full (or asked).
> So at least one issue is how much is in use when memory management occurs,
> and both the size and number of R objects is relevant. As memory management
> is about to change, I think the only way out is to experiment, and that
> includes with the setting of the heap size in R.
Understood.
> (If you had had some spaces around operators and indented consistently
> I might have been able to comment. Set options(keep.source=FALSE) and
> read this into R and out again to get a consistent layout. You can
> then tidy it up in ESS.)
>
>
I hope it is not too late; I have sourced it (after fixing a couple of things),
then copied the printout into an xemacs
window, and added a few comments, and have changed the settings
of my email so that it does not break lines.
lmx.0 <- function (formula, data, max.num = 0, lapply.size = 100)
{
if (max.num) {
max.num <- min(max(data$sim.counter), max.num)
data <- data[data$sim.counter < max.num + 1, ]
}
else max.num <- max(data$sim.counter)
names.vars <- drop.scope(formula)
terms.in.model <- names(lm(formula = formula, data = data,
subset = data$sim.counter == 0)[[1]]) # there should be a simpler way
loop.counter <- (max.num + 1)%/%lapply.size
rest.of.data <- (max.num + 1)%%lapply.size
tmp <- matrix(nrow = max.num + 1, ncol = length(names.vars) +
length(terms.in.model))
i <- 0
if (loop.counter) { # enter here if number to deal with is larger than lapply.size
for (i in 1:loop.counter) {
datai <- data[data$sim.counter <= ((i * lapply.size) -
1) & data$sim.counter >= ((i - 1) * lapply.size), ]
# obtain output of interest ---fm[[1]],my.drop(fm)--- by applying that function
# over the subset of data within loop; as result is list, unlist and turn into a matrix
tmp[(((i - 1) * lapply.size) + 1):(i * lapply.size),
] <- matrix(unlist(lapply(split(datai, datai$sim.counter),
function(datos, formula) {
fm <- lm(formula = formula, data = datos)
c(fm[[1]], my.drop(fm))
}, formula = formula)), nrow = lapply.size, byrow = T)
}
}
if (rest.of.data) { #enter here if number to analyze < lapply.size or a non-interger multiple
datai <- data[data$sim.counter >= (loop.counter * lapply.size),
]
tmp[(((i * lapply.size) + 1):(max.num + 1)), ] <- matrix(unlist(lapply(split(datai,
datai$sim.counter), function(datos, formula) {
fm <- lm(formula = formula, data = datos)
c(fm[[1]], my.drop(fm))
}, formula = formula)), nrow = rest.of.data, byrow = T)
}
tmp
}
my.drop <- function (object)
{
# This is from drop1 (by BDR) after eliminating things I didn't use;
# this function is severely crippled, so use only here.
# Components are NOT named.
# Maybe I should only define it within lmx.0 to prevent accidental misuse?
x <- model.matrix(object)
iswt <- !is.null(wt <- object$weights)
n <- nrow(x)
asgn <- attr(x, "assign")
tl <- attr(object$terms, "term.labels")
scope <- drop.scope(object)
ndrop <- match(scope, tl)
ns <- length(scope)
rdf <- object$df.resid
chisq <- deviance.lm(object)
dfs <- numeric(ns)
RSS <- numeric(ns)
y <- object$residuals + predict(object)
rank <- object$rank
for (i in 1:ns) {
ii <- seq(along = asgn)[asgn == ndrop[i]]
jj <- setdiff(seq(ncol(x)), ii)
z <- if (iswt)
lm.wfit(x[, jj, drop = FALSE], y, wt)
else lm.fit(x[, jj, drop = FALSE], y)
dfs[i] <- z$rank
RSS[i] <- deviance.lm(z)
}
dfs <- c(object$rank, dfs)
RSS <- c(chisq, RSS)
dfs <- (dfs[1] - dfs)[-1]
rdf <- object$df.resid
dev <- RSS[-1] - RSS[1]
rms <- RSS[1]/rdf
Fs <- (dev/dfs)/rms
Fs
}
#an example
data.5010 <- data.frame(sim.counter=rep(seq(from=0,to=500),rep(10,501)),y=rnorm(5010),x=rnorm(5010))
ex1 <- lmx.0(y~x,data=data.5010)
--
Ramón Díaz-Uriarte
Dept. Zoology and Statistics
University of Wisconsin-Madison
Madison, WI 53706-1381
email: rdiazuri at students.wisc.edu
(NOTE: starting 15-July-2000 new email:
ramon-diaz at teleline.es)
phone: 608-238-8041
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list