[R] use Vectorized function as range of for statement
Zhang Weiwu
zhangweiwu at realss.com
Thu Aug 1 18:38:01 CEST 2013
I guess this has been discussed before, but I don't know what this problem
is called, so I have to ask again.
Consider this scenario:
> fun <- function(x) { print(x)}
> for (i in Vectorize(fun, "x")(1:3)) print("OK")
[1] 1
[1] 2
[1] 3
[1] "OK"
[1] "OK"
[1] "OK"
The behaviour I would like to see is:
> fun <- function(x) { print(x)}
> for (i in Vectorize(fun, "x")(1:3)) print("OK")
[1] 1
[1] "OK"
[1] 2
[1] "OK"
[1] 3
[1] "OK"
That is, each iteration of the vectorized function should yield one result
to the 'for' statement, rather than having all results collected beforehand.
The intention of such a pattern is to separate the data generation logic
from the data processing logic.
The latter mechanism, I think, is more efficient because it doesn't cache
all data before processing, and the interpreter can know for sure that
caching is not needed, since the result of the vectorized function is not
assigned to anything but only used as the range of the loop.
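To make the ordering concrete, here is a minimal sketch that records events in a vector instead of printing them. The names `events`, `eager_events` and `interleaved_events` are only for illustration. It compares the Vectorize form above with a plain index loop that calls `fun` inside the body, which is the simplest way I know to get the interleaved order:

```r
# Record the order of events instead of printing, so it can be inspected.
events <- character(0)
fun <- function(x) {
  events <<- c(events, paste("generate", x))
  x
}

# Eager: the range expression of `for` is evaluated in full before the
# first iteration of the loop body runs.
for (i in Vectorize(fun, "x")(1:3)) {
  events <- c(events, paste("process", i))
}
eager_events <- events

# Interleaved: generate each value inside the loop body instead.
events <- character(0)
for (i in 1:3) {
  value <- fun(i)
  events <- c(events, paste("process", value))
}
interleaved_events <- events
```

In the first loop all three "generate" entries come before any "process" entry; in the second they alternate. The second form gives the order I want, but it puts the generation call back inside the processing loop.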
The difference may seem trivial, but this pseudocode shows a case where it
matters:
readSample <- function(x) {
    # ....
    # 'con' is assumed to be an already opened binary connection
    sampling_time <- readBin(con, integer(), 1, size=4)
    sample_count <- readBin(con, integer(), 1, size=2)
    # numeric() with size=4 reads single-precision floats
    samples <- readBin(con, numeric(), sample_count, size=4)
    # ....
    matrix # placeholder: return a big matrix representing a sample
}
for (sample in Vectorize(readSample, "x")(1:10000)) {
    # process sample
}
The data file is a few gigabytes, and holding all of it in memory is costly.
Not having to cache it would make a real difference.
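One possible pattern that keeps generation and processing separate without collecting everything first is a closure that hands out one sample per call. This is only a rough sketch under my own assumptions: `make_generator` and `read_sample_stub` are hypothetical names, the stub stands in for readSample, and NULL is used as an arbitrary "no more data" signal.

```r
# Hypothetical helper: wraps a per-index reader in a "next value" closure.
make_generator <- function(read_one, indices) {
  pos <- 0L
  function() {
    if (pos >= length(indices)) return(NULL)  # NULL signals exhaustion
    pos <<- pos + 1L
    read_one(indices[[pos]])
  }
}

# Stand-in for readSample: returns a small matrix for each index.
read_sample_stub <- function(x) matrix(x, nrow = 2, ncol = 2)

next_sample <- make_generator(read_sample_stub, 1:3)
totals <- numeric(0)
while (!is.null(sample_matrix <- next_sample())) {
  # Process one sample; only this one matrix is held at a time.
  totals <- c(totals, sum(sample_matrix))
}
```

Each matrix is created only when the loop asks for it, so at most one sample is alive at a time. The cost is that the loop is a `while` rather than a `for`, which is exactly the part I would like the language to handle for me.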
This email asks: 1. whether this is a valid need for the language; 2. whether
there is an alternative design pattern to work around it; 3. where the proper
place to discuss this would be.
Thanks and best...