[R] use Vectorized function as range of for statement

Zhang Weiwu zhangweiwu at realss.com
Thu Aug 1 18:38:01 CEST 2013


I guess this has been discussed before, but I don't know the name of this 
problem, so I have to ask again.

Consider this scenario:

> fun <- function(x) { print(x)}
> for (i in Vectorize(fun, "x")(1:3)) print("OK")
[1] 1
[1] 2
[1] 3
[1] "OK"
[1] "OK"
[1] "OK"

The behaviour I would prefer (this is not real output) is:

> fun <- function(x) { print(x)}
> for (i in Vectorize(fun, "x")(1:3)) print("OK")
[1] 1
[1] "OK"
[1] 2
[1] "OK"
[1] 3
[1] "OK"

That is, each call of the vectorized function should hand its result to the 
'for' statement straight away, rather than all results being collected 
before the loop body runs for the first time.
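
A minimal sketch of that interleaving, assuming it is acceptable to loop 
over the indices and call the function inside the loop body instead of 
using a vectorized call as the range. Here 'out' is a hypothetical variable 
that only records the order of events, and 'fun' is changed to return its 
argument visibly:

```r
fun <- function(x) { print(x); x }

out <- character(0)            # hypothetical log of the order of events
for (i in 1:3) {
  value <- fun(i)              # generate one item
  print("OK")                  # process it before the next item is generated
  out <- c(out, paste("made", value), "OK")
}
```

This gives the interleaved order, but generation and processing are no 
longer separated, which is what I would like to keep.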

The intention of such a pattern is to separate the data generation logic 
from the data processing logic.

The latter mechanism, I think, would be more efficient because it does not 
hold all the data in memory before processing -- and the interpreter can 
know that holding it is not needed, since the result of the vectorized 
function is not assigned to anything but only used as a range.

The difference may seem trivial, but this pseudocode shows a case where it 
is not:

readSample <- function(x) {
 	....
 	sampling_time <- readBin(con, integer(), 1, size=4)
 	sample_count <- readBin(con, integer(), 1, size=2)
 	samples <- readBin(con, numeric(), sample_count, size=4) # 4-byte floats
 	....
 	matrix # return a big matrix representing a sample
}

for (sample in Vectorize(readSample, "x")(1:10000)) {
 	# process sample
}

The data file is a few gigabytes, so holding all the samples in memory at 
once is costly. Not having to hold them would make a real difference.
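
One possible workaround, sketched under assumptions: a closure that reads 
one sample per call from an open connection, driven by a 'while' loop. The 
names 'make_sample_reader' and 'next_sample' are hypothetical, the record 
layout is taken from the pseudocode above, and the tiny file written here 
exists only to make the sketch self-contained:

```r
# Hypothetical helper: returns a function that reads one sample per call,
# or NULL once the connection is exhausted.
make_sample_reader <- function(con) {
  function() {
    sampling_time <- readBin(con, integer(), 1, size = 4)
    if (length(sampling_time) == 0) return(NULL)   # end of file
    sample_count <- readBin(con, integer(), 1, size = 2)
    samples <- readBin(con, numeric(), sample_count, size = 4)
    list(time = sampling_time, samples = samples)
  }
}

# Write a tiny file in the assumed layout (two samples).
path <- tempfile()
con <- file(path, "wb")
writeBin(100L, con, size = 4); writeBin(2L, con, size = 2)
writeBin(c(1.5, 2.5), con, size = 4)
writeBin(200L, con, size = 4); writeBin(1L, con, size = 2)
writeBin(0.25, con, size = 4)
close(con)

con <- file(path, "rb")
next_sample <- make_sample_reader(con)
times <- integer(0)
total <- 0
while (!is.null(sample <- next_sample())) {
  # process sample; only this one sample is held at a time
  times <- c(times, sample$time)
  total <- total + sum(sample$samples)
}
close(con)
unlink(path)
```

The reading logic stays in one place and the processing in another, but it 
needs a 'while' loop rather than 'for', which is why I am asking whether 
there is a better pattern.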

This email asks: 1. whether this is a valid need for the language; 2. 
whether there is an alternative design pattern to work around it; 3. where 
the proper place to discuss this is.

Thanks and best...


