[R] use Vectorized function as range of for statement

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Thu Aug 1 20:04:56 CEST 2013


I think this is on topic here, but a reproducible example is highly desirable if not required for clarity.

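The Vectorize function is essentially a wrapper around mapply, which loops over the arguments internally, so you are really executing two successive loops: the vectorized call runs to completion, and only then does the for loop start iterating over its result. Note that a function wrapped by Vectorize is not truly vectorised (it is still called once per element), so there is no particular advantage to using it in this way. You might as well call fun as a statement in the for loop.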
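
To make the ordering concrete, here is a minimal sketch. The `events` vector and the variable names `eager` and `interleaved` are my own additions for illustration; `fun` is changed to record its calls instead of printing so the sequence can be inspected afterwards.

```r
# Record events in order so the evaluation sequence is visible.
events <- character(0)
fun <- function(x) {
  events <<- c(events, paste("fun", x))
  x
}

# Vectorize(fun, "x")(1:3) is evaluated in full before the loop starts.
for (i in Vectorize(fun, "x")(1:3)) {
  events <- c(events, "OK")
}
eager <- events

# Calling fun inside the loop interleaves generation and processing.
events <- character(0)
for (i in 1:3) {
  fun(i)
  events <- c(events, "OK")
}
interleaved <- events

print(eager)
print(interleaved)
```

The first vector should show all three calls to fun before any "OK", and the second should alternate them, which is the behaviour the original poster was after.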

However, interleaving output and computation is quite inefficient, so it is strongly recommended to handle output in its own loop or function in most cases. This allows true vectorization to be applied to the computation phase.
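
As a rough sketch of what separating the two phases can look like: the squaring step below is only a hypothetical stand-in for whatever the real computation is, and the variable names are assumptions of mine.

```r
# Computation phase: one vectorized expression, no loop and no output.
x <- 1:3
squares <- x^2   # hypothetical stand-in for the real computation

# Output phase: format all results at once, then emit them in one call.
lines <- sprintf("x = %d, x^2 = %d", x, as.integer(squares))
writeLines(lines)
```

Here sprintf is itself vectorized over its arguments, so neither phase needs an explicit loop.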
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Zhang Weiwu <zhangweiwu at realss.com> wrote:
>
>I guess this has been discussed before, but I don't know the name of this
>problem, thus had to ask again.
>
>Consider this scenario:
>
>> fun <- function(x) { print(x)}
>> for (i in Vectorize(fun, "x")(1:3)) print("OK")
>[1] 1
>[1] 2
>[1] 3
>[1] "OK"
>[1] "OK"
>[1] "OK"
>
>The optimal behaviour is:
>
>> fun <- function(x) { print(x)}
>> for (i in Vectorize(fun, "x")(1:3)) print("OK")
>[1] 1
>[1] "OK"
>[1] 2
>[1] "OK"
>[1] 3
>[1] "OK"
>
>That is, each iteration of vectorized function should yield some result
>for the 'for' statement, rather than having all results collected
>beforehand.
>
>The intention of such a pattern is to separate the data generation logic
>from the data processing logic.
>
>The latter mechanism, I think, is more efficient because it doesn't cache
>all data before processing -- and the interpreter has the sure knowledge
>that caching is not needed, since the vectorized function is not used in
>assignment but as a range.
>
>The difference may be trivial, but this pseudocode demonstrates otherwise:
>
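>readSample <- function(x) {
> 	....
> 	# 4-byte integer timestamp, then a 2-byte sample count
> 	sampling_time <- readBin(con, integer(), 1, size=4)
> 	sample_count <- readBin(con, integer(), 1, size=2)
> 	# R has no float(); numeric() with size=4 reads single-precision values
> 	samples <- readBin(con, numeric(), sample_count, size=4)
> 	....
> 	matrix # return a big matrix representing a sample
>}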
>
>for (sample in Vectorize(readSample, "x")(1:10000)) {
> 	# process sample
>}
>
>The data file is a few gigabytes, and caching it is not effortless. Not
>having to cache it would make a difference.
>
>This email asks to 1. validate this need of the language; 2. suggest an
>alternative design pattern to work around it; 3. ask for the proper place
>to discuss this.
>
>Thanks and best...
>
