[R] Multiple if function
Charles C. Berry
ccberry at ucsd.edu
Thu Sep 17 18:29:38 CEST 2015
On Thu, 17 Sep 2015, Berend Hasselman wrote:
>
>> On 17 Sep 2015, at 01:42, Dénes Tóth <toth.denes at ttk.mta.hu> wrote:
>>
>>
>>
>> On 09/16/2015 04:41 PM, Bert Gunter wrote:
>>> Yes! Chuck's use of mapply is exactly the split/combine strategy I was
>>> looking for. In retrospect, exactly how one should think about it.
>>> Many thanks to all for a constructive discussion.
>>>
>>> -- Bert
>>>
>>>
>>> Bert Gunter
>>>
>>>>>>
>>>>>> Use mapply like this on large problems:
>>>>>>
>>>>>> unsplit(
>>>>>>   mapply(
>>>>>>     function(x,z) eval( x, list( y=z )),
>>>>>>     expression( A=y*2, B=y+3, C=sqrt(y) ),
>>>>>>     split( dat$Flow, dat$ASB ),
>>>>>>     SIMPLIFY=FALSE),
>>>>>>   dat$ASB)
>>>>>>
>>>>>> Chuck
>>>>>>
>>
>>
>> Is there any reason not to use data.table for this purpose, especially if efficiency is of concern?
>>
>> ---
>>
>> # load data.table and microbenchmark
>> library(data.table)
>> library(microbenchmark)
>> #
>> # prepare data
>> DF <- data.frame(
>> ASB = rep_len(factor(LETTERS[1:3]), 3e5),
>> Flow = rnorm(3e5)^2)
>> DT <- as.data.table(DF)
>> DT[, ASB := as.character(ASB)]
>> #
>> # define functions
>> #
>> # Chuck's version
>> fnSplit <- function(dat) {
>>   unsplit(
>>     mapply(
>>       function(x,z) eval( x, list( y=z )),
>>       expression( A=y*2, B=y+3, C=sqrt(y) ),
>>       split( dat$Flow, dat$ASB ),
>>       SIMPLIFY=FALSE),
>>     dat$ASB)
>> }
>> #
>> # data.table-way (IMHO, much easier to read)
>> fnDataTable <- function(dat) {
>>   dat[,
>>       result :=
>>         if (.BY == "A") {
>>           2 * Flow
>>         } else if (.BY == "B") {
>>           3 + Flow
>>         } else if (.BY == "C") {
>>           sqrt(Flow)
>>         },
>>       by = ASB]
>> }
>> #
>> # benchmark
>> #
>> microbenchmark(fnSplit(DF), fnDataTable(DT))
>> identical(fnSplit(DF), fnDataTable(DT)[, result])
>>
>> ---
>>
>> Actually, in Chuck's version the unsplit() part is slow. If the order is not of concern (e.g., DF is reordered before calling fnSplit), fnSplit is comparable to the DT-version.
>>
>
> But David’s version is faster than Chuck’s fnSplit. I modified David’s solution slightly to get a result that is identical to fnSplit.
>
> # David's version
> # my modification to return a vector just like fnSplit
> fnDavid <- function(dat) {
>   z <- mapply(
>     function(x,z) eval( x, list( y=z )),
>     expression(A= y*2, B=y+3, C=sqrt(y) ),
>     split( dat$Flow, dat$ASB ),
>     USE.NAMES=FALSE, SIMPLIFY=TRUE
>   )
>   as.vector(t(z))
> }
>
> I added this to Dénes's code, then benchmarked with the R package
> rbenchmark and checked the results like this:
>
> library(rbenchmark)
> benchmark(fnSplit(DF), fnDataTable(DT),fnDavid(DF))
> identical(fnSplit(DF), fnDataTable(DT)[, result])
> identical(fnSplit(DF), fnDavid(DF))
>
> gave this:
>
>              test replications elapsed relative user.self sys.self user.child sys.child
> 2 fnDataTable(DT)          100   0.829    1.000     0.762    0.066          0         0
> 3     fnDavid(DF)          100   1.615    1.948     1.515    0.098          0         0
> 1     fnSplit(DF)          100   2.878    3.472     2.685    0.190          0         0
>
> > identical(fnSplit(DF), fnDataTable(DT)[, result])
> [1] TRUE
> > identical(fnSplit(DF), fnDavid(DF))
> [1] TRUE
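The `TRUE' for fnDavid above depends on the structure of ASB here: its
levels cycle A, B, C in row order with equal group sizes, which is
exactly the order that as.vector(t(z)) produces. In the general case
identical(...) is often FALSE. A permutation of ASB is enough to show
this: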
> DF$ASB <- sample(DF$ASB)
> identical(fnSplit(DF), fnDavid(DF))
[1] FALSE
>
unsplit() is the price you pay to cope with general orderings.
Chuck
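
As a small illustration of the ordering point (a sketch on made-up data,
not part of the benchmark above), the following uses a six-row data frame
whose ASB column is deliberately not in cyclic A, B, C order. The names
`pieces`, `viaUnsplit`, `viaTranspose` and `reference` are introduced here
only for the example; `reference` is simply the expected result written
out by hand.

```r
# Made-up data: ASB is NOT in cyclic A, B, C order
dat <- data.frame(
  ASB  = factor(c("C", "A", "B", "B", "C", "A")),
  Flow = c(1, 4, 9, 16, 25, 36)
)

# Split Flow by group and evaluate one expression per level of ASB
pieces <- mapply(
  function(x, z) eval(x, list(y = z)),
  expression(A = y * 2, B = y + 3, C = sqrt(y)),
  split(dat$Flow, dat$ASB),
  SIMPLIFY = FALSE
)

# unsplit() puts every value back at its original row position
viaUnsplit <- unsplit(pieces, dat$ASB)

# The transpose shortcut always interleaves A, B, C, whatever the row order
viaTranspose <- as.vector(t(do.call(cbind, pieces)))

# Expected values, worked out by hand row by row
reference <- c(1, 8, 12, 19, 5, 72)

identical(viaUnsplit, reference)    # TRUE
identical(viaTranspose, reference)  # FALSE
```

Here the transpose shortcut returns 8, 12, 1, 72, 19, 5, which is the
right set of values in the wrong rows.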