[R] Vectorised operations

Thu May 19 21:55:13 CEST 2016

In keeping with the theme of reducing unnecessary overhead
(and using William's example data)

> system.time( vAve <- ave(a, i, FUN=cummax) )
   user  system elapsed
  0.125   0.003   0.127
> system.time( b <- unlist( lapply( split(a,i) , cummax) ) )
   user  system elapsed
  0.320   0.007   0.327
> system.time( b <- unlist( lapply( split(a,i) , cummax) ,
+                           use.names=FALSE) )
   user  system elapsed
  0.067   0.001   0.068

> all.equal(vAve, b)
[1] TRUE

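Apparently there is quite a bit of overhead associated with keeping the
names when unlisting: with use.names=FALSE the elapsed time drops from
0.327 to 0.068 seconds, which is also about half the time ave() takes.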
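
For anyone who wants to check this without the full million-element
vector, here is a minimal sketch on smaller made-up data. The seed and
the sizes are my own assumptions, not William's, and with_names,
without_names and pieces are just illustrative names. It also shows a
caveat: split() returns the pieces ordered by group, so the unlisted
result lines up with ave() only when i is already sorted, as it is here.

```r
# Assumed toy data: the group index i is sorted (non-decreasing)
set.seed(1)
i <- rep(1:1000, sample(1:20, size = 1000, replace = TRUE))
a <- sample(-50:50, size = length(i), replace = TRUE)

pieces <- lapply(split(a, i), cummax)

with_names    <- unlist(pieces)                     # builds a name for every element
without_names <- unlist(pieces, use.names = FALSE)  # skips building the names

# Same values either way; only the names attribute differs
stopifnot(identical(unname(with_names), without_names))
stopifnot(is.null(names(without_names)))

# Matches ave() here because i is sorted; for an unsorted i it would not
stopifnot(isTRUE(all.equal(without_names, ave(a, i, FUN = cummax))))
```

If i is not sorted, ave() is the safer choice, since it puts each
result back in its original position.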

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 5/18/16, 7:26 AM, "R-help on behalf of William Dunlap via R-help"
<r-help-bounces at r-project.org on behalf of r-help at r-project.org> wrote:

>ave(A, i, FUN=cummax) loops but is faster than your aggregate-based
>solution.  E.g.,
>
>> i <- rep(1:10000, sample(0:210, replace=TRUE, size=10000))
>> length(i)
>[1] 1056119
>> a <- sample(-50:50, replace=TRUE, size=length(i))
>> system.time( vAve <- ave(a, i, FUN=cummax) )
>   user  system elapsed
>   0.13    0.03    0.16
>> system.time( vAggregate <- as.vector(unlist(aggregate(a,list(i),cummax)[[2]])) )
>   user  system elapsed
>   1.81    0.13    1.98
>> all.equal(vAve, vAggregate)
>[1] TRUE
>
>
>
>Bill Dunlap
>TIBCO Software
>wdunlap tibco.com
>
>On Wed, May 18, 2016 at 6:32 AM, John Logsdon <
>j.logsdon at quantex-research.com> wrote:
>
>> Folks
>>
>> I have some very long vectors - typically 1 million long - which are
>> indexed by another vector, same length, with values from 1 to a few
>> thousand, so each sub part of the vector may be a few hundred values
>> long.
>>
>> I want to calculate the cumulative maximum of each sub part of the main
>> vector by the index in an efficient manner.  This can obviously be done
>> in a loop but the whole calculation is embedded within many other
>> calculations which would make everything very slow indeed.  All the
>> other sums are vectorised already.
>>
>> For example,
>>
>> A=c(1,2,1,  -3,5,6,7,4,  6,3,7,6,9, ...)
>> i=c(1,1,1,   2,2,2,2,2,  3,3,3,3,3, ...)
>>
>> where A is split by i into three sub parts that are not the same
>> length, and the index values themselves are monotonic non-decreasing.
>>
>> I want the answer to be a vector of the same length:
>>
>> R=c(1,2,2,  -3,5,6,7,7,  6,6,7,7,9, ...)
>>
>> If I could reset the cumulative maximum to -1e6 (eg) at each change of
>> index, a simple cummax would do but I can't see how to do this.
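
One way to get that "reset" effect with a single cummax call is to
shift each sub part upwards by a growing offset, so that every new sub
part starts above everything that came before it, and then subtract
the offset again. This is only a sketch under stated assumptions: the
index must be sorted (non-decreasing), a must contain no NA values,
and the names grouped_cummax, g and span are made up for illustration.
For non-integer data with very many groups the add-then-subtract step
may lose a little floating point precision.

```r
# Sketch only: assumes i is sorted (non-decreasing) and a has no NAs
grouped_cummax <- function(a, i) {
  # Run number 1, 2, 3, ... that increases at each change of index
  g <- cumsum(c(TRUE, i[-1] != i[-length(i)]))
  # Offset step, strictly larger than the spread of a (kept as double)
  span <- as.numeric(max(a)) - min(a) + 1
  # Shift each run above the previous one, take cummax, shift back
  cummax(a + g * span) - g * span
}

A <- c(1, 2, 1,  -3, 5, 6, 7, 4,  6, 3, 7, 6, 9)
i <- c(1, 1, 1,   2, 2, 2, 2, 2,  3, 3, 3, 3, 3)
grouped_cummax(A, i)
```

On the example vectors this gives 1 2 2 -3 5 6 7 7 6 6 7 7 9, the same
as the R vector quoted below.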
>>
>> The best way I have found so far is to use the aggregate command:
>>
>> as.vector(unlist(aggregate(a,list(i),cummax)[[2]]))
>>
>> but occasionally this fails, returning a shorter vector than expected,
>> and it seems rather ugly, converting to and from lists, which may well
>> be an unnecessary overhead.
>>
>> I have been trying other approaches using apply() methods but either it
>> can't be done using them or I can't get my head round them!
>>
>> Any ideas?
>>
>> Best wishes
>>
>> John
>>
>> John Logsdon
>> Quantex Research Ltd
>> +44 161 445 4951/+44 7717758675
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> 
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>