[R] Vectorised operations
MacQueen, Don
macqueen1 at llnl.gov
Thu May 19 21:55:13 CEST 2016
In keeping with the theme of reducing unnecessary overhead
(and using William's example data)
> system.time( vAve <- ave(a, i, FUN=cummax) )
user system elapsed
0.125 0.003 0.127
> system.time( b <- unlist( lapply( split(a,i) , cummax) ) )
user system elapsed
0.320 0.007 0.327
> system.time( b <- unlist( lapply( split(a,i) , cummax) , use.names=FALSE) )
user system elapsed
0.067 0.001 0.068
> all.equal(vAve, b)
[1] TRUE
Apparently there is quite a bit of overhead associated with keeping the names when
unlisting.
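One caveat on the comparison above: unlist(lapply(split(a, i), cummax)) concatenates the results in group order, so it agrees with ave() only because i in William's example is already sorted. A minimal sketch for an index in arbitrary order uses base R's replacement function `split<-`, which writes each group's result back into the positions that group occupied (ave() itself is built on the same idiom). The helper name grouped_cummax and the small vectors x and g are hypothetical, chosen for illustration.

```r
# Grouped cumulative maximum that keeps the original element order,
# even when the index vector is not sorted.
# grouped_cummax is a hypothetical helper name used for illustration.
grouped_cummax <- function(x, g) {
  out <- x
  # split(x, g) groups the values; `split<-` writes each group's cummax
  # back into the positions that group occupies in x.
  split(out, g) <- lapply(split(x, g), cummax)
  out
}

# Small hypothetical example with an unsorted index
x <- c(5, 3, 1, 4)
g <- c(2, 1, 2, 1)

grouped_cummax(x, g)                                     # c(5, 3, 5, 4), same as ave(x, g, FUN = cummax)
unlist(lapply(split(x, g), cummax), use.names = FALSE)   # c(3, 4, 5, 5), group order instead
```

With a sorted index the two forms coincide, which is why all.equal() reported TRUE above.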
--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
On 5/18/16, 7:26 AM, "R-help on behalf of William Dunlap via R-help"
<r-help-bounces at r-project.org on behalf of r-help at r-project.org> wrote:
>ave(A, i, FUN=cummax) loops but is faster than your aggregate-based
>solution. E.g.,
>
>> i <- rep(1:10000, sample(0:210, replace=TRUE, size=10000))
>> length(i)
>[1] 1056119
>> a <- sample(-50:50, replace=TRUE, size=length(i))
>> system.time( vAve <- ave(a, i, FUN=cummax) )
> user system elapsed
> 0.13 0.03 0.16
>> system.time( vAggregate <- as.vector(unlist(aggregate(a,list(i),cummax)[[2]])) )
> user system elapsed
> 1.81 0.13 1.98
>> all.equal(vAve, vAggregate)
>[1] TRUE
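As a quick sanity check (a sketch, not part of the original exchange), ave() can be applied to the 13 values shown in the question quoted below, leaving out the elided "..." part; it reproduces the expected result R element for element. The names res and expected are illustrative only.

```r
# The 13 values from the question; the "..." continuation is omitted.
A <- c(1, 2, 1, -3, 5, 6, 7, 4, 6, 3, 7, 6, 9)
i <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
expected <- c(1, 2, 2, -3, 5, 6, 7, 7, 6, 6, 7, 7, 9)

# cummax() restarted within each level of i; the result has the same
# length and order as A.
res <- ave(A, i, FUN = cummax)
stopifnot(identical(res, expected))
```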
>
>
>
>Bill Dunlap
>TIBCO Software
>wdunlap tibco.com
>
>On Wed, May 18, 2016 at 6:32 AM, John Logsdon <j.logsdon at quantex-research.com> wrote:
>
>> Folks
>>
>> I have some very long vectors - typically 1 million long - which are
>> indexed by another vector of the same length, with values from 1 to a few
>> thousand, so each sub-part of the vector may be a few hundred values long.
>>
>> I want to calculate the cumulative maximum of each sub-part of the main
>> vector, as defined by the index, in an efficient manner. This can obviously
>> be done in a loop, but the whole calculation is embedded within many other
>> calculations, so a loop would make everything very slow indeed. All the
>> other sums are vectorised already.
>>
>> For example,
>>
>> A=c(1,2,1, -3,5,6,7,4, 6,3,7,6,9, ...)
>> i=c(1,1,1, 2,2,2,2,2, 3,3,3,3,3, ...)
>>
>> where A has three sub-parts (one per level of i) that are not the same, but
>> the index values themselves are monotonic non-decreasing.
>>
>> I want the answer to be a vector of the same length:
>>
>> R=c(1,2,2, -3,5,6,7,7, 6,6,7,7,9, ...)
>>
>> If I could reset the cumulative maximum to -1e6 (e.g.) at each change of
>> index, a simple cummax would do, but I can't see how to do this.
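The "reset" idea can be done with a single cummax() call and no per-group function calls, assuming (as in the question) that i is sorted so each group occupies one contiguous run. Shifting each group by a constant larger than the spread of the values makes every value in a later group exceed every value in an earlier one, so a global cummax() never carries a maximum across a group boundary; subtracting the shift afterwards recovers the grouped result. This is a swapped-in trick, not one proposed in the thread; vectorised_cummax is a hypothetical name, and exactness assumes the values are integer-valued with no NAs.

```r
# Grouped cumulative maximum with a single vectorised cummax() call.
# ASSUMPTIONS: i is sorted (each group is one contiguous run), a has no
# NAs, and a is integer-valued, so adding and subtracting offsets is exact.
# vectorised_cummax is a hypothetical helper name.
vectorised_cummax <- function(a, i) {
  if (length(a) == 0) return(a)
  # Consecutive group numbers 1, 2, 3, ...; a new number starts wherever
  # the index changes.
  g <- cumsum(c(TRUE, diff(i) != 0))
  # Offset strictly larger than the spread of a, so after shifting every
  # value in a later group exceeds every value in an earlier group.
  width <- diff(range(a)) + 1
  shift <- g * width
  cummax(a + shift) - shift
}

# The toy data from the question ("..." part omitted)
A <- c(1, 2, 1, -3, 5, 6, 7, 4, 6, 3, 7, 6, 9)
i <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
vectorised_cummax(A, i)
```

The design trades the per-group loop hidden inside ave() for one pass each of diff(), cumsum() and cummax() over the whole vector; for unsorted or non-integer data the split-based approaches are the safer choice.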
>>
>> The best way I have found so far is to use the aggregate command:
>>
>> as.vector(unlist(aggregate(a,list(i),cummax)[[2]]))
>>
>> but on rare occasions this fails, returning a shorter vector than expected.
>> It also seems rather ugly, converting to and from lists, which may well be
>> unnecessary overhead.
>>
>> I have been trying other approaches using apply() methods, but either it
>> can't be done using them or I can't get my head round them!
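On the apply() question: tapply() can express the same computation, assuming (as in the question) that i is sorted, because unlist() returns the per-group results in group order rather than in the original positions. It still calls cummax() once per group, so it is closer to the split/lapply form than to a fully vectorised solution. A minimal sketch on the toy data, with per_group and res_tapply as illustrative names:

```r
A <- c(1, 2, 1, -3, 5, 6, 7, 4, 6, 3, 7, 6, 9)
i <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)

# tapply() applies cummax to each group; because each result has more
# than one value, it returns a list with one element per level of i.
per_group <- tapply(A, i, cummax)

# Flatten in group order; this matches the original positions only
# when i is sorted.
res_tapply <- unlist(per_group, use.names = FALSE)
```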
>>
>> Any ideas?
>>
>> Best wishes
>>
>> John
>>
>> John Logsdon
>> Quantex Research Ltd
>> +44 161 445 4951/+44 7717758675
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>