[R] Vectorised operations

John Logsdon j.logsdon at quantex-research.com
Wed May 18 15:32:49 CEST 2016


Folks

I have some very long vectors - typically 1 million long - which are
indexed by another vector, same length, with values from 1 to a few
thousand, so each sub part of the vector may be a few hundred values long.

I want to calculate the cumulative maximum of each sub part of the main
vector by the index in an efficient manner.  This can obviously be done in
a loop but the whole calculation is embedded within many other
calculations which would make everything very slow indeed.  All the other
sums are vectorised already.

For example,

A=c(1,2,1,  -3,5,6,7,4,  6,3,7,6,9, ...)
i=c(1,1,1,   2,2,2,2,2,  3,3,3,3,3, ...)

where i has three levels, the sub parts of A are not all the same length,
and the index values themselves are monotonic non-decreasing.

The answer should be a vector of the same length:

R=c(1,2,2,  -3,5,6,7,7,  6,6,7,7,9, ...)

If I could reset the cumulative maximum to -1e6 (e.g.) at each change of
index, a simple cummax would do but I can't see how to do this.
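
One way the reset idea might be sketched is to shift each sub part upwards
by an amount larger than the whole range of A, so that a single cummax can
never carry a maximum across a change of index, and then shift back.  This
is only a sketch: it assumes i is numeric and non-decreasing (as in the
example) and that A has no NA values; the names span, shift and R_offset
are made up for illustration, and only the values shown in the example are
used.

```r
# Example data: only the values shown above (the "..." is not filled in)
A <- c(1,2,1,  -3,5,6,7,4,  6,3,7,6,9)
i <- c(1,1,1,   2,2,2,2,2,  3,3,3,3,3)

# Assumption: i is numeric and non-decreasing, A contains no NA
span  <- max(A) - min(A) + 1   # strictly larger than the range of A
shift <- i * span              # constant within a sub part, grows with i

# After shifting, the first value of each sub part exceeds every shifted
# value before it, so cummax effectively restarts at each change of index
R_offset <- cummax(A + shift) - shift
print(R_offset)
```

For the example data this reproduces the expected R vector.  With
non-integer values in A, adding and subtracting a large shift may lose a
little floating point precision, so the result may need checking.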

The best way I have found so far is to use the aggregate command:

as.vector(unlist(aggregate(A,list(i),cummax)[[2]]))

but on rare occasions this fails, returning a shorter vector than
expected, and it seems rather ugly, converting to and from lists, which
may well be an unnecessary overhead.
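
A possible alternative that avoids the explicit list handling is the base R
function ave(), which applies a function within each group and returns a
vector of the same length and order as its input.  This is a sketch on the
example values only, not a tested replacement for the aggregate call; the
name R_ave is made up for illustration.

```r
# Example data: only the values shown above (the "..." is not filled in)
A <- c(1,2,1,  -3,5,6,7,4,  6,3,7,6,9)
i <- c(1,1,1,   2,2,2,2,2,  3,3,3,3,3)

# ave() splits A by i, applies cummax to each piece and puts the results
# back in the original positions; FUN must be passed by name
R_ave <- ave(A, i, FUN = cummax)
print(R_ave)
```

ave() still works group by group internally, so whether it is fast enough
on a vector a million long would need timing.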

I have been trying other approaches using apply() methods but either it
can't be done using them or I can't get my head round them!

Any ideas?

Best wishes

John

John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675


