[R-sig-finance] How can I do this better? (Filling in last traded
price for NA)
john.gavin at ubs.com
Mon Sep 13 17:57:41 CEST 2004
Hi Ajay,
You will probably get other suggestions
along the following lines,
which use 'rle' and 'rep' to speed things up.
fillIn2 <- function(x)
{ bef <- x # keep a copy for display purposes only.
  xRle <- rle(is.na(x))
  # get indices where each NA seq starts (low) and stops (upp)
  upp <- (sumX <- cumsum(xRle$lengths))[xRle$values]
  low <- sumX[which(xRle$values)-1]+1
  # special case: nothing to fill, i.e. no NAs at all, or NAs at the
  # start _only_, as in c(NA, ..., NA, notNA, ..., notNA)
  if (length(low) == 0) return(cbind(before = x , after = x))
  # special case: NAs at the start and elsewhere; drop the leading run
  if (length(upp) == length(low)+1) upp <- upp[-1]
  # Critical bit is 'rep' on RHS.
  # On LHS, don't replace NAs at the start, if any.
  ind <- low[1]-1
  x[ind + which(is.na(x[-seq(ind)]))] <- x[rep(low-1, upp-low+1)]
  cbind(before = bef , after = x) # show off before and after effect
}
set.seed(123)
x <- 1:10
x[sample(length(x), floor(length(x)/2))] <- NA
fillIn2(x)
should produce
> fillIn2(x)
      before after
 [1,]      1     1
 [2,]      2     2
 [3,]     NA     2
 [4,]     NA     2
 [5,]      5     5
 [6,]     NA     5
 [7,]     NA     5
 [8,]     NA     5
 [9,]      9     9
[10,]     10    10
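To make the indexing easier to follow, here is a small sketch of the intermediate values for the example vector. The vector is typed in by hand rather than drawn with 'sample', because, as far as I know, R 3.6.0 and later use a different default sampling method, so the seed above may not reproduce the table there. The name 'src' is only for illustration.

```r
# Example vector from the table above, typed in by hand.
x <- c(1, 2, NA, NA, 5, NA, NA, NA, 9, 10)

xRle <- rle(is.na(x))
xRle$lengths   # 2 2 1 3 2                     (run lengths)
xRle$values    # FALSE TRUE FALSE TRUE FALSE   (TRUE marks a run of NAs)

sumX <- cumsum(xRle$lengths)              # 2 4 5 8 10 (last index of each run)
upp  <- sumX[xRle$values]                 # 4 8        (last index of each NA run)
low  <- sumX[which(xRle$values) - 1] + 1  # 3 6        (first index of each NA run)

# Index of the last observed value before each NA run, repeated once per NA.
src <- rep(low - 1, upp - low + 1)        # 2 2 5 5 5
x[src]                                    # 2 2 5 5 5  (values copied into the NAs)
```

So 'rep' builds all the source indices in one go, and a single vectorised assignment fills every NA run.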
The code seems clunky and has special cases
so it is probably not optimal.
However, it is faster than, say, using 'mapply':
fillIn <- function(x)
{ bef <- x
  xRle <- rle(is.na(x))
  upp <- cumsum(xRle$lengths)[xRle$values]
  low <- cumsum(xRle$lengths)[which(xRle$values)-1]+1
  # NAs at the start: drop the leading run, it has nothing to copy from
  if (length(upp) == length(low)+1) upp <- upp[-1]
  # one call per NA run; '<<-' updates the 'x' of fillIn, not a local copy
  mapply(function(l, u) x[l:u] <<- x[l-1], low, upp)
  cbind(before = bef , after = x) # show off before and after effect
}
fillIn(x)
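The 'mapply' call above is used only for its side effect, one assignment per run of NAs. As a rough sketch of the same run-by-run idea, here it is as an explicit loop. 'fillInLoop' is a hypothetical name for illustration; it is not one of the functions timed in this message, and it returns only the filled vector.

```r
# Hypothetical helper: fill each run of NAs with the value just before it.
fillInLoop <- function(x)
{ xRle   <- rle(is.na(x))
  ends   <- cumsum(xRle$lengths)     # last index of every run
  starts <- ends - xRle$lengths + 1  # first index of every run
  for (k in which(xRle$values)) {    # visit the NA runs only
    if (starts[k] > 1)               # leading NAs have nothing to copy
      x[starts[k]:ends[k]] <- x[starts[k] - 1]
  }
  x
}

fillInLoop(c(1, 2, NA, NA, 5, NA, NA, NA, 9, 10))
fillInLoop(c(NA, NA, 3, NA))  # leading NAs stay NA
```

The cost is one assignment per NA run, so it grows with the number of runs, which is the work that the 'rep' version does in a single vectorised step.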
Some simulations to compare times,
based on vectors of varying lengths with 50% of elements set to NA:
simFillIn <- function(n, method = c("rep", "mapply"))
{ aa <- rpois(n, 5)
  aa[sample(seq(n), floor(n * .5))] <- NA
  method <- match.arg(method)
  ansTime <- system.time(ans <-
    switch(method,
      mapply = fillIn(aa),
      rep = fillIn2(aa),
      stop("wrong method")
    ) # end of switch
  ) # end of system.time
  list(time = ansTime) # add 'ans = ans' here to return the result too
}
ans <- lapply(c(2e4, 1e4, 1e3, 1e2, 1e1), simFillIn, method = "mapply")
lapply(ans, "[[", "time")
ans <- lapply(c(2e4, 1e4, 1e3, 1e2, 1e1), simFillIn, method = "rep")
lapply(ans, "[[", "time")
fillIn (with 'mapply') seems at least 10 times slower
than fillIn2 (with 'rep').
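On the special cases in fillIn2: one way to avoid them might be to index the observed values by a running count of non-NA entries. This is only a sketch and has not been timed against the functions above. 'fillIn3' is a hypothetical name, and the small matrix 'm' is a made-up stand-in for the "its" object in the original question.

```r
# Hypothetical alternative: fill NAs with the last observed value.
fillIn3 <- function(x)
{ obs <- !is.na(x)
  # cumsum(obs) counts the observed values seen so far (0 for leading NAs).
  # Prepending NA to the observed values makes a count of 0 map to NA.
  c(NA, x[obs])[cumsum(obs) + 1]
}

fillIn3(c(1, 2, NA, NA, 5, NA, NA, NA, 9, 10))  # 1 2 2 2 5 5 5 5 9 10
fillIn3(c(NA, NA, 3, NA))                       # NA NA 3 3

# Column by column on a plain matrix (stand-in for the price matrix).
m <- cbind(a = c(1, NA, 3), b = c(NA, 2, NA))
apply(m, 2, fillIn3)
```

Leading NAs are left as NA, as in fillIn2. Whether 'apply' works directly on an "its" object is something I have not checked.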
Regards,
John.
John Gavin <john.gavin at ubs.com>,
Quantitative Risk Models and Statistics,
UBS Investment Bank, 6th floor,
100 Liverpool St., London EC2M 2RH, UK.
Phone +44 (0) 207 567 4289
Fax +44 (0) 207 568 5352
Ajay Shah wrote:
>I have 3 different daily time-series. Using union() in the "its"
>package, I can make a long matrix, where rows are created when even
>one of the three time-series is observed:
>
>massive <- union(nifty.its, union(inrusd.its, infosys.its))
>
>Now in this, I want to replace NA values for prices by the
>most-recently observed price. I can do this painfully --
>
>for (i in 2:nrow(massive)) {
> for (j in 1:3) {
> if (is.na(massive[i,j])) {
> massive[i,j] = massive[i-1,j]
> }
> }
>}
>
>But this is horribly slow. Is there a more clever way?