[R-sig-finance] How can I do this better? (Filling in last traded price for NA)

Mon Sep 13 11:35:57 CEST 2004

This seems like a perfect example of how to move from C-style
code towards the vectorization that R likes.  All else being
equal, R runs faster the fewer function calls it makes.  So even
if you don't see a way to get rid of all of your loops, the less
work that is done inside them, the better.

The first thing to notice here is that the test for missing values
can be done all in one go, with a single call to "is.na" on the
whole matrix instead of one call per element.  So we can have:

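# test every element for NA once, outside the loops
mass.na <- is.na(massive)
nc.mass <- ncol(massive)
for(i in 2:nrow(massive)) {
    for(j in 1:nc.mass) {
        if(mass.na[i,j]) {
            # row i-1 has already been filled, so runs of NAs carry forward
            massive[i,j] <- massive[i-1, j]
        }
    }
}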

But at this point we can notice that we can do more than one
substitution at a time within each column.  Also, if a column
starts with a run of missing values (very common for prices),
there is nothing earlier to fill them from, so those spots will
stay missing and should be skipped.  So let's write a useful
subfunction and rearrange the computation:

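subfun.miss.use <- function(x) {
    # positions of the missing values
    mis <- which(is.na(x))
    # drop the leading run of NAs (positions 1, 2, ...): nothing precedes them
    mis[mis != seq_along(mis)]
}

for(j in 1:ncol(massive)) {
    # each pass copies from the row above, filling one more step of every gap
    while(length(this.mis <- subfun.miss.use(massive[, j]))) {
        massive[this.mis, j] <- massive[this.mis-1, j]
    }
}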

Caution: this code hasn't been tested so there may be bugs in it.
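
As a sanity check, here is a minimal sketch that runs the loop on a
small made-up matrix.  The column names and prices are invented for
illustration, and a plain numeric matrix is assumed rather than an
"its" object; the subfunction is restated so the block runs on its own.

```r
# Subfunction from above: positions of NAs that have some value before them
subfun.miss.use <- function(x) {
  mis <- which(is.na(x))
  mis[mis != seq_along(mis)]
}

# Made-up prices; column "b" starts with NAs that can never be filled
massive <- cbind(a = c(10, NA, NA, 13, NA),
                 b = c(NA, NA, 21, NA, 23))

for (j in 1:ncol(massive)) {
  # each pass fills every NA whose predecessor row is used as the source
  while (length(this.mis <- subfun.miss.use(massive[, j]))) {
    massive[this.mis, j] <- massive[this.mis - 1, j]
  }
}

print(massive)
```

Column "a" should end up with no missing values, while the two
leading NAs in column "b" are left alone.  The "while" loop runs
once per step of the longest gap in the column, not once per row.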

You may want to limit how far back to look for a value.  One way
of doing this is to cap the number of passes of the "while" loop.
A more rigorous way, which avoids partially filling long gaps, is
to use "rle" to find the length of each run of missing values.
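
One possible sketch of the "rle" idea follows.  The function name
fill.limited and its max.gap argument are hypothetical (they are not
from any package), and the input is assumed to be a plain vector such
as one column of the matrix.  Runs of NAs longer than max.gap are
left entirely missing instead of being partially filled.

```r
# Hypothetical helper: carry the last value forward, but only across
# runs of NAs that are no longer than max.gap
fill.limited <- function(x, max.gap) {
  runs <- rle(is.na(x))
  ends <- cumsum(runs$lengths)         # last position of each run
  starts <- ends - runs$lengths + 1    # first position of each run
  # NA runs that are short enough and have a value just before them
  ok <- runs$values & runs$lengths <= max.gap & starts > 1
  for (k in which(ok)) {
    x[starts[k]:ends[k]] <- x[starts[k] - 1]
  }
  x
}

# Made-up data: a leading NA, a gap of length 2 and a gap of length 3
x <- c(NA, 5, NA, NA, 7, NA, NA, NA, 9)
filled <- fill.limited(x, max.gap = 2)
print(filled)
```

The loop here is over runs of missing values, not over elements, so
it stays short even for long series.  With max.gap = 2 the gap of
length 2 is filled and the gap of length 3 is left untouched.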

Another common step is to remove rows in which every value is
missing, though that apparently cannot happen in the current
setting, since a row is only created when at least one of the
series is observed.
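
Dropping rows that are entirely missing can be sketched as follows,
assuming a plain numeric matrix rather than an "its" object (the
matrix m and its values are made up for illustration):

```r
# Made-up matrix whose second row is missing in every column
m <- rbind(c(1, NA),
           c(NA, NA),
           c(3, 4))

# TRUE for rows that contain no observed value at all
all.mis <- rowSums(!is.na(m)) == 0

# drop = FALSE keeps the result a matrix even if only one row survives
kept <- m[!all.mis, , drop = FALSE]
print(kept)
```

The test is done for all rows at once with "rowSums", so no explicit
loop is needed.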

Patrick Burns

Burns Statistics
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Ajay Shah wrote:

>I have 3 different daily time-series. Using union() in the "its"
>package, I can make a long matrix, where rows are created when even
>one of the three time-series is observed:
>
>massive <- union(nifty.its, union(inrusd.its, infosys.its))
>
>Now in this, I want to replace NA values for prices by the
>most-recently observed price. I can do this painfully --
>
>for (i in 2:nrow(massive)) {
>  for (j in 1:3) {
>    if (is.na(massive[i,j])) {
>      massive[i,j] = massive[i-1,j]
>    }
>  }
>}
>
>But this is horribly slow. Is there a more clever way?
>