[R-sig-finance] How can I do this better? (Filling in last traded price for NA)

Matthew Dowle mdowle at concordiafunds.com
Mon Sep 13 22:06:23 CEST 2004


Isn't C the right tool for this job? Something like this (cobbled together
along the lines of cumsum's own C code)? This is untested and very unlikely
to be exactly correct.

#include <R.h>
#include <Rinternals.h>

/* Carry the last non-NA value of x forward into s (same length as x). */
SEXP fillna(SEXP x, SEXP s)        /* not static, so .Call() can find it */
{
    int i = 0;
    double last = NA_REAL;         /* NA_REAL is the C-level numeric NA */
    while (i < length(x)) {
        if (!ISNAN(REAL(x)[i]))
            last = REAL(x)[i];
        REAL(s)[i++] = last;
    }
    return s;
}
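
For completeness, a minimal calling sketch (my own addition, not from the
original post): assuming the routine above is saved as fillna.c and compiled
with R CMD SHLIB, it could be loaded and wrapped as below. The wrapper name
locf and the file names are illustrative only.

# R CMD SHLIB fillna.c           # compile at the shell prompt
dyn.load("fillna.so")            # "fillna.dll" on Windows
locf <- function(x)
  .Call("fillna", as.double(x), double(length(x)))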

Even Gabor's LOCF involves just three calls (is.na(), which() and cumsum()),
but if I understand correctly that still means three loops (internally) over
the entire vector, plus the associated memory copies from each call. fillna
as above should be about as fast as a single call to cumsum(), while needing
much less working memory than any pure-R solution. If that is the case,
perhaps something like it could be added to R?
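
For reference, here is a pure-R sketch of the kind of LOCF those three calls
suggest (my reconstruction, not necessarily Gabor's exact code):
cumsum(!is.na(x)) counts the observations seen so far, and that running count
indexes into the observed values, with positions before the first observation
left as NA.

locfR <- function(x) {
  obs <- which(!is.na(x))        # positions of observed values
  idx <- cumsum(!is.na(x))       # how many observations seen so far
  c(NA, x[obs])[idx + 1]         # zero observations so far -> NA
}
locfR(c(NA, 1, NA, NA, 4, NA))   # NA 1 1 1 4 4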

Regards,
Matthew

-----Original Message-----
From: john.gavin at ubs.com [mailto:john.gavin at ubs.com] 
Sent: 13 September 2004 16:58
To: r-sig-finance at stat.math.ethz.ch
Subject: Re: [R-sig-finance] How can I do this better? (Filling in last
traded price for NA)


Hi Ajay,

You will probably get other suggestions
along the following lines,
which use 'rle' and 'rep' to speed things up.

fillIn2 <- function(x)
{ bef <- x # keep a copy for display purposes only.
  xRle <- rle(is.na(x))
  # get indices where each NA seq starts (low) and stops (upp)
  upp <- (sumX <- cumsum(xRle$lengths))[xRle$values]
  low <- sumX[which(xRle$values)-1]+1
  # special case: NA at start _only_, i.e. c(NA, ..., NA, notNA, ..., notNA)
  if (length(low) == 0) return(cbind(before = x , after = x))
  # special case: NA at start and else where 
  if (length(upp) == length(low)+1) upp <- upp[-1]
  # Critical bit is 'rep' on RHS. 
  # On LHS, don't replace NAs at the start, if any.
  ind <- low[1]-1
  x[ind + which(is.na(x[-seq(ind)]))] <- x[rep(low-1, upp-low+1)]
  cbind(before = bef, after = x) # show off before and after effect
}
set.seed(123)
x <- 1:10
x[sample(length(x), floor(length(x)/2))] <- NA
fillIn2(x)

should produce

> fillIn2(x)
      before after
 [1,]      1     1
 [2,]      2     2
 [3,]     NA     2
 [4,]     NA     2
 [5,]      5     5
 [6,]     NA     5
 [7,]     NA     5
 [8,]     NA     5
 [9,]      9     9
[10,]     10    10

The code seems clunky and has special cases
so it is probably not optimal.

However, it is faster than, say, using 'mapply'

fillIn <- function(x)
{ bef <- x
  xRle <- rle(is.na(x))
  upp <- cumsum(xRle$lengths)[xRle$values]
  low <- cumsum(xRle$lengths)[which(xRle$values)-1]+1
  if (length(upp) == length(low)+1) upp <- upp[-1]
  mapply(function(l, u) x[l:u] <<- x[l-1], low, upp)
  cbind(before = bef, after = x) # show off before and after effect
}
fillIn(x)

Some simulations to compare times,
based on vectors of varying lengths with 50% of elements set to NA

simFillIn <- function(n, method = c("rep", "mapply"))
{ aa <- rpois(n, 5)
  aa[sample(seq(n), floor(n * .5))] <- NA
  method = match.arg(method)
  ansTime <- system.time(ans <- 
    switch(method,
      mapply = fillIn(aa),
      rep = fillIn2(aa), 
      stop("wrong method")
  )) # switch system.time
  list(time = ansTime) # ans = ans, 
}
ans <- lapply(c(2e4, 1e4, 1e3, 1e2, 1e1), simFillIn, method = "mapply")
lapply(ans, "[[", "time")
ans <- lapply(c(2e4, 1e4, 1e3, 1e2, 1e1), simFillIn, method = "rep")
lapply(ans, "[[", "time")

fillIn (with 'mapply') seems at least 10 times slower
than fillIn2 (with 'rep').

Regards,

John.

John Gavin <john.gavin at ubs.com>,
Quantitative Risk Models and Statistics,
UBS Investment Bank, 6th floor, 
100 Liverpool St., London EC2M 2RH, UK.
Phone +44 (0) 207 567 4289
Fax   +44 (0) 207 568 5352


Ajay Shah wrote:

>I have 3 different daily time-series. Using union() in the "its" 
>package, I can make a long matrix, where rows are created when even one 
>of the three time-series is observed:
>
>massive <- union(nifty.its, union(inrusd.its, infosys.its))
>
>Now in this, I want to replace NA values for prices by the 
>most-recently observed price. I can do this painfully --
>
>for (i in 2:nrow(massive)) {
>  for (j in 1:3) {
>    if (is.na(massive[i,j])) {
>      massive[i,j] = massive[i-1,j]
>    }
>  }
>}
>
>But this is horribly slow. Is there a more clever way?
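
To answer the multi-column case directly, either fill function above could be
applied column by column. A sketch only, assuming massive is (or can be
coerced to) a plain numeric matrix, and keeping just the "after" column that
fillIn2() returns for each series:

# fill each column with the rle/rep version, dropping the "before" column
filled <- apply(massive, 2, function(col) fillIn2(col)[, "after"])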




