[R-sig-finance] How can I do this better? (Filling in last traded price for NA)

Mon Sep 13 22:20:06 CEST 2004

I'm not convinced.  If you should be lucky enough to have enough
data that Gabor's solution will take more than a few milliseconds,
then I would think you might want to limit the length of filling.
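
A minimal sketch of what limiting the fill length could look like in a single pass, written in plain C so it stands alone. The name fill_forward_limited and the max_run parameter are hypothetical, and NaN stands in for R's NA here (plain C has no NA). Treat it as an illustration under those assumptions, not as code from R or any package.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper, not part of R or any package.
 * Copies x into out, replacing each NaN with the most recent non-NaN
 * value, but only for the first max_run entries of any run of NaNs.
 * Leading NaNs and entries beyond the limit are left as NaN. */
void fill_forward_limited(const double *x, double *out, size_t n, size_t max_run)
{
    double last = NAN;   /* no observation seen yet */
    size_t run = 0;      /* length of the current NaN run so far */

    for (size_t i = 0; i < n; i++) {
        if (!isnan(x[i])) {
            last = x[i];
            run = 0;
            out[i] = x[i];
        } else {
            run++;
            /* stop carrying the stale value once the run is too long */
            out[i] = (run <= max_run) ? last : NAN;
        }
    }
}
```

With max_run = 2, a price followed by three missing days would be carried forward for two days and left missing on the third.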

Patrick Burns

Burns Statistics
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Matthew Dowle wrote:

>Isn't C the right tool for this job? Something like this (cobbling cumsum
>itself)? This is untested and very unlikely to be exactly correct.
>
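>static SEXP fillna(SEXP x, SEXP s)
>{
>    /* x: input numeric vector; s: preallocated numeric vector, same length */
>    int i=0;
>    double last=NA_REAL;  /* leading NAs stay NA until a value is seen */
>    while (i<length(x)) {
>        if (!(ISNAN(REAL(x)[i])))
>            last = REAL(x)[i];
>        REAL(s)[i++] = last;
>    }
>    return s;
>}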
>
>Even Gabor's LOCF involved just 3 calls: is.na(), which() and cumsum(), but
>if I understand correctly it involves 3 loops (internally) over the entire
>vector plus the associated memory copies of each call. fillna as above
>should be as fast as one call to cumsum(), requiring much less working
>memory than any R solution. If this is the case, perhaps something like it
>could be added to R?
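
The single-pass claim above can be illustrated without the R headers. The sketch below is a stand-alone analogue, not the fillna function itself: the name fill_forward_inplace is hypothetical, it works on a plain double array, and it uses NaN as the missing marker (in R's C API, ISNAN is true for both NA and NaN). It overwrites the input, so it needs no second vector.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical stand-alone analogue of the quoted fillna idea.
 * One loop over the data, no temporary vectors: each NaN is overwritten
 * with the last non-NaN value seen. Leading NaNs are left untouched.
 * Returns the number of entries that were filled. */
size_t fill_forward_inplace(double *x, size_t n)
{
    size_t filled = 0;
    double last = NAN;   /* no observation seen yet */

    for (size_t i = 0; i < n; i++) {
        if (isnan(x[i])) {
            if (!isnan(last)) {
                x[i] = last;
                filled++;
            }
        } else {
            last = x[i];
        }
    }
    return filled;
}
```

Whether this beats a vectorised R solution in practice would have to be measured; the sketch only shows that the work fits in one loop.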
>
>Regards,
>Matthew
>
>-----Original Message-----
>From: john.gavin at ubs.com [mailto:john.gavin at ubs.com] 
>Sent: 13 September 2004 16:58
>To: r-sig-finance at stat.math.ethz.ch
>Subject: Re: [R-sig-finance] How can I do this better? (Filling in last
>traded price for NA)
>
>
>Hi Ajay,
>
>You will probably get other suggestions
>along the following lines,
>which use 'rle' and 'rep' to speed things up.
>
>fillIn2 <- function(x)
>{ bef <- x # keep a copy for display purposes only.
>  xRle <- rle(is.na(x))
>  # get indices where each NA seq starts (low) and stops (upp)
>  upp <- (sumX <- cumsum(xRle$lengths))[xRle$values]
>  low <- sumX[which(xRle$values)-1]+1
>  # special case: no NAs, or NAs at the start _only_, i.e. c(NA, ..., NA, notNA, ..., notNA)
>  if (length(low) == 0) return(cbind(before = x , after = x))
>  # special case: NAs at the start and elsewhere
>  if (length(upp) == length(low)+1) upp <- upp[-1]
>  # Critical bit is 'rep' on RHS.
>  # On LHS, don't replace NAs at the start, if any.
>  ind <- low[1]-1
>  x[ind + which(is.na(x[-seq(ind)]))] <- x[rep(low-1, upp-low+1)]
>  cbind(before = bef , after = x) # show off before and after effect
>}
>set.seed(123)
>x <- 1:10
>x[sample(length(x), floor(length(x)/2))] <- NA
>fillIn2(x)
>
>should produce
>
>> fillIn2(x)
>      before after
> [1,]      1     1
> [2,]      2     2
> [3,]     NA     2
> [4,]     NA     2
> [5,]      5     5
> [6,]     NA     5
> [7,]     NA     5
> [8,]     NA     5
> [9,]      9     9
>[10,]     10    10
>
>The code seems clunky and has special cases
>so it is probably not optimal.
>
>However, it is faster than, say, using 'mapply'
>
>fillIn <- function(x)
>{ bef <- x
>  xRle <- rle(is.na(x))
>  upp <- cumsum(xRle$lengths)[xRle$values]
>  low <- cumsum(xRle$lengths)[which(xRle$values)-1]+1
>  if (length(upp) == length(low)+1) upp <- upp[-1]
>  mapply(function(l, u) x[l:u] <<- x[l-1], low, upp)
>  cbind(before = bef , after = x) # show off before and after effect
>}
>fillIn(x)
>
>Some simulations to compare times,
>based on vectors of varying lengths with 50% of elements set to NA
>
>simFillIn <- function(n, method = c("rep", "mapply"))
>{ aa <- rpois(n, 5)
>  aa[sample(seq(n), floor(n * .5))] <- NA
>  method = match.arg(method)
>  ansTime <- system.time(ans <- 
>    switch(method,
>      mapply = fillIn(aa),
>      rep = fillIn2(aa), 
>      stop("wrong method")
>  )) # switch system.time
>  list(time = ansTime) # ans = ans, 
>}
>ans <- lapply(c(2e4, 1e4, 1e3, 1e2, 1e1), simFillIn, method = "mapply")
>lapply(ans, "[[", "time")
>ans <- lapply(c(2e4, 1e4, 1e3, 1e2, 1e1), simFillIn, method = "rep")
>lapply(ans, "[[", "time")
>
>simFillIn with method = "mapply" (i.e. fillIn) seems at least 10 times
>slower than with method = "rep" (i.e. fillIn2).
>
>Regards,
>
>John.
>
>John Gavin <john.gavin at ubs.com>,
>Quantitative Risk Models and Statistics,
>UBS Investment Bank, 6th floor, 
>100 Liverpool St., London EC2M 2RH, UK.
>Phone +44 (0) 207 567 4289
>Fax   +44 (0) 207 568 5352
>
>
>Ajay Shah wrote:
>
>>I have 3 different daily time-series. Using union() in the "its" 
>>package, I can make a long matrix, where rows are created when even one 
>>of the three time-series is observed:
>>
>>massive <- union(nifty.its, union(inrusd.its, infosys.its))
>>
>>Now in this, I want to replace NA values for prices by the 
>>most-recently observed price. I can do this painfully --
>>
>>for (i in 2:nrow(massive)) {
>> for (j in 1:3) {
>>   if (is.na(massive[i,j])) {
>>     massive[i,j] = massive[i-1,j]
>>   }
>> }
>>}
>>
>>But this is horribly slow. Is there a more clever way?
>
>_______________________________________________
>R-sig-finance at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-sig-finance