[R-sig-finance] How can I do this better? (Filling in last tr
aded price for NA)
Patrick Burns
patrick at burns-stat.com
Mon Sep 13 22:20:06 CEST 2004
I'm not convinced. If you should be lucky enough to have enough
data that Gabor's solution will take more than a few milliseconds,
then I would think you might want to limit the length of filling.
Patrick Burns
Burns Statistics
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
Matthew Dowle wrote:
>Isn't C the right tool for this job? Something like this (cobbling cumsum
>itself)? This is untested and very unlikely to be exactly correct.
>
>static SEXP fillna(SEXP x, SEXP s)
>{
> int i=0;
> double last=R_NA;
> while (i<length(x)) {
> if (!(ISNAN(REAL(x)[i])))
> last = REAL(x)[i];
> REAL(s)[i++] = last;
> }
> return s;
>}
>
>Even Gabor's LOCF involved just 3 calls: is.na(), which() and cumsum(), but
>if I understand correctly it involves 3 loops (internally) over the entire
>vector plus the associated memory copies of each call. fillna as above
>should be as fast as one call to cumsum(), requiring much less working
>memory than any R solution. If this is the case, perhaps something like it
>could be added to R?
>
>Regards,
>Matthew
>
>-----Original Message-----
>From: john.gavin at ubs.com [mailto:john.gavin at ubs.com]
>Sent: 13 September 2004 16:58
>To: r-sig-finance at stat.math.ethz.ch
>Subject: Re: [R-sig-finance] How can I do this better? (Filling in last
>traded price for NA)
>
>
>Hi Ajay,
>
>You will probably get other suggestions
>along the following lines,
>which use 'rle' and 'rep' to speed things up.
>
>fillIn2 <- function(x)
>{ bef <- x # keep a copy for display purposes only.
> xRle <- rle(is.na(x))
> # get indices where each NA seq starts (low) and stops (upp)
> upp <- (sumX <- cumsum(xRle$lengths))[xRle$values]
> low <- sumX[which(xRle$values)-1]+1
> # special case: NA at start _only_ i.e. c(NA, ..., NA, notNa, ..., notNA)
> if (length(low) == 0) return(cbind(before = x , after = x))
> # special case: NA at start and else where
> if (length(upp) == length(low)+1) upp <- upp[-1]
> # Critical bit is 'rep' on RHS.
> # On LHS, dont replace NAs at the start, if any.
> ind <- low[1]-1
> x[ind + which(is.na(x[-seq(ind)]))] <- x[rep(low-1, upp-low+1)]
> cbind(before = bef , after = x) # show off before and after effect }
>set.seed(123)
>x <- 1:10
>x[sample(length(x), floor(length(x)/2))] <- NA
>fillIn2(x)
>
>should produce
>
>
>
>>fillIn2(x)
>>
>>
> before after
> [1,] 1 1
> [2,] 2 2
> [3,] NA 2
> [4,] NA 2
> [5,] 5 5
> [6,] NA 5
> [7,] NA 5
> [8,] NA 5
> [9,] 9 9
>[10,] 10 10
>
>The code seems clunky and has special cases
>so it is probably not optimal.
>
>However, it is faster than, say, using 'mapply'
>
>fillIn <- function(x)
>{ bef <- x
> xRle <- rle(is.na(x))
> upp <- cumsum(xRle$lengths)[xRle$values]
> low <- cumsum(xRle$lengths)[which(xRle$values)-1]+1
> if (length(upp) == length(low)+1) upp <- upp[-1]
> mapply(function(l, u) x[l:u] <<- x[l-1], low, upp)
> cbind(before = bef , after = x) # show off before and after effect }
>fillIn(x)
>
>Some simulations to compare times,
>based on vectors of varying lengths with 50% of elements set to NA
>
>simFillIn <- function(n, method = c("rep", "mapply"))
>{ aa <- rpois(n, 5)
> aa[sample(seq(n), floor(n * .5))] <- NA
> method = match.arg(method)
> ansTime <- system.time(ans <-
> switch(method,
> mapply = fillIn(aa),
> rep = fillIn2(aa),
> stop("wrong method")
> )) # switch system.time
> list(time = ansTime) # ans = ans,
>}
>ans <- lapply(c(2e4, 1e4, 1e3, 1e2, 1e1), simFillIn, method = "mapply")
>lapply(ans, "[[", "time") ans <- lapply(c(2e4, 1e4, 1e3, 1e2, 1e1),
>simFillIn, method = "rep") lapply(ans, "[[", "time")
>
>simFillIn (with 'mapply') seems at least 10 times slower
>than simFillIn2 (with 'rep').
>
>Regards,
>
>John.
>
>John Gavin <john.gavin at ubs.com>,
>Quantitative Risk Models and Statistics,
>UBS Investment Bank, 6th floor,
>100 Liverpool St., London EC2M 2RH, UK.
>Phone +44 (0) 207 567 4289
>Fax +44 (0) 207 568 5352
>
>
>Ajay Shah wrote:
>
>
>
>>I have 3 different daily time-series. Using union() in the "its"
>>package, I can make a long matrix, where rows are created when even one
>>of the three time-series is observed:
>>
>>massive <- union(nifty.its, union(inrusd.its, infosys.its))
>>
>>Now in this, I want to replace NA values for prices by the
>>most-recently observed price. I can do this painfully --
>>
>>for (i in 2:nrow(massive)) {
>> for (j in 1:3) {
>> if (is.na(massive[i,j])) {
>> massive[i,j] = massive[i-1,j]
>> }
>> }
>>}
>>
>>But this is horribly slow. Is there a more clever way?
>>
>>
>
>Visit our website at http://www.ubs.com
>
>This message contains confidential information and is intend...{{dropped}}
>
>_______________________________________________
>R-sig-finance at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>
>_______________________________________________
>R-sig-finance at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>
>
>
More information about the R-sig-finance
mailing list