[R] Conditionally adding a constant

Joshua Wiley jwiley.psych at gmail.com
Mon Jan 2 23:07:14 CET 2012


Good points, Rui.

On Mon, Jan 2, 2012 at 12:48 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Hello again,
>
> I believe we are all missing something. Isn't it possible to have NAs as the
> first values of 'y'?
> And isn't it also possible to have x[1] > 3?

Theoretically, yes. In the OP's data, maybe. If the data are a time
series (or time-series-like), the zoo package is not a bad environment
to be working in anyway. It has all sorts of handy functions; based on
the OP's little example dataset, I had almost recommended na.approx(),
which replaces NAs by linear interpolation. I am not sure, though,
whether the +2 thing is just an attempt at interpolation or something
more general.
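
To make the interpolation question concrete, here is a small sketch of
what linear interpolation does on data shaped like the examples. It uses
base R's approx() as a stand-in so that it runs without zoo; as far as I
know, na.approx(y, na.rm = FALSE) should give the same values for these
inputs. The helper name interp_na is made up for illustration.

```r
# Linear interpolation over NAs, using positions as the time index.
# interp_na is a hypothetical helper, not part of zoo.
interp_na <- function(y) {
  pos <- seq_along(y)
  # approx() drops the NA pairs, interpolates between the remaining points,
  # and (with the default rule = 1) returns NA outside their range
  approx(pos, y, xout = pos)$y
}

y1 <- c(10, 20, 30, NA, NA)   # NAs only at the end
y2 <- c(10, 20, NA, 40, NA)   # one interior NA, one trailing NA

interp_na(y1)   # trailing NAs have no right-hand neighbour to interpolate to
interp_na(y2)   # the interior NA is filled, the trailing one is not
```

On y1 the two trailing NAs stay NA, whereas the +2 rule would give 32
and 34, so on that data the +2 rule is extrapolating rather than
interpolating.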

>
> Here is my point (I have changed function 'f2' to handle such cases;
> 'f1' is rubbish):
>
> # Rui
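> f3 <- function(x, y){
>        # positions where x > 3 and y is NA; position 1 has no predecessor
>        inx <- setdiff(which(x > 3 & is.na(y)), 1L)
>        # filled in order, so a run of NAs builds on the value just filled
>        for(i in inx) y[i] <- y[i-1] + 2L
>        y
> }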
>
> # Jim's, as a function; 'na.rm=FALSE' added, or else 'df3' would produce an
> # error
> require(zoo)
> f4 <- function(x, y){
>        y <- na.locf(y, na.rm=FALSE)
>        inc <- cumsum(x > 3) * 2
>        y + inc
> }
>
> df <- data.frame(x = c(1,2,3,4,5), y = c(10,20,30,NA,NA))
> df
> df2 <- data.frame(x = c(1,2,3,4,5), y = c(10,20,NA,40,NA))
> df2
> df3 <- data.frame(x = c(1,2,3,4,5), y = rev(c(10,20,30,NA,NA)))
> df3
>
> # Joshua
> f(df$x, df$y)      # works
> f(df2$x, df2$y)    # infinite loop
> f(df3$x, df3$y)    # infinite loop
>
> # Rui
> f3(df$x, df$y)     # works
> f3(df2$x, df2$y)   # works as expected?
> f3(df3$x, df3$y)   # works as expected?
>
> # Jim
> f4(df$x, df$y)     # works
> f4(df2$x, df2$y)   # works as expected?
> f4(df3$x, df3$y)   # works as expected?
>
> If this makes sense, the performance tests are very much in favour of Jim's
> solution.
>
>
> # If this is what is asked for, test the performance
> # with large enough N
> N <- 1.e5
> dftest <- data.frame(x=1:N, y=c(sample(c(rep(NA, 5), 10*1:5), N,
> replace=TRUE)))
>
> sum(is.na(dftest))/N    # proportion of NAs in 'dftest'
>
> t2 <- system.time(invisible(apply(dftest, 2, f2)))[c(1, 3)]
> t3 <- system.time(invisible(f3(dftest$x, dftest$y)))[c(1, 3)]
> t4 <- system.time(invisible(f4(dftest$x, dftest$y)))[c(1, 3)]
> rbind(t2=t2, t3=t3, t4=t4, t2.t3=t2/t3, t2.t4=t2/t4, t3.t4=t3/t4)
>
> Sample output
>
>      user.self   elapsed
> t2      2.93000   2.95000
> t3      0.22000   0.22000
> t4      0.01000   0.01000
> t2.t3  13.31818  13.40909
> t2.t4 293.00000 295.00000
> t3.t4  22.00000  22.00000
>
> A factor of about 300 over the initial solution, or 20+ over the other
> loop-based one.
>
> The downside is that it needs an extra package loaded, but 'zoo' is rather
> commonplace.
>
> Rui Barradas
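
If loading an extra package is the concern, the carry-forward step can
be approximated in base R. This is only a sketch, assuming the intended
behaviour is that of f4 above (carry the last observed y forward, then
add 2 for every x > 3 seen so far). The names locf_base and f4_base are
hypothetical, and locf_base is not a drop-in replacement for everything
na.locf() does.

```r
# Carry the last non-NA value forward; leading NAs are kept as NA.
locf_base <- function(y) {
  pos <- seq_along(y)
  pos[is.na(y)] <- 0L            # NA positions contribute nothing
  last_seen <- cummax(pos)       # index of the most recent non-NA value
  last_seen[last_seen == 0] <- NA  # nothing observed yet: keep NA
  y[last_seen]
}

# Same arithmetic as f4, without zoo.
f4_base <- function(x, y) {
  locf_base(y) + cumsum(x > 3) * 2
}

x <- c(1, 2, 3, 4, 5)
f4_base(x, c(10, 20, 30, NA, NA))        # data as in 'df'
f4_base(x, c(10, 20, NA, 40, NA))        # data as in 'df2'
f4_base(x, rev(c(10, 20, 30, NA, NA)))   # data as in 'df3'
```

Note that this approach fills every NA and also adds the increment to
observed values where x > 3: on the 'df2' data it gives 10 20 20 42 44,
not 10 20 NA 40 42. Whether that is what the OP wants depends on what
the +2 rule is meant to do.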



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/


