[R] Conditionally adding a constant

jim holtman jholtman at gmail.com
Tue Jan 3 02:21:26 CET 2012


If you are worried about an NA in the first position, then use the following (na.locf() is from the zoo package):

> y <- c(NA, 1, 2, NA, 4, NA)
> y <- na.locf(y, na.rm = FALSE)
> y
[1] NA  1  2  2  4  4
> y <- na.locf(y, fromLast = TRUE)
> y
[1] 1 1 2 2 4 4
>
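If zoo is not installed, the same two-pass idea (carry values forward, then fill any remaining leading NAs backward) can be sketched in base R. `locf()` below is a hypothetical helper written for illustration, not part of zoo. It is assumed to mimic `na.locf(y, na.rm = FALSE)` on a plain atomic vector, keeping leading NAs in place, and it should reproduce the two results shown above.

```r
# Hypothetical helper, not part of zoo: a base-R stand-in for
# zoo::na.locf(y, na.rm = FALSE). Each NA takes the last non-NA value
# before it; leading NAs stay NA. With fromLast = TRUE it fills backward.
locf <- function(y, fromLast = FALSE) {
  # backward fill = forward fill of the reversed vector, reversed back
  if (fromLast) return(rev(locf(rev(y))))
  # index of the most recent non-NA element at or before each position
  # (0 while no non-NA value has been seen yet)
  last <- cummax(ifelse(is.na(y), 0L, seq_along(y)))
  y[ifelse(last == 0L, NA, last)]
}

y <- c(NA, 1, 2, NA, 4, NA)
y <- locf(y)                    # NA 1 2 2 4 4
y <- locf(y, fromLast = TRUE)   # 1 1 2 2 4 4
y
```

The `cummax()` step avoids an explicit loop: it records, for every position, the index of the most recent non-NA value, and indexing with NA returns NA for the positions that have none yet.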


On Mon, Jan 2, 2012 at 5:07 PM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
> Good points, Rui.
>
> On Mon, Jan 2, 2012 at 12:48 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>> Hello again,
>>
>> I believe we are all missing something. Isn't it possible to have NAs as the
>> first values of 'y'?
>> And isn't it also possible to have x[1] > 3?
>
> Theoretically, yes; in the OP's data, maybe?  If the data are a time
> series (or time-series-like), the zoo package is not a bad environment
> to be working in anyway.  There are all sorts of handy functions there
> (based on the OP's small example dataset, I had almost recommended
> na.approx(), which replaces NAs by linear interpolation).  I am not sure
> whether the +2 thing is just an attempt at interpolation or something
> more general.
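To illustrate the linear-interpolation idea, here is a minimal base-R sketch using `approx()`, interpolating by position in the vector. `fill_linear()` is a hypothetical name and this is not zoo's implementation of `na.approx()`; in this sketch, leading and trailing NAs are simply left as NA.

```r
# Hypothetical helper (not zoo code): replace interior NAs by linear
# interpolation between the neighbouring observed values, using the
# position in the vector as the "time" axis.
# rule = 1 leaves NAs outside the observed range untouched.
# approx() needs at least two non-NA values for linear interpolation.
fill_linear <- function(y) {
  idx <- seq_along(y)
  ok  <- !is.na(y)
  approx(idx[ok], y[ok], xout = idx, rule = 1)$y
}

fill_linear(c(10, 20, NA, 40, NA))   # 10 20 30 40 NA
```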
>
>>
>> Here is my point (I have changed function 'f2' to provide for such cases;
>> 'f1' is rubbish):
>>
>> # Rui
>> f3 <- function(x, y){
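>>        inx <- which(x > 3)          # positions where x > 3
>>        ynx <- which(is.na(y))       # positions where y is NA
>>        # fill NAs where x > 3; position 1 is skipped (it has no previous value)
>>        for(i in inx[inx %in% ynx & inx > 1]) y[i] <- y[i-1] + 2L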
>>        y
>> }
>>
>> # Jim's, as a function; 'na.rm' option added, or else with 'df3' the leading
>> # NAs would be dropped and 'y' would no longer line up with 'x'
>> library(zoo)
>> f4 <- function(x, y){
>>        y <- na.locf(y, na.rm=FALSE)
>>        inc <- cumsum(x > 3) * 2
>>        y + inc
>> }
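The key step in f4 is `cumsum(x > 3) * 2`: summing a logical vector counts TRUE as 1, so the increment grows by 2 at every position where x > 3. The worked example below uses the `df2` values from the thread; the filled vector is written out by hand as what `na.locf(y, na.rm = FALSE)` returns for `df2$y`, so the snippet runs without zoo. Note that the increment is added to every element, including values that were never NA.

```r
x <- c(1, 2, 3, 4, 5)              # df2$x
y_filled <- c(10, 20, 20, 40, 40)  # na.locf(df2$y, na.rm = FALSE), written out by hand

flag <- x > 3                      # FALSE FALSE FALSE TRUE TRUE
inc  <- cumsum(flag) * 2           # TRUE counts as 1: 0 0 0 2 4
y_filled + inc                     # 10 20 20 42 44 (the observed 40 also becomes 42)
```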
>>
>> df <- data.frame(x = c(1,2,3,4,5), y = c(10,20,30,NA,NA))
>> df
>> df2 <- data.frame(x = c(1,2,3,4,5), y = c(10,20,NA,40,NA))
>> df2
>> df3 <- data.frame(x = c(1,2,3,4,5), y = rev(c(10,20,30,NA,NA)))
>> df3
>>
>> # Joshua
>> f(df$x, df$y)      # works
>> f(df2$x, df2$y)    # infinite loop
>> f(df3$x, df3$y)    # infinite loop
>>
>> # Rui
>> f3(df$x, df$y)     # works
>> f3(df2$x, df2$y)   # works as expected?
>> f3(df3$x, df3$y)   # works as expected?
>>
>> # Jim
>> f4(df$x, df$y)     # works
>> f4(df2$x, df2$y)   # works as expected?
>> f4(df3$x, df3$y)   # works as expected?
>>
>> If this makes sense, the performance tests are very much in favour of Jim's
>> solution.
>>
>>
>> # If this is what is asked for, test the performance
>> # with large enough N
>> N <- 1.e5
>> dftest <- data.frame(x = 1:N,
>>                      y = sample(c(rep(NA, 5), 10*1:5), N, replace = TRUE))
>>
>> sum(is.na(dftest))/N    # proportion of NAs in 'dftest'
>>
>> t2 <- system.time(invisible(apply(dftest, 2, f2)))[c(1, 3)]
>> t3 <- system.time(invisible(f3(dftest$x, dftest$y)))[c(1, 3)]
>> t4 <- system.time(invisible(f4(dftest$x, dftest$y)))[c(1, 3)]
>> rbind(t2=t2, t3=t3, t4=t4, t2.t3=t2/t3, t2.t4=t2/t4, t3.t4=t3/t4)
>>
>> Sample output
>>
>>      user.self   elapsed
>> t2      2.93000   2.95000
>> t3      0.22000   0.22000
>> t4      0.01000   0.01000
>> t2.t3  13.31818  13.40909
>> t2.t4 293.00000 295.00000
>> t3.t4  22.00000  22.00000
>>
>> That is a speed-up of roughly 300 times over the initial solution, and of
>> more than 20 times over the other loop-based one.
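Much of this gap plausibly comes from f4 doing its work with vectorised calls (`na.locf()`, `cumsum()`) instead of an R-level loop. The sketch below isolates that effect on the increment vector alone. `inc_loop()` and `inc_vec()` are hypothetical helpers, the seed is arbitrary, and the timings printed will depend on the machine and R version.

```r
# Hypothetical helpers: both compute the increment used in f4.
inc_loop <- function(x) {
  inc <- numeric(length(x))
  running <- 0
  for (i in seq_along(x)) {
    if (x[i] > 3) running <- running + 2   # add 2 each time x > 3
    inc[i] <- running
  }
  inc
}

inc_vec <- function(x) cumsum(x > 3) * 2   # same result, vectorised

set.seed(1)                                # arbitrary seed, for reproducibility
x <- sample(1:10, 1e5, replace = TRUE)

# both versions must agree before their speed is compared
stopifnot(isTRUE(all.equal(inc_loop(x), inc_vec(x))))

system.time(inc_loop(x))
system.time(inc_vec(x))
```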
>>
>> The downside is that it needs an extra package loaded, but 'zoo' is rather
>> commonplace.
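If loading zoo is a concern, a zoo-free variant of f4 can be sketched with base R's `approx()`: with `method = "constant"` and `f = 0` it returns the last observed value to the left, which amounts to carrying the last observation forward. `f4_base()` is a hypothetical name, not code from the thread, and it assumes `y` has at least one non-NA value.

```r
# Hypothetical zoo-free variant of f4 (not from the thread).
f4_base <- function(x, y) {
  idx <- seq_along(y)
  ok  <- !is.na(y)
  # constant interpolation with f = 0 = last observation carried forward;
  # rule = 1:2 -> NA before the first observation, last value carried to the end
  y <- approx(idx[ok], y[ok], xout = idx, method = "constant",
              f = 0, rule = 1:2)$y
  y + cumsum(x > 3) * 2
}

f4_base(c(1, 2, 3, 4, 5), c(10, 20, NA, 40, NA))   # 10 20 20 42 44
```

Leaving leading NAs as NA corresponds to the `na.rm = FALSE` setting used in f4.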
>>
>> Rui Barradas
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> Programmer Analyst II, Statistical Consulting Group
> University of California, Los Angeles
> https://joshuawiley.com/
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


