[R] find & remove sequences of at least N values for a specific value

jeff6868 geoffrey_klein at etu.u-bourgogne.fr
Thu Jul 10 14:34:22 CEST 2014


Hi everybody,

I have a small problem in a function, about removing short sequences of
identical numeric values.

As an example, consider this data, which contains only "0" and "1":

test <- data.frame(x=c(0,0,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1))

The aim here is simply to remove (set to NA) each sequence of "1" that is
shorter than 5, and to keep sequences of "1" with a length of 5 or more.
So my final data should look like this:

final <- data.frame(x=c(0,0,NA,NA,NA,0,0,0,0,1,1,1,1,1,1,1,1))

For the moment, I have this function:

    foo <- function(X,N){
      tab <- table(X[X==1])
      under.n <- as.numeric(names(tab)[tab<N]) 
      ind <- X %in% under.n
      Ind.sup <- which(ind)
      X <- ifelse(ind,NA,X)
    }

test$x <- apply(as.data.frame(test$x),2,function(x) foo(x,5))

The problem is that the function doesn't consider each sequence separately:
table() counts all the "1" values together, as if they formed one sequence.
I think that using rle() instead of table() in my function should do the
trick, but I haven't got it to work yet.
Does anyone have an idea how to fix this problem?
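
For reference, here is a minimal sketch of how rle() could be used for this. It is only one possible approach, not a tested replacement for foo(); the helper name remove_short_runs and its arguments value and n are made up for illustration. It computes the runs, marks the runs of the target value that are shorter than n, sets them to NA, and rebuilds the vector with inverse.rle().

```r
# Hypothetical helper (name and arguments are illustrative only):
# set every run of `value` shorter than `n` to NA, keep everything else.
remove_short_runs <- function(x, value = 1, n = 5) {
  runs <- rle(x)                                   # run lengths and run values
  # which() drops any NA comparisons if x already contains NA
  short <- which(runs$values == value & runs$lengths < n)
  runs$values[short] <- NA                         # blank out the short runs
  inverse.rle(runs)                                # expand back to full length
}

test <- data.frame(x = c(0,0,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1))
test$x <- remove_short_runs(test$x, value = 1, n = 5)
test$x
# 0 0 NA NA NA 0 0 0 0 1 1 1 1 1 1 1 1
```

Because each run is tested on its own, the run of three "1" becomes NA while the run of eight "1" is kept. A run of exactly n values is kept, since the test is a strict "shorter than n".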





--
View this message in context: http://r.789695.n4.nabble.com/find-remove-sequences-of-at-least-N-values-for-a-specific-value-tp4693810.html
Sent from the R help mailing list archive at Nabble.com.


