[R] problem (and solution) to rle on vector with NA values

Peter Ehlers ehlers at ucalgary.ca
Thu Jun 23 16:47:49 CEST 2011


On 2011-06-23 06:44, Cormac Long wrote:
> Hello there R-help,
>
> I'm not sure whether this should be posted here, so apologies if it should not.
> I've found a problem while using rle and would like to propose a solution.
>
> Description:
> I ran into a niggle with rle today when working with vectors with NA values
> (using R 2.13.0 on Windows 7 x64). It transpires that a run of NA values
> is not encoded in the same way as a run of other values. See the following
> example as an illustration:
>
> Example:
> The example
>          rv<-c(1,1,NA,NA,3,3,3);rle(rv)
> Returns
>          Run Length Encoding
>            lengths: int [1:4] 2 1 1 3
>            values : num [1:4] 1 NA NA 3
> not
>          Run Length Encoding
>            lengths: int [1:3] 2 2 3
>            values : num [1:3] 1 NA 3
> as I expected. This caused my code to fail later (unsurprising).
>
> Analysis:
> The problem stems from the test
>           y<- x[-1L] != x[-n]
> in line 7 of the rle function body. In this test, any comparison involving
> NA returns logical NA rather than TRUE/FALSE (again, unsurprisingly).
>
> Resolution:
> I modified the rle function code as included below. As far as I tested, this
> modification appears safe. The convoluted construction of naMaskVal
> should guarantee that the NA masking value is always different from
> any value in the vector and should be safe regardless of the input vector
> form (a raw vector is not handled, since raw vectors cannot hold NA values).
>
> rle<-function (x)
> {
>      if (!is.vector(x) && !is.list(x))
>          stop("'x' must be an atomic vector")
>      n<- length(x)
>      if (n == 0L)
>          return(structure(list(lengths = integer(), values = x),
>              class = "rle"))
>
>      #### BEGIN NEW SECTION PART 1 ####
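>      naRepFlag<-FALSE
>      IS_LOGIC<-FALSE   # defined up front so the later test also works without NAs
>      if(any(is.na(x))){
>          naRepFlag<-TRUE
>          IS_LOGIC<-typeof(x)=="logical"
>
>          if(IS_LOGIC){
>              x<-as.integer(x)
>              naMaskVal<-2
>          }else if(typeof(x)=="character"){
>              # random 32-character token; a clash with real data is very unlikely
>              naMaskVal<-paste(sample(c(letters,LETTERS,0:9),32,replace=TRUE),collapse="")
>          }else{
>              # one more than the largest finite magnitude in x
>              naMaskVal<-max(0,abs(x[!is.infinite(x)]),na.rm=TRUE)+1
>          }
>
>          x[which(is.na(x))]<-naMaskVal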
>      }
>      #### END NEW SECTION PART 1 ####
>
>      y<- x[-1L] != x[-n]
>      i<- c(which(y), n)
>
>      #### BEGIN NEW SECTION PART 2 ####
>      if(naRepFlag)
>          x[which(x==naMaskVal)]<-NA
>
>      if(IS_LOGIC)
>          x<-as.logical(x)
>      #### END NEW SECTION PART 2 ####
>
>      structure(list(lengths = diff(c(0L, i)), values = x[i]),
>          class = "rle")
> }
>
> Conclusion:
> I think that the proposed code modification is an improvement on the existing
> implementation of rle. Is it impertinent to suggest this R-modification to the
> gurus at R?
>
> Best wishes (in flame-war trepidation),

Well, it's not worth a flame, but ...
from the help page for rle (?rle, see 'Details'):

  "Missing values are regarded as unequal to the previous value,
   even if that is also missing."
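
For readers who still want consecutive NAs collapsed into a single run, here is a minimal sketch that leaves base rle untouched. The helper name rle_na is hypothetical (it is not part of R), and the sketch assumes an atomic vector. It marks a run boundary wherever neighbouring elements differ, counting NA as equal to NA and unequal to anything else, so no masking value is needed.

```r
# Hypothetical helper, not part of base R: run-length encoding in which
# consecutive NA values form a single run. Assumes an atomic vector.
rle_na <- function(x) {
    if (!(is.vector(x) && is.atomic(x)))
        stop("'x' must be an atomic vector")
    n <- length(x)
    if (n == 0L)
        return(structure(list(lengths = integer(), values = x),
                         class = "rle"))
    a <- x[-1L]
    b <- x[-n]
    # A run ends where exactly one neighbour is NA, or where both are
    # non-NA and differ. FALSE & NA is FALSE, so brk never contains NA.
    brk <- (is.na(a) != is.na(b)) | (!is.na(a) & !is.na(b) & a != b)
    i <- c(which(brk), n)
    structure(list(lengths = diff(c(0L, i)), values = x[i]),
              class = "rle")
}

rv <- c(1, 1, NA, NA, 3, 3, 3)
rle(rv)$lengths      # 2 1 1 3  (documented behaviour of base rle)
rle_na(rv)$lengths   # 2 2 3
rle_na(rv)$values    # 1 NA 3
```

Because the result carries class "rle", inverse.rle() reconstructs the original vector from it in the usual way.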

Peter Ehlers


> Dr. Cormac Long.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


