[R] how to use rle, approx, ifelse etc. in a data frame

Rosa rosatan6 at gmail.com
Thu Aug 2 00:52:03 CEST 2012


Hello R help,

I have a data frame M2 (160000 rows, 5 columns) containing NAs; a small example is:
set.seed(1234)
M2<-expand.grid(ID=182:183, year=2012, month=1:3, day=1:3,
KEEP.OUT.ATTRS=FALSE)
M2 <- M2[with(M2, order(ID, year, month, day)),] #sort the data
M2$value <- sample(c(NA, rnorm(100)), nrow(M2), 
                   prob=c(0.5, rep(0.5/100, 100)), replace=TRUE)
M2:
    ID year month day      value
1  182 2012     1   1 -0.5012581
7  182 2012     1   2  1.1022975
13 182 2012     1   3         NA
3  182 2012     2   1 -0.1623095
9  182 2012     2   2  1.1022975
15 182 2012     2   3 -1.2519859
5  182 2012     3   1         NA
11 182 2012     3   2         NA
17 182 2012     3   3         NA
2  183 2012     1   1  0.9729168
8  183 2012     1   2  0.9594941
14 183 2012     1   3         NA
4  183 2012     2   1         NA
10 183 2012     2   2 -1.1088896
16 183 2012     2   3  0.9594941
6  183 2012     3   1 -0.4027320
12 183 2012     3   2 -0.0151383
18 183 2012     3   3 -1.0686427
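The longest run of consecutive NAs in each ID/year/month group can be checked with rle(). This is a minimal sketch: max_na_run() is a hypothetical helper, not a base R function. The example data are rebuilt so the block runs on its own. Because sample() changed its default algorithm in R 3.6.0, the random values may not match the printout above.

```r
# Hypothetical helper: length of the longest run of consecutive NAs (0 if none)
max_na_run <- function(v) {
  runs <- rle(is.na(v))                    # alternating runs of NA / non-NA
  na_lengths <- runs$lengths[runs$values]  # keep only the NA runs
  if (length(na_lengths) == 0) 0L else max(na_lengths)
}

# Rebuild the example data (values may differ from the printout in the post)
set.seed(1234)
M2 <- expand.grid(ID = 182:183, year = 2012, month = 1:3, day = 1:3,
                  KEEP.OUT.ATTRS = FALSE)
M2 <- M2[with(M2, order(ID, year, month, day)), ]
M2$value <- sample(c(NA, rnorm(100)), nrow(M2),
                   prob = c(0.5, rep(0.5/100, 100)), replace = TRUE)

# One row per ID/year/month; na.pass keeps the NA rows so they can be counted
na_runs <- aggregate(value ~ ID + year + month, data = M2,
                     FUN = max_na_run, na.action = na.pass)
na_runs
```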

In this example the longest run of consecutive NAs is 3, but my real data
can contain runs of more than 10 NAs. What I need to do is:
1. split the data by ID, year and month;
2. within each subset, if a run of consecutive NAs is shorter than 5, repeat
the previous value; if it is 5-10 NAs long, do a linear interpolation; and if
it is longer than 10 NAs, delete the whole month;
3. if the first day of the month is NA, apply the fill backward (use the next
available value).
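The three rules can be made concrete by labelling every element of one month's series with the rule that applies to it. The key step is rep(runs$lengths, runs$lengths), which copies each run's length back onto every element of that run. This is a minimal sketch: classify_na() is a hypothetical helper. It assumes a leading NA run is filled backward whatever its length (up to 10). As an extra assumption not stated in the post, trailing NAs, which have nothing to interpolate towards, are carried forward.

```r
# Hypothetical helper: label each element of one month's series with the
# rule that would apply to it (thresholds taken from the list above).
classify_na <- function(v, short = 5, long = 10) {
  na       <- is.na(v)
  runs     <- rle(na)
  run_len  <- rep(runs$lengths, runs$lengths)  # length of the run each element is in
  leading  <- cumsum(!na) == 0                 # NAs before the first observation
  trailing <- rev(cumsum(rev(!na))) == 0       # NAs after the last observation

  out <- rep("keep", length(v))
  out[na & run_len < short] <- "locf"                       # < 5: carry forward
  out[na & run_len >= short & run_len <= long] <- "interp"  # 5-10: interpolate
  out[na & trailing] <- "locf"   # assumption: nothing to interpolate towards
  out[na & leading]  <- "nocb"   # first day(s) NA: fill backward
  if (any(na & run_len > long)) out[] <- "drop"             # > 10: drop the month
  out
}

classify_na(c(NA, 1, NA, NA, 4, NA, NA, NA, NA, NA, 10))
# "nocb" "keep" "locf" "locf" "keep" "interp" "interp" "interp" "interp" "interp" "keep"
```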

So far, thanks to sebastian-c, the part that deletes months with more than 10 consecutive NAs is done:
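library(zoo)   # provides na.locf()
library(plyr)  # provides ddply()

# Return NULL (drop the group) if x$value contains a run of more than
# maxlen consecutive NAs; otherwise return the group unchanged.
NA_run <- function(x, maxlen){
  runs <- rle(is.na(x$value))   # run lengths of NA / non-NA stretches
  if(any(runs$lengths[runs$values] > maxlen)) NULL else x
}
rem <- ddply(M2, .(ID, year, month), NA_run, 10)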
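For comparison, the same month filter can be written in base R without plyr, using split() and Filter(). This is a swapped-in alternative to ddply, not sebastian-c's code, shown on made-up data. keep_month() is a hypothetical helper. It uses `>` so that a run of exactly 10 NAs is kept, in line with "more than 10" in the rules.

```r
# Hypothetical helper: TRUE if no run of consecutive NAs is longer than maxlen
keep_month <- function(v, maxlen = 10) {
  runs <- rle(is.na(v))
  !any(runs$lengths[runs$values] > maxlen)
}

# Made-up data: month 1 has an 11-day NA run, month 2 a 3-day run
df <- data.frame(
  ID    = 182,
  year  = 2012,
  month = rep(1:2, c(13, 5)),
  value = c(1, rep(NA, 11), 2,   5, NA, NA, NA, 6)
)

# One data frame per ID/year/month, keep the acceptable ones, bind them back
groups <- split(df, df[c("ID", "year", "month")], drop = TRUE)
kept   <- Filter(function(g) keep_month(g$value), groups)
rem    <- do.call(rbind, kept)
rownames(rem) <- NULL
rem   # only the 5 rows of month 2 remain
```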

For the other two parts, I found that for fewer than 5 NAs I can use
na.locf(rem$value, na.rm=FALSE, maxgap=5), and for 5-10 NAs I can use
approx(rem$value, n=length(rem$value))$y. However, when I put them into
if/else, it keeps failing. Is it because the data are in a data frame? I
checked many posts on this issue, but none of the suggestions worked on my
data. Any help would be appreciated, thanks.
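A likely reason the if/else attempt fails is that if() expects a single TRUE/FALSE. With a vector condition, older R versions use only the first element (with a warning), and R 4.2.0 and later raise an error. ifelse(), by contrast, chooses element by element. One way around this is to expand the run lengths to one value per element and build the carried-forward, carried-backward and interpolated versions of the whole vector. ifelse() can then pick among them.

This is a minimal base-R sketch under stated assumptions. fill_na() is a hypothetical helper that reimplements carry-forward/backward instead of using zoo. It calls approx() with explicit day positions (xout), so the output stays aligned with the days even when the first or last value is NA. ave() applies it within each ID/year/month group, so working on a data frame column is probably not the problem in itself. Months with runs longer than 10 NAs are assumed to have been removed already.

```r
# Hypothetical helper: fill NAs in one month's series (already sorted by day).
#   run < short NAs -> last observation carried forward
#   run >= short    -> linear interpolation (carry-forward if at the end)
#   leading NAs     -> next observation carried backward
fill_na <- function(v, short = 5) {
  n   <- length(v)
  idx <- seq_len(n)
  obs <- which(!is.na(v))
  if (length(obs) == 0) return(v)               # nothing to fill from

  runs    <- rle(is.na(v))
  run_len <- rep(runs$lengths, runs$lengths)    # run length at every position

  prev <- cummax(ifelse(is.na(v), 0L, idx))     # index of last observed value
  prev[prev == 0L] <- NA                        # none yet (leading NAs)
  nxt  <- rev(cummin(rev(ifelse(is.na(v), n + 1L, idx))))  # next observed
  locf <- v[prev]
  nocb <- v[nxt]                                # v[n + 1] is NA
  interp <- if (length(obs) >= 2) {
    approx(obs, v[obs], xout = idx)$y           # NA outside the observed range
  } else {
    rep(NA_real_, n)
  }

  ifelse(!is.na(v), v,
         ifelse(is.na(locf), nocb,
                ifelse(run_len < short | is.na(interp), locf, interp)))
}

# Made-up data: month 1 starts with an NA, month 2 ends with 5 NAs
df <- data.frame(ID = 182, year = 2012, month = rep(1:2, each = 6),
                 value = c(NA, 2, NA, NA, 5, 6,   1, NA, NA, NA, NA, NA))

# ave() applies fill_na() within each ID/year/month group
df$filled <- ave(df$value, df$ID, df$year, df$month, FUN = fill_na)
df
```

Building whole-vector candidates and choosing with ifelse() avoids calling if() on a vector. The price is that every candidate is computed for every element, which is cheap for month-sized groups.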



--
View this message in context: http://r.789695.n4.nabble.com/how-to-use-function-of-rle-approx-ifelse-etc-in-data-frame-tp4638778.html
Sent from the R help mailing list archive at Nabble.com.


