[R] Automatically fix big jumps in one variable due to anomalies

Duncan Mackay mackay at northnet.com.au
Tue Mar 5 04:18:31 CET 2013


Hi Cesar

I am not sure what you actually want to accomplish.

?rle may give you some ideas, e.g. the following (I have added some values
so that the series returns to the good section):

x = c(246,251,250,255,5987,5991,5994,599,255,259,262,267)

xdiff = diff(x)
xdiff
  [1]     5    -1     5  5732     4     3 -5395  -344     4     3     5
rle(xdiff)
Run Length Encoding
   lengths: int [1:11] 1 1 1 1 1 1 1 1 1 1 ...
   values : num [1:11] 5 -1 5 5732 4 3 -5395 -344 4 3 ...
which(abs(rle(xdiff)[[2]] ) > 50)
[1] 4 7 8
rle(xdiff)[[2]][abs(rle(xdiff)[[2]] ) > 50]
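
Note that the positions returned above index the differences, not the
observations. A minimal sketch of mapping them back onto x follows. It
assumes a single anomalous episode, bounded by the first and last flagged
jump, and reuses the cut-off of 50 from above; the names threshold, jumps
and bad are only illustrative.

```r
x <- c(246,251,250,255,5987,5991,5994,599,255,259,262,267)
threshold <- 50  # same cut-off as above (an assumption about what counts as a jump)

# Positions in diff(x) where the step is too large: 4 7 8
jumps <- which(abs(diff(x)) > threshold)

# diff(x)[i] is x[i + 1] - x[i], so the first suspect observation is
# min(jumps) + 1 and the last one sits just before the final jump back.
# This only holds when there is a single anomalous episode.
bad <- (min(jumps) + 1):max(jumps)

bad      # 5 6 7 8
x[bad]   # 5987 5991 5994 599
```

With several separate episodes the flagged jumps would have to be paired up
(jump away, jump back) before building the index ranges.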

It is then a matter of removing the required sequences, applying a
function to them, or substituting values; see ?zoo::na.approx (from memory).
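
One possible way of "applying a function" is sketched below. It is not
necessarily what was intended: it sets every oversized step to zero and
rebuilds the series with cumsum(), so the normal increase/decrease pattern
carries on from the last good level. The function name remove_jumps and the
threshold of 50 are assumptions, and the data are the diameter values from
the table in the quoted question.

```r
# Hypothetical helper: drop level shifts larger than `threshold`
# while keeping the ordinary step-to-step changes.
remove_jumps <- function(x, threshold = 50) {
  ok <- !is.na(x)                 # work on the observed values only
  v  <- x[ok]
  d  <- diff(v)                   # step-to-step changes
  d[abs(d) > threshold] <- 0      # treat oversized steps as "no change"
  x[ok] <- cumsum(c(v[1], d))     # rebuild the series from the first value
  x
}

diameter <- c(NA, NA, 246, 251, 250, 255, 5987, 5991, 5994, 5999)
fixed <- remove_jumps(diameter)
fixed
# NA NA 246 251 250 255 255 259 262 267
```

This treats non-missing values as consecutive even when NAs lie between
them, and it assumes that a real change never exceeds the threshold within
one time step.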

HTH

Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mackay at northnet.com.au



At 09:13 5/03/2013, you wrote:
>Hi,
>I am attaching a plot where you can see there are a few "jumps" (plots 1, 4,
>5 and 6), due to incidents with the measuring sensors (basically someone
>touching the sensor). I need to revert those changes to get a plot without
>spurious measurements, i.e. make those fragments return to their original
>pattern before the jump.
>
>I have used the function cpt.mean {changepoint} to identify the jumps
>and the mean of each segment. Now I don't know how to automatically revert
>the jumps, probably by subtracting from each higher fragment the difference
>between its mean and the mean of the previous one. Does that make sense?
>
>Example of data set
>
>                 TIMESTAMP          variable   diameter
>38  2012-06-21 13:45:00     r4_3       NA
>86  2012-06-21 14:00:00     r4_3       NA
>134 2012-06-21 14:15:00     r4_3       246
>182 2012-06-21 14:30:00     r4_3       251
>230 2012-06-21 14:45:00     r4_3       250
>278 2012-06-21 15:00:00     r4_3       255
>326 2012-06-21 15:15:00     r4_3       5987
>374 2012-06-21 15:30:00     r4_3       5991
>422 2012-06-21 15:45:00     r4_3       5994
>470 2012-06-21 16:00:00     r4_3       5999
>
>As an example, this is the current diameter data:
>NA-NA-246-251-250-255-5987-5991-5994-5999
>
>I would need this series without the big jump, avoiding the jump and
>following the increase/decrease pattern, for example:
>NA-NA-246-251-250-255-255-259-262-267
>
>Any other idea is welcome.


