# [R] Smoothing Techniques - short stepwise functions with spikes

Gabor Grothendieck ggrothendieck at gmail.com
Tue May 11 12:46:45 CEST 2010

```This removes runs of length 1 and 2.  It replaces the value of any
such run with NA and then uses na.locf from the zoo package to fill
those NAs by carrying forward the last non-NA value.  In this example
the run consisting of a single 2, the run consisting of two 3's and
the run consisting of a single 4 are removed:

> library(zoo) # na.locf
> x <- rep(c(1,2,1,3,1,4,3), c(4,1,5,2,6,1,5)); x
[1] 1 1 1 1 2 1 1 1 1 1 3 3 1 1 1 1 1 1 4 3 3 3 3 3
> r <- rle(x)
> r$values <- na.locf(ifelse(r$lengths <= 2, NA, r$values))
> inverse.rle(r)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3
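
# A minimal sketch, not part of the session above: the same steps wrapped in
# a function.  The name remove_short_runs and the max_len argument are made
# up for illustration, and zoo is assumed to be installed.  na.rm = FALSE is
# passed to na.locf because its default (na.rm = TRUE) drops leading NAs,
# which would leave r$values shorter than r$lengths whenever x starts with a
# short run; with na.rm = FALSE such a leading run comes back as NA instead.
remove_short_runs <- function(x, max_len = 2) {
  r <- rle(x)
  # set the value of every run of at most max_len points to NA, then carry
  # the previous run's value forward into those gaps
  r$values <- zoo::na.locf(ifelse(r$lengths <= max_len, NA, r$values),
                           na.rm = FALSE)
  inverse.rle(r)
}
# Usage: remove_short_runs(x) for the x above, or
# remove_short_runs(x, max_len = 1) to drop only single-point spikes.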

On Tue, May 11, 2010 at 3:17 AM, Ralf B <ralf.bierig at gmail.com> wrote:
> R Friends,
>
> I have data from which I would like to learn a more general
> (smoothed) trend by applying data smoothing methods. Data points
> follow a positive stepwise function.
>
>
> |                                    x
>                     x
> |                      xxxxxxxx xxxxxxxx
> |       x    x
> |xxxx xxx xxxx
> |                                                   xxxxxxxxxxxxxxxxx
> |
> |
>          xxxxxxx xxxx
> |__________________________________________________________
>
>
> Data points from each step should not interact with those of any
> other step. The outliers I want to remove are spikes as shown in the
> diagram. These spikes do not have more than one or two points. I
> consider larger groups as relevant and want to keep them in. I
> sometimes have fewer than 5 points for each step, and at most 50.
> Given these conditions, would you suggest using one of the moving
> averages (e.g. SMA, EMA, DEMA, ...) or the locally weighted
> regression (lowess) method? Are there any other options? Does anybody
> know a good site that gives an overview of all methods without going
> too much into mathematical details, focusing instead on the
> requirements and underlying assumptions of each method? Is there
> perhaps even a package that runs and visualizes a comparison on the
> data, similar to packages like 'party'? (With 1000s of active
> packages, one can always hope for that.)
>