[R] downsampling

Mon Jul 27 16:22:44 CEST 2009

On Mon, Jul 27, 2009 at 02:42:33PM +0200, Jan M. Wiener wrote:
> However, both approx() and spline() seem to select the number of
> required data points from the original data (at the correct positions,
> of course) and ignore the remaining data points, as the following
> example demonstrates:
> 
> > a= c(1,0,2,1,0)
> 
> > approx(a,n=3)
> $x
> [1] 1 3 5
> 
> $y
> [1] 1 2 0
> 
> Essentially, what approx has done (spline does the same) is to simply
> select the first, third, and fifth entry (as we want to downsample a 5
> point vector into a three point vector). The second and fourth data
> point are completely ignored.

That seems to be what Warren described as the 'degenerate case'
where approx will 'just throw away every other sample'. If you choose
a different n (e.g. n=4), interpolation does happen.
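A minimal R sketch of why n=3 is degenerate for this 5-point vector. With only n given, approx() evaluates at n equally spaced positions between the first and last x, here seq(1, 5, length.out = n). Whenever length(a) - 1 is a multiple of n - 1, every one of those positions lands exactly on an original sample. The helper hits_original_samples() is hypothetical, written only for this illustration.

```r
# The 5-point vector from the thread; approx(a) uses x = seq_along(a)
a <- c(1, 0, 2, 1, 0)

# With only n given, approx() evaluates at n equally spaced x values
# between the first and last x
seq(1, length(a), length.out = 3)  # 1 3 5: every position is an original sample
seq(1, length(a), length.out = 4)  # 1.000 2.333 3.667 5.000: two fall between samples

d3 <- approx(a, n = 3)$y  # 1 2 0, i.e. a[c(1, 3, 5)]; a[2] and a[4] never contribute
d4 <- approx(a, n = 4)$y  # 1.0000000 0.6666667 1.3333333 0.0000000

# Hypothetical helper: TRUE when every output position coincides with an
# input sample, i.e. the step (len - 1) / (n - 1) is a whole number (n >= 2)
hits_original_samples <- function(len, n) (len - 1) %% (n - 1) == 0

hits_original_samples(length(a), 3)  # TRUE
hits_original_samples(length(a), 4)  # FALSE
```

Even in the n=4 case each output value depends on at most two neighbouring samples, so this is resampling rather than averaging; a sample close to an output position still passes through largely unchanged.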

> This can result in quite dramatic changes
> of your data, if the data points selected by approx() or spline() happen
> to be outliers and if you downsample data by a rather strong factor.

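Yes, that could affect your downsampled data: an outlier that happens
to sit on one of the selected positions is passed through unchanged.
For more robustness it would probably be better to fit a proper model
(if you have one) or a lowess curve (or smooth.spline) first, and then
evaluate that fit at the new positions.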
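A minimal sketch of the smooth-first approach, on a synthetic series (not data from the thread): a noisy sine with one artificial outlier placed at x = 51, which is exactly one of the positions the naive approx() call selects. The lowess span f = 0.2 is an arbitrary choice for this toy example; smooth.spline() chooses its smoothing parameter by generalised cross-validation by default.

```r
# Synthetic example (not from the thread): noisy sine with one spike
set.seed(1)
x <- 1:101
y <- sin(x / 10) + rnorm(length(x), sd = 0.1)
y[51] <- 5  # artificial outlier on a position the naive call will select

# 11 output points: step 10, so positions 1, 11, ..., 51, ..., 101 (degenerate case)
xout <- seq(1, 101, length.out = 11)

# Naive downsampling: returns y[51] = 5 unchanged at the 6th position
naive <- approx(x, y, xout = xout)$y

# Option 1: robust lowess fit (iter = 3 reweighting steps by default),
# then read the smoothed curve off at the new positions
lw <- lowess(x, y, f = 0.2)
via_lowess <- approx(lw$x, lw$y, xout = xout)$y

# Option 2: smoothing spline, evaluated directly with predict()
ss <- smooth.spline(x, y)
via_spline <- predict(ss, x = xout)$y

round(cbind(xout, naive, via_lowess, via_spline), 2)
```

The robustified lowess fit downweights the spike heavily, while the smoothing spline, being a penalised least-squares fit, is still pulled toward it to some extent; which option suits depends on your data and on whether you have a proper model to fit instead.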

cu
	Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/