[R] How to improve the robustness of "loess"? - example included.
Emmanuel Levy
emmanuel.levy at gmail.com
Sun Mar 11 02:02:55 CET 2012
Ok so this seems to work :)
tmp=rnorm(2000)
X.background = 5+tmp
Y.background = 5+ (10*tmp+rnorm(2000))
X.specific = 3.5+3*runif(3000)
Y.specific = 5+120*runif(3000)
X = c(X.background, X.specific)
Y = c(Y.background, Y.specific)
MINx=range(X)[1]
MAXx=range(X)[2]
MINy=range(Y)[1]
MAXy=range(Y)[2]
## estimates the density for each datapoint
nBins=50
my.lims= c(range(X,na.rm=TRUE),range(Y,na.rm=TRUE))
z1 = kde2d(X,Y,n=nBins, lims=my.lims, h= c(
(my.lims[2]-my.lims[1])/(nBins/4) , (my.lims[4]-my.lims[3])/(nBins/4)
) )
X.cut = cut(X, seq(z1$x[1], z1$x[nBins],len=(nBins+1) ))
Y.cut = cut(Y, seq(z1$y[1], z1$y[nBins],len=(nBins+1) ))
xy.cuts = data.frame(X.cut,Y.cut, ord=1:(length(X.cut)) )
density = data.frame( X=rep(factor(levels(X.cut)),rep(nBins) ),
Y=rep(factor(levels(Y.cut)), rep(nBins,nBins) ) , Z= as.vector(z1$z))
xy.density = merge( xy.cuts, density, by=c(1,2), sort=FALSE, all.x=TRUE)
xy.density = xy.density[order(x=xy.density$ord),]
### Now uses the density as a weight
my.loess = loess(Y ~ X, data.frame( X = X, Y = Y),
family="symmetric", degree=2, span=0.1, weights= xy.density$Z^3)
lo.pred = predict(my.loess, data.frame(X = seq(MINx, MAXx,
length=100)), se=TRUE)
plot( seq(MINx, MAXx, length=100), lo.pred$fit, lwd=2,col=2, "l")
#, ylim=c(0, max(tmp$fit, na.rm=TRUE) ) , col="dark grey")
points(X,Y, pch=".", col= grey(abs(my.loess$res)/max(abs(my.loess$res))) )
On 10 March 2012 18:30, Emmanuel Levy <emmanuel.levy at gmail.com> wrote:
> Hi,
>
> I posted a message earlier entitled "How to fit a line through the
> "Mountain crest" ..."
>
> I figured loess is probably the best way, but it seems that the
> problem is the robustness of the fit. Below I paste an example to
> illustrate the problem:
>
> tmp=rnorm(2000)
> X.background = 5+tmp; Y.background = 5+ (10*tmp+rnorm(2000))
> X.specific = 3.5+3*runif(1000); Y.specific = 5+120*runif(1000)
> X = c(X.background, X.specific);Y = c(Y.background, Y.specific)
> MINx=range(X)[1];MAXx=range(X)[2]
>
> my.loess = loess(Y ~ X, data.frame( X = X, Y = Y),
> family="symmetric", degree=2, span=0.1)
> lo.pred = predict(my.loess, data.frame(X = seq(MINx, MAXx,
> length=100)), se=TRUE)
> plot( seq(MINx, MAXx, length=100), lo.pred$fit, lwd=2,col=2, "l")
> points(X,Y, col= grey(abs(my.loess$res)/max(abs(my.loess$res))) )
>
> As you will see, the red line does not follow the "background" signal.
> However, when decreasing the "specific" signal to 500 points it
> becomes perfect.
>
> I'm sure there is a way to "tune" the fitting so that it works but I
> can't figure out how. Importantly, *I cannot increase the span*
> because in reality the relationship I'm looking at is more complex so
> I need a small span value to allow for a close fit.
>
> I foresee that changing the "weigthing" is the way to go but I do not
> really understand how the "weight" option is used (I tried to change
> it and nothing happened), and also the embedded "tricubic weighting"
> does not seem changeable.
>
> So any idea would be very welcome.
>
> Emmanuel
More information about the R-help
mailing list