[R] Automatic Knot selection in Piecewise linear splines

Anupam Tyagi @nupty@g| @end|ng |rom gm@||@com
Tue Jul 16 12:43:26 CEST 2024


Thanks, Martin. This is very helpful.



On Tue, 16 Jul 2024 at 14:52, Martin Maechler <maechler using stat.math.ethz.ch>
wrote:

> >>>>> Anupam Tyagi
> >>>>>     on Tue, 9 Jul 2024 16:16:43 +0530 writes:
>
>     > How can I do automatic knot selection while fitting piecewise linear
>     > splines to two variables x and y? Which package to use to do it
> simply? I
>     > also want to visualize the splines (and the scatter plot) with a
> graph.
>
>     > Anupam
>
> NB: linear splines, i.e. piecewise linear continuous functions.
> Given the knots, use  approx() or approxfun() however, the
> automatic knots selection does not happen in the base R packages.
>
> I'm sure there are several R packages doing this.
> The best such package in my opinion is "earth" which does a
> re-implementation (and extensive  *generalization*) of the
> famous  MARS algorithm of Friedman.
> ==> https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines
>
> Note that their strengths and power is that  they do their work
> for multivariate x (MARS := Multivariate Adaptive Regression
> Splines), but indeed do work for the simple 1D case.
>
> In the following example, we always get 11 final knots,
> but I'm sure one can tweak the many tuning paramters of earth()
> to get more:
>
> ## Can we do  knot-selection  for simple (x,y) splines?  === Yes, via
> earth() {using MARS}!
>
> x <- (0:800)/8
>
> f <- function(x) 7 * sin(pi/8*x) * abs((x-50)/20)^1.25 - (x-40)*(12-x)/64
> curve(f(x), 0, 100, n = 1000, col=2, lwd=2)
>
> set.seed(11)
> y <- f(x) + 10*rnorm(x)
>
> m.sspl <- smooth.spline(x,y) # base line "standard smoother"
>
> require(earth)
> fm1 <- earth(x, y) # default settings
> summary(fm1, style = "pmax") #-- got  10 knots (x = 44 "used twice") below
> ## Call: earth(x=x, y=y)
>
> ## y =
> ##   175.9612
> ##   -   10.6744 * pmax(0,      x -  4.625)
> ##   +  9.928496 * pmax(0,      x - 10.875)
> ##   -  5.940857 * pmax(0,      x -  20.25)
> ##   +  3.438948 * pmax(0,      x - 27.125)
> ##   -  3.828159 * pmax(0,     44 -      x)
> ##   +  4.207046 * pmax(0,      x -     44)
> ##   +  2.573822 * pmax(0,      x -   76.5)
> ##   -  10.99073 * pmax(0,      x - 87.125)
> ##   +  10.97592 * pmax(0,      x - 90.875)
> ##   +  9.331949 * pmax(0,      x -     94)
> ##   -   8.48575 * pmax(0,      x -   96.5)
>
> ## Selected 12 of 12 terms, and 1 of 1 predictors
> ## Termination condition: Reached nk 21
> ## Importance: x
> ## Number of terms at each degree of interaction: 1 11 (additive model)
> ## GCV 108.6592    RSS 82109.44    GRSq 0.861423    RSq 0.86894
>
>
> fm2 <- earth(x, y, fast.k = 0) # (more extensive forward pass)
> summary(fm2)
> all.equal(fm1, fm2)# they are identical (apart from 'call'):
> fm3 <- earth(x, y, fast.k = 0, pmethod = "none", trace = 3) # extensive
> forward pass; *no* pruning
> ## still no change: fm3 "==" fm1
> all.equal(predict(fm1, xx), predict(fm3, xx))
>
> ## BTW: The chosen knots and coefficients are
> mat <- with(fm1, cbind(dirs, cuts=c(cuts), coef = c(coefficients)))
>
> ## Plots : fine grid for visualization: instead of   xx <- seq(x[1],
> x[length(x)], length.out = 1024)
> rnx <- extendrange(x) ## to extrapolate a bit
> xx <- do.call(seq.int, c(rnx, list(length.out = 1200)))
>
> cbind(f = f(xx),
>       sspl = predict(m.sspl, xx)$y,
>       mars = predict(fm1, xx)) -> fits
>
> plot(x,y, xlim=rnx, cex = 1/4, col = adjustcolor(1, 1/2))
> cols <- c(adjustcolor(2, 1/3),
>           adjustcolor(4, 2/3),
>           adjustcolor("orange4", 2/3))
> lwds <- c(3, 2, 2)
> matlines(xx, fits, col = cols, lwd = lwds, lty=1)
> legend("topleft", c("true f(x)", "smooth.spline()", "earth()"),
>        col=cols, lwd=lwds, bty = "n")
> title(paste("earth() linear spline vs. smooth.spline();  n =", length(x)))
> mtext(substitute(f(x) == FDEF, list(FDEF = body(f))))
>


-- 
Anupam.

	[[alternative HTML version deleted]]



More information about the R-help mailing list