# [R] Question about mars() -function

peter dalgaard pdalgd at gmail.com
Mon Dec 27 09:20:13 CET 2010

```
On Dec 26, 2010, at 17:54, Tiina Hakanen wrote:

> Hi!
>
> I have some questions about the coefficient of determination of a MARS model. I use the MARS method in my master's thesis and I have noticed some problems with the MARS model's R^2.
>
> In the following example, the R^2 is too big when I build the MARS model with the mars() function and refit its basis matrix, whereas writing the same MARS model directly as a linear regression gives a much smaller R^2.
>
> So can you please tell me why the MARS model's R^2 is so big? How can I get the MARS model's correct R^2 in R in some other way than in the following example or by calculating it myself using the R^2 formula?

This isn't really to do with MARS as such. You have two equivalent linear models, one with and one without a formula intercept (in marsmodel the constant enters only through the first column, m$x1, which is the constant 1). R computes the R^2 so that it is consistent with the overall F test, which you can see has three numerator DF in marsmodel, but only two in the corresponding linear model. Put differently, the null model is zero in one case and a constant in the other. This sometimes catches people out, but without such a convention, no-intercept models could get negative R^2.
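
# A minimal sketch (not from the original thread) of the two R^2 baselines
# described above, computed by hand. Assumptions: the built-in airquality data
# stands in for ElemStatLearn's ozone, and X is a hand-built stand-in for m$x.
aq <- na.omit(airquality[, c("Ozone", "Temp")])
y  <- aq$Ozone
X  <- cbind(1, pmax(aq$Temp - 85, 0), pmax(85 - aq$Temp, 0))  # constant + two hinge terms

fit_noint <- lm(y ~ X - 1)     # constant enters only through the first column of X
fit_int   <- lm(y ~ X[, -1])   # constant enters through the formula intercept

rss <- sum(resid(fit_noint)^2)                     # both fits have the same residuals
r2_zero_null  <- 1 - rss / sum(y^2)                # null model: y = 0
r2_const_null <- 1 - rss / sum((y - mean(y))^2)    # null model: y = mean(y)

all.equal(summary(fit_noint)$r.squared, r2_zero_null)   # no-intercept convention
all.equal(summary(fit_int)$r.squared,   r2_const_null)  # usual convention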

Pragmatically, if you are sure that the marsmodel will always contain the intercept-only model, does lm(data[,1]~m$x) not provide the desired R^2, with a warning that one parameter is aliased?
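
# A minimal sketch of the suggestion above (not from the original thread).
# Assumptions: the built-in airquality data stands in for ElemStatLearn's ozone,
# and the hand-built matrix X plays the role of m$x (constant column first).
aq <- na.omit(airquality[, c("Ozone", "Temp")])
y  <- aq$Ozone
X  <- cbind(1, pmax(aq$Temp - 85, 0), pmax(85 - aq$Temp, 0))

fit_alias <- lm(y ~ X)   # formula intercept plus the constant column X[, 1]
coef(fit_alias)          # the coefficient for the constant column comes back NA (aliased)

# R^2 is now measured against the constant-only null model
rss <- sum(resid(fit_alias)^2)
all.equal(summary(fit_alias)$r.squared, 1 - rss / sum((y - mean(y))^2))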

>
> I hope you can reply soon.
>
> Best regards,
>
> Tiina Hakanen
>
>
> library(ElemStatLearn)
> library(mda)
> data<-ozone
> m<-mars(data[,-1], data[,1], nk=4)
> m$factor[m$s,]
> m$cuts[m$s,]
> m$coef
> marsmodel<-lm(data[,1]~m$x-1)
> summary(marsmodel)
>
> Call:
> lm(formula = data[, 1] ~ m$x - 1)
>
> Residuals:
>    Min      1Q  Median      3Q     Max
> -36.264 -15.993  -2.351   9.993 122.793
>
> Coefficients:
>     Estimate Std. Error t value Pr(>|t|)
> m$x1  52.9783     3.8894  13.621  < 2e-16 ***
> m$x2   4.7383     0.9599   4.936 2.92e-06 ***
> m$x3  -1.9428     0.3084  -6.300 6.61e-09 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 23.38 on 108 degrees of freedom
> Multiple R-squared: 0.8147,     Adjusted R-squared: 0.8095
> F-statistic: 158.2 on 3 and 108 DF,  p-value: < 2.2e-16
>
> knot1 <- function (x,k) ifelse(x > k, x-k, 0)
> knot2 <- function(x, k) ifelse(x < k, k-x, 0)
> reg <- lm(ozone ~knot1(temperature,85)+knot2(temperature,85),data=data)
>
> summary(reg)
>
> Call:
> lm(formula = ozone ~ knot1(temperature, 85) + knot2(temperature,
>    85), data = data)
>
> Residuals:
>    Min      1Q  Median      3Q     Max
> -36.264 -15.993  -2.351   9.993 122.793
>
> Coefficients:
>                       Estimate Std. Error t value Pr(>|t|)
> (Intercept)             52.9783     3.8894  13.621  < 2e-16 ***
> knot1(temperature, 85)   4.7383     0.9599   4.936 2.92e-06 ***
> knot2(temperature, 85)  -1.9428     0.3084  -6.300 6.61e-09 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 23.38 on 108 degrees of freedom
> Multiple R-squared: 0.5153,     Adjusted R-squared: 0.5064
> F-statistic: 57.42 on 2 and 108 DF,  p-value: < 2.2e-16
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help