[R] Question about mars() -function

peter dalgaard pdalgd at gmail.com
Mon Dec 27 09:20:13 CET 2010


On Dec 26, 2010, at 17:54 , Tiina Hakanen wrote:

> Hi!
> 
> I have some questions about the MARS model's coefficient of determination. I use the MARS method in my master's thesis and I have noticed some problems with the MARS model's R^2.
> 
> You can see in the following example that the MARS model's R^2 is too big when I build the MARS model with the mars() function, and that when I fit the same MARS model using linear regression, it gives a much smaller R^2.
> 
> So can you please tell me why the MARS model's R^2 is so big? How can I get the MARS model's correct R^2 in R in some way other than the following example or calculating it myself with the R^2 formula?

This isn't really to do with MARS as such. You have two equivalent linear models, one with and one without an intercept (i.e., the first column m$x1 is the constant 1). R computes the R^2 so that it is consistent with the overall F test, which you can see has three numerator DF in the marsmodel, but only two in the corresponding linear model. Put differently, the null model is zero in one case and a constant in the other. This sometimes catches people out, but without such a convention, no-intercept models could get negative R^2.
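
To make the two conventions concrete, here is a minimal sketch on simulated data (the variables x, y and X are invented for illustration and are not the ozone data; X merely mimics m$x by carrying the constant 1 as its first column). It assumes only base R and checks that both fits share the same residuals, while summary() measures them against different null models:

```r
# Simulated, purely illustrative data (an assumption, not the thread's data).
set.seed(1)
x <- runif(100, 60, 100)
y <- 50 + 4 * pmax(x - 85, 0) - 2 * pmax(85 - x, 0) + rnorm(100, sd = 20)

# Design matrix with an explicit constant column, like m$x from mars().
X <- cbind(1, pmax(x - 85, 0), pmax(85 - x, 0))

fit_noint <- lm(y ~ X - 1)      # constant supplied by hand, no lm() intercept
fit_int   <- lm(y ~ X[, -1])    # constant supplied by lm() itself

# Same column space, hence the same residual sum of squares.
rss <- sum(resid(fit_int)^2)

# R^2 relative to the zero null model (no-intercept convention).
r2_zero <- 1 - rss / sum(y^2)
# R^2 relative to the constant null model (intercept convention).
r2_mean <- 1 - rss / sum((y - mean(y))^2)

print(all.equal(summary(fit_noint)$r.squared, r2_zero))
print(all.equal(summary(fit_int)$r.squared, r2_mean))
```

The only difference between r2_zero and r2_mean is the total sum of squares in the denominator, which is why the no-intercept figure comes out larger whenever the mean of y is far from zero.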

Pragmatically, if you are sure that the marsmodel will always contain the intercept-only model, does lm(data[,1]~m$x) not provide the desired R^2, with a warning that one parameter is aliased?
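
As a rough sketch of what that looks like, again on simulated data (x, y and X are hypothetical stand-ins for the ozone data and m$x): keeping lm()'s own intercept alongside a matrix that already contains a constant column leaves one coefficient undefined, and summary() then reports the R^2 relative to the constant null model.

```r
# Simulated, purely illustrative data (an assumption, not the thread's data).
set.seed(1)
x <- runif(100, 60, 100)
y <- 50 + 4 * pmax(x - 85, 0) - 2 * pmax(85 - x, 0) + rnorm(100, sd = 20)
X <- cbind(1, pmax(x - 85, 0), pmax(85 - x, 0))  # constant first column

# lm() adds its own intercept, so the constant column X1 is redundant.
fit <- lm(y ~ X)

print(coef(fit))               # the X1 entry is NA (aliased with the intercept)
print(summary(fit)$r.squared)  # R^2 against the constant null model

# Cross-check by hand against the mean-centred total sum of squares.
r2_mean <- 1 - sum(resid(fit)^2) / sum((y - mean(y))^2)
print(all.equal(summary(fit)$r.squared, r2_mean))
```

In the printed summary(fit), the coefficient table is headed by a note that one coefficient is not defined because of singularities; the remaining estimates match the intercept fit.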

> 
> I hope you can reply soon.
> 
> Best regards,
> 
> Tiina Hakanen
> 
> 
> library(ElemStatLearn)
> library(mda)
> data<-ozone
> m<-mars(data[,-1], data[,1], nk=4)
> m$factor[m$s,]
> m$cuts[m$s,]
> m$coef
> marsmodel<-lm(data[,1]~m$x-1)
> summary(marsmodel)
> 
> Call:
> lm(formula = data[, 1] ~ m$x - 1)
> 
> Residuals:
>    Min      1Q  Median      3Q     Max
> -36.264 -15.993  -2.351   9.993 122.793
> 
> Coefficients:
>     Estimate Std. Error t value Pr(>|t|)
> m$x1  52.9783     3.8894  13.621  < 2e-16 ***
> m$x2   4.7383     0.9599   4.936 2.92e-06 ***
> m$x3  -1.9428     0.3084  -6.300 6.61e-09 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> Residual standard error: 23.38 on 108 degrees of freedom
> Multiple R-squared: 0.8147,     Adjusted R-squared: 0.8095
> F-statistic: 158.2 on 3 and 108 DF,  p-value: < 2.2e-16
> 
> knot1 <- function (x,k) ifelse(x > k, x-k, 0)
> knot2 <- function(x, k) ifelse(x < k, k-x, 0)
> reg <- lm(ozone ~knot1(temperature,85)+knot2(temperature,85),data=data)
> 
> summary(reg)
> 
> Call:
> lm(formula = ozone ~ knot1(temperature, 85) + knot2(temperature,
>    85), data = data)
> 
> Residuals:
>    Min      1Q  Median      3Q     Max
> -36.264 -15.993  -2.351   9.993 122.793
> 
> Coefficients:
>                       Estimate Std. Error t value Pr(>|t|)
> (Intercept)             52.9783     3.8894  13.621  < 2e-16 ***
> knot1(temperature, 85)   4.7383     0.9599   4.936 2.92e-06 ***
> knot2(temperature, 85)  -1.9428     0.3084  -6.300 6.61e-09 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> Residual standard error: 23.38 on 108 degrees of freedom
> Multiple R-squared: 0.5153,     Adjusted R-squared: 0.5064
> F-statistic: 57.42 on 2 and 108 DF,  p-value: < 2.2e-16
> 

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


