[R] testing non-linear component in mgcv:gam

Denis Chabot chabotd at globetrotter.net
Wed Oct 5 22:32:36 CEST 2005


Thank you everyone for your help, but my introduction to GAM is  
turning my brain to mush. I thought the one part of the output I  
understood the best was r-sq (adj), but now even this is becoming foggy.

In my original message I mentioned a gam fit that turns out to be a
linear fit. Out of curiosity I fitted it with a linear predictor only,
first with the mgcv package and then as a linear model with lm. The
coefficient tables were identical in both, but the r-sq (adj) was 0.55
in mgcv and 0.26 in lm. For lm, I hope my interpretation is valid: 26%
of the variance in y is explained by the linear relationship with x.
Then what does r-sq (adj) mean in mgcv?

Denis
 > summary.gam(lin)

Family: gaussian
Link function: identity

Formula:
wm.sed ~ Temp

Parametric coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
Temp        -0.023792   0.006369  -3.736 0.000666 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


R-sq.(adj) =  0.554   Deviance explained = 28.5%
GCV score = 0.09904   Scale est. = 0.093686  n = 37


 > summary(sed.true.lin)

Call:
lm(formula = wm.sed ~ Temp, weights = N.sets)

Residuals:
     Min      1Q  Median      3Q     Max
-0.6138 -0.1312 -0.0325  0.1089  1.1449

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
Temp        -0.023792   0.006369  -3.736 0.000666 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3061 on 35 degrees of freedom
Multiple R-Squared: 0.285,    Adjusted R-squared: 0.2646
F-statistic: 13.95 on 1 and 35 DF,  p-value: 0.000666
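
The lm figures above can be reproduced by hand. The sketch below uses only
base R and the data from the original post quoted further down. It follows
the weighted sums of squares that summary.lm uses for a model with an
intercept. The names w, fv, res, mss, rss, r2 and adj are introduced here
for illustration only. It does not explain the 0.554 printed by
summary.gam, since how that version of mgcv handles the weights in its
r-sq (adj) is not stated in this thread.

```r
# Data copied from the original post quoted further down.
Temp <- c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12, 0.38, 0.62, 0.88,
          1.12, 1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38,
          3.62, 3.88, 4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88,
          6.12, 6.38, 6.62, 6.88, 7.12, 8.38, 13.62)
N.sets <- c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27, 29, 31, 22, 26,
            24, 23, 15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3,
            1, 1, 1, 1, 1)
wm.sed <- c(0.000000000, 0.016129032, 0.000000000, 0.062046512,
            0.396459596, 0.189082949, 0.054757925, 0.142810440,
            0.168005168, 0.180804428, 0.111439628, 0.128799505,
            0.193707937, 0.105921610, 0.103497845, 0.028591837,
            0.217894389, 0.020535469, 0.080389068, 0.105234450,
            0.070213450, 0.050771363, 0.042074434, 0.102348837,
            0.049748344, 0.019100478, 0.005203125, 0.101711864,
            0.000000000, 0.000000000, 0.014808824, 0.000000000,
            0.222000000, 0.167000000, 0.000000000, 0.000000000,
            0.000000000)

fit <- lm(wm.sed ~ Temp, weights = N.sets)

w   <- N.sets
fv  <- fitted(fit)
res <- residuals(fit)

# Weighted sums of squares for a model with an intercept
m   <- sum(w * fv) / sum(w)     # weighted mean of the fitted values
mss <- sum(w * (fv - m)^2)      # weighted model sum of squares
rss <- sum(w * res^2)           # weighted residual sum of squares

r2  <- mss / (mss + rss)
adj <- 1 - (1 - r2) * (length(wm.sed) - 1) / df.residual(fit)

# Compare the hand-computed values with what summary.lm reports
c(r2.by.hand = r2, r2.lm = summary(fit)$r.squared)
c(adj.by.hand = adj, adj.lm = summary(fit)$adj.r.squared)
```

So the lm R-squared is a proportion of weighted variation, with each
observation counted according to N.sets, rather than of the raw variance
in y.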


On 05-10-05 at 09:45, John Fox wrote:

> Dear Denis,
>
> Take a closer look at the anova table: The models provide identical fits to
> the data. The differences in degrees of freedom and deviance between the two
> models are essentially zero, 5.5554e-10 and 2.353e-11 respectively.
>
> I hope this helps,
>  John
>
> --------------------------------
> John Fox
> Department of Sociology
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox
> --------------------------------
>
>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Denis Chabot
>> Sent: Wednesday, October 05, 2005 8:22 AM
>> To: r-help at stat.math.ethz.ch
>> Subject: [R] testing non-linear component in mgcv:gam
>>
>> Hi,
>>
>> I need further help with my GAMs. Most models I test are very
>> obviously non-linear. Yet, to be on the safe side, I report
>> the significance of the smooth (default output of mgcv's
>> summary.gam) and confirm it deviates significantly from linearity.
>>
>> I do the latter by fitting a second model where the same
>> predictor is entered without the s(), and then use anova.gam
>> to compare the two. I thought this was the equivalent of the
>> default output of anova.gam using package gam instead of mgcv.
>>
>> I wonder if this procedure is correct because one of my
>> models appears to be linear. In fact mgcv estimates df to be
>> exactly 1.0 so I could have stopped there. However I
>> inadvertently repeated the procedure outlined above. I would
>> have thought in this case the anova.gam comparing the smooth
>> and the linear fit would for sure have been not significant.
>> To my surprise, P was 6.18e-09!
>>
>> Am I doing something wrong when I attempt to confirm that the
>> non-parametric part of a smoother is significant? Here is my example
>> case where the relationship does appear to be linear:
>>
>> library(mgcv)
>>
>>> This is mgcv 1.3-7
>>>
>> Temp <- c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12, 0.38, 0.62,
>>           0.88, 1.12, 1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12,
>>           3.38, 3.62, 3.88, 4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62,
>>           5.88, 6.12, 6.38, 6.62, 6.88, 7.12, 8.38, 13.62)
>> N.sets <- c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27, 29, 31,
>>             22, 26, 24, 23, 15, 25, 24, 27, 19, 26, 24, 22, 13, 10,
>>             2, 5, 3, 1, 1, 1, 1, 1)
>> wm.sed <- c(0.000000000, 0.016129032, 0.000000000, 0.062046512,
>>             0.396459596, 0.189082949, 0.054757925, 0.142810440,
>>             0.168005168, 0.180804428, 0.111439628, 0.128799505,
>>             0.193707937, 0.105921610, 0.103497845, 0.028591837,
>>             0.217894389, 0.020535469, 0.080389068, 0.105234450,
>>             0.070213450, 0.050771363, 0.042074434, 0.102348837,
>>             0.049748344, 0.019100478, 0.005203125, 0.101711864,
>>             0.000000000, 0.000000000, 0.014808824, 0.000000000,
>>             0.222000000, 0.167000000, 0.000000000, 0.000000000,
>>             0.000000000)
>>
>> sed.gam <- gam(wm.sed ~ s(Temp), weights = N.sets)
>> summary.gam(sed.gam)
>>
>>> Family: gaussian
>>> Link function: identity
>>>
>>> Formula:
>>> wm.sed ~ s(Temp)
>>>
>>> Parametric coefficients:
>>>             Estimate Std. Error t value Pr(>|t|)
>>> (Intercept)  0.08403    0.01347   6.241 3.73e-07 ***
>>> ---
>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>
>>> Approximate significance of smooth terms:
>>>         edf Est.rank     F  p-value
>>> s(Temp)   1        1 13.95 0.000666 ***
>>> ---
>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>
>>> R-sq.(adj) =  0.554   Deviance explained = 28.5%
>>> GCV score = 0.09904   Scale est. = 0.093686  n = 37
>>>
>>
>> # testing non-linear contribution
>> sed.lin <- gam(wm.sed ~ Temp, weights = N.sets)
>> summary.gam(sed.lin)
>>
>>> Family: gaussian
>>> Link function: identity
>>>
>>> Formula:
>>> wm.sed ~ Temp
>>>
>>> Parametric coefficients:
>>>              Estimate Std. Error t value Pr(>|t|)
>>> (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
>>> Temp        -0.023792   0.006369  -3.736 0.000666 ***
>>> ---
>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>
>>>
>>> R-sq.(adj) =  0.554   Deviance explained = 28.5%
>>> GCV score = 0.09904   Scale est. = 0.093686  n = 37
>>>
>> anova.gam(sed.lin, sed.gam, test="F")
>>
>>> Analysis of Deviance Table
>>>
>>> Model 1: wm.sed ~ Temp
>>> Model 2: wm.sed ~ s(Temp)
>>>    Resid. Df Resid. Dev         Df  Deviance      F   Pr(>F)
>>> 1 3.5000e+01      3.279
>>> 2 3.5000e+01      3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***
>>>
>>
>>
>> Thanks in advance,
>>
>>
>> Denis Chabot
>>
>>
>
>
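
John's point about the anova table can be checked by hand. The sketch
below is base R only and uses numbers copied from the output quoted above.
It assumes the F statistic in the table is the change in deviance per
degree of freedom divided by the scale estimate; that form is an
assumption, although it does reproduce the printed value. The tolerance
df.tol and the flag models.differ are hypothetical names introduced for
illustration, not part of mgcv.

```r
# Numbers copied from the Analysis of Deviance Table and summary.gam output
df.diff   <- 5.5554e-10   # difference in residual degrees of freedom
dev.diff  <- 2.353e-11    # difference in residual deviance
scale.est <- 0.093686     # "Scale est." from summary.gam
resid.df  <- 35           # residual df of the larger model

# Assumed form: mean change in deviance divided by the scale estimate.
# A ratio of two numbers that are both numerically zero is not a
# meaningful test statistic, even though it prints like one.
F.stat <- (dev.diff / df.diff) / scale.est
F.stat

# Hypothetical guard: only test when the models really differ in df.
df.tol <- 1e-6
models.differ <- abs(df.diff) > df.tol

if (models.differ) {
  p.value <- pf(F.stat, df.diff, resid.df, lower.tail = FALSE)
  print(p.value)
} else {
  message("Models have the same effective df; the F test is not meaningful.")
}
```

With an estimated df of 1 for s(Temp), the two models are the same fit, so
there is no non-linear component left to test.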



