[R] mgcv (bam) very large standard error difference between versions 1.7-11 and 1.7-17, bug?

Martijn Wieling wieling at gmail.com
Sat Jun 2 18:25:14 CEST 2012


Dear useRs,

I reran an analysis with bam (mgcv, version 1.7-17) that was originally
conducted using an older version of bam (mgcv, version 1.7-11). The
estimates are the same, but the standard errors are much lower (in some
cases 20 times lower), and so are the p-values. As a result, a larger
set of predictors is significant. Is this result expected given the
improvements in the new version? Or is this a bug, and are the p-values
of bam in mgcv 1.7-17 too low? The summaries of both versions are shown
below to enable a comparison.
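
To put a number on the difference, the following R sketch divides the
old standard errors by the new ones, using the values copied from the
two REML summaries below. The object names (term_names, se_1711,
se_1717, se_ratio) are introduced here for illustration only.

```r
# Standard errors of the parametric coefficients, copied from the
# mgcv 1.7-11 and mgcv 1.7-17 (REML) summaries in this message.
term_names <- c("(Intercept)", "RefVratio.z", "IsSemiwordOrDemonstrative",
                "RefSoundCnt.z", "SpYearBirth.z", "IsAragon",
                "PopCntLog_residGeo.z", "SpYearBirth.z:IsAragon")
se_1711 <- c(0.020234, 0.013328, 0.046413, 0.021202,
             0.002407, 0.033137, 0.003279, 0.005459)
se_1717 <- c(0.0014139, 0.0006565, 0.0020388, 0.0009381,
             0.0006508, 0.0057143, 0.0007938, 0.0014829)

# Ratio > 1 means the newer version reports a smaller standard error.
se_ratio <- setNames(se_1711 / se_1717, term_names)
print(round(se_ratio, 1))
```

The ratios are not constant across terms: they range from a little
under 4 to a little under 23, so the change is not a simple rescaling
of all standard errors by one factor.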

In addition, applying the default method="fREML" (mgcv version 1.7-17)
to the same dataset yields only non-significant parametric
coefficients, whereas all of them are highly significant using
method="REML". Furthermore, it results in large negative edf values
(e.g., -8298 for s(Word)) for the smooth terms, including the
s(X,bs="re") terms. Is this correct, or is this a bug? The summary of
the model using method="fREML" is also shown below.
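
For anyone who wants to run the same kind of comparison, here is a
minimal sketch on simulated data. It does not use my dataset and is not
meant to reproduce the problem; the data frame dat, its columns, and the
effect sizes are all made up for illustration. It only shows how the
REML and fREML fits could be put side by side.

```r
library(mgcv)

# Simulated stand-in data (hypothetical; not the data from this message).
set.seed(1)
n <- 2000
dat <- data.frame(
  x1   = runif(n),
  x2   = runif(n),
  z    = rnorm(n),
  Word = factor(sample(sprintf("w%02d", 1:30), n, replace = TRUE)),
  Key  = factor(sample(sprintf("k%02d", 1:20), n, replace = TRUE))
)
word_eff <- rnorm(nlevels(dat$Word), sd = 0.5)
key_eff  <- rnorm(nlevels(dat$Key),  sd = 0.3)
dat$y <- sin(2 * pi * dat$x1) + 0.1 * dat$z +
  word_eff[as.integer(dat$Word)] + key_eff[as.integer(dat$Key)] +
  rnorm(n, sd = 0.2)

# Same model structure as in this message: a 2-d smooth, a parametric
# term, and two random-effect smooths; only the method differs.
form    <- y ~ s(x1, x2) + z + s(Word, bs = "re") + s(Key, bs = "re")
m_reml  <- bam(form, data = dat, method = "REML")
m_freml <- bam(form, data = dat, method = "fREML")

# Side-by-side parametric standard errors and smooth-term edf.
se_cmp  <- cbind(REML  = summary(m_reml)$p.table[, "Std. Error"],
                 fREML = summary(m_freml)$p.table[, "Std. Error"])
edf_cmp <- cbind(REML  = summary(m_reml)$s.table[, "edf"],
                 fREML = summary(m_freml)$s.table[, "edf"])
print(se_cmp)
print(edf_cmp)
```

If the two columns of se_cmp differed by orders of magnitude, or
edf_cmp contained negative values, that would mirror what I see in the
fREML summary below.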

I hope someone can shed some light on this.

With kind regards,
Martijn Wieling,
University of Groningen

#################################
### mgcv version 1.7-11
#################################

Family: gaussian
Link function: identity

Formula:
RefPMIdistMeanLog.c ~ s(GeoX, GeoY) + RefVratio.z + IsSemiwordOrDemonstrative +
    RefSoundCnt.z + SpYearBirth.z * IsAragon + PopCntLog_residGeo.z +
    s(Word, bs = "re") + s(Key, bs = "re")

Parametric coefficients:
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)               -0.099757   0.020234  -4.930 8.23e-07 ***
RefVratio.z                0.105705   0.013328   7.931 2.19e-15 ***
IsSemiwordOrDemonstrative  0.289828   0.046413   6.245 4.27e-10 ***
RefSoundCnt.z              0.119981   0.021202   5.659 1.53e-08 ***
SpYearBirth.z             -0.011396   0.002407  -4.734 2.21e-06 ***
IsAragon                   0.055678   0.033137   1.680  0.09291 .
PopCntLog_residGeo.z      -0.006504   0.003279  -1.984  0.04731 *
SpYearBirth.z:IsAragon     0.015871   0.005459   2.907  0.00365 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                edf Ref.df      F p-value
s(GeoX,GeoY)  24.01  24.21  31.16  <2e-16 ***
s(Word)      352.29 347.00 501.57  <2e-16 ***
s(Key)       269.75 289.25  10.76  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.693   Deviance explained = 69.4%
REML score = -22476  Scale est. = 0.038177  n = 112608


#################################
### mgcv version 1.7-17, much lower p-values and standard errors than
### version 1.7-11
#################################

Family: gaussian
Link function: identity

Formula:
RefPMIdistMeanLog.c ~ s(GeoX, GeoY) + RefVratio.z + IsSemiwordOrDemonstrative +
    RefSoundCnt.z + SpYearBirth.z * IsAragon + PopCntLog_residGeo.z +
    s(Word, bs = "re") + s(Key, bs = "re")

Parametric coefficients:
                            Estimate Std. Error t value Pr(>|t|)
(Intercept)               -0.0997566  0.0014139 -70.552  < 2e-16 ***
RefVratio.z                0.1057049  0.0006565 161.010  < 2e-16 ***
IsSemiwordOrDemonstrative  0.2898285  0.0020388 142.155  < 2e-16 ***
RefSoundCnt.z              0.1199813  0.0009381 127.905  < 2e-16 ***
SpYearBirth.z             -0.0113956  0.0006508 -17.509  < 2e-16 ***
IsAragon                   0.0556777  0.0057143   9.744  < 2e-16 ***
PopCntLog_residGeo.z      -0.0065037  0.0007938  -8.193 2.58e-16 ***
SpYearBirth.z:IsAragon     0.0158712  0.0014829  10.703  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                edf Ref.df       F p-value
s(GeoX,GeoY)  24.01  24.21   31.15  <2e-16 ***
s(Word)      352.29 347.00  587.26  <2e-16 ***
s(Key)       269.75 313.00 4246.76  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.693   Deviance explained = 69.4%
REML score = -22476  Scale est. = 0.038177  n = 112608


#################################
### mgcv version 1.7-17, default: method="fREML", all p-values
### non-significant and negative edf's of s(X,bs="re")
#################################

Family: gaussian
Link function: identity

Formula:
RefPMIdistMeanLog.c ~ s(GeoX, GeoY) + RefVratio.z + IsSemiwordOrDemonstrative +
    RefSoundCnt.z + SpYearBirth.z * IsAragon + PopCntLog_residGeo.z +
    s(Word, bs = "re") + s(Key, bs = "re")

Parametric coefficients:
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)               -0.099757   1.730235  -0.058    0.954
RefVratio.z                0.105705   1.145329   0.092    0.926
IsSemiwordOrDemonstrative  0.289828   4.167237   0.070    0.945
RefSoundCnt.z              0.119981   1.901158   0.063    0.950
SpYearBirth.z             -0.011396   0.034236  -0.333    0.739
IsAragon                   0.055678   0.298629   0.186    0.852
PopCntLog_residGeo.z      -0.006504   0.041981  -0.155    0.877
SpYearBirth.z:IsAragon     0.015871   0.077142   0.206    0.837

Approximate significance of smooth terms:
               edf Ref.df       F p-value
s(GeoX,GeoY) -1376      1   7.823 0.00516 **
s(Word)      -8298    347 577.910 < 2e-16 ***
s(Key)       -1421    316  13.512 < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.741   Deviance explained = 69.4%
fREML score = -22476  Scale est. = 0.038177  n = 112608


