[R] mgcv (bam) very large standard error difference between versions 1.7-11 and 1.7-17, bug?

Martijn Wieling wieling at gmail.com
Wed Jun 6 09:09:37 CEST 2012


Dear useRs,

I ran an additional analysis with another dataset and (logistic
regression), but also in this case I obtain very different p-values
comparing version 1.7-11 and 1.7-17. Any ideas about this? Which is
the good one?

Below, I'll first show the model based on 1.7-11, then of version
1.7-17: summary(model) followed by summary(model, freq=F). Data can be
sent offline if required.

With kind regards,
Martijn Wieling,
University of Groningen

# call in all versions
model = bam(NoStandard3 ~
te(GeoX,GeoY,WordFreqLog.z,by=OldSpeakerContrast,d=c(2,1)) +
OldSpeakerContrast + PopCntLog.z + s(Word,bs="re") +
s(Placename,bs="re") + s(Word,PopAvgIncomeLog.z,bs="re") +
s(Word,PopAvgAgeLog_residIncome.z,bs="re") +
s(Word,PopCntLog.z,bs="re") +
s(Word,YearRec.z,bs="re"),data=lexdst,family="binomial",method="REML")

### mgcv, version 1.7-11
#Family: binomial
#Link function: logit
#
#Formula:
#NoStandard3 ~ te(GeoX, GeoY, WordFreqLog.z, by = OldSpeakerContrast,
#    d = c(2, 1)) + OldSpeakerContrast + PopCntLog.z + s(Word,
#    bs = "re") + s(Placename, bs = "re") + s(Word, PopAvgIncomeLog.z,
#    bs = "re") + s(Word, PopAvgAgeLog_residIncome.z, bs = "re") +
#    s(Word, PopCntLog.z, bs = "re") + s(Word, YearRec.z, bs = "re")
#
#Parametric coefficients:
#                    Estimate Std. Error z value Pr(>|z|)
#(Intercept)         -0.41610    0.14027  -2.966  0.00301 **
#OldSpeakerContrast1  0.44508    0.01934  23.009  < 2e-16 ***
#PopCntLog.z         -0.11673    0.02725  -4.284 1.83e-05 ***
#---
#Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
#
#Approximate significance of smooth terms:
#                                                   edf Ref.df  Chi.sq  p-value
#te(GeoX,GeoY,WordFreqLog.z):OldSpeakerContrast0  54.05  69.06   223.3  < 2e-16
#te(GeoX,GeoY,WordFreqLog.z):OldSpeakerContrast1  59.17  74.83   327.6  < 2e-16
#s(Word)                                         166.41 167.84 13756.0  < 2e-16
#s(Placename)                                    157.64 188.02   692.7  < 2e-16
#s(Word,PopAvgIncomeLog.z)                       137.83 160.05   829.9  < 2e-16
#s(Word,PopAvgAgeLog_residIncome.z)              121.13 151.13   435.5  < 2e-16
#s(Word,PopCntLog.z)                              97.12 133.97   205.5 7.01e-05
#s(Word,YearRec.z)                               128.98 155.27   585.7  < 2e-16
#
#te(GeoX,GeoY,WordFreqLog.z):OldSpeakerContrast0 ***
#te(GeoX,GeoY,WordFreqLog.z):OldSpeakerContrast1 ***
#s(Word)                                         ***
#s(Placename)                                    ***
#s(Word,PopAvgIncomeLog.z)                       ***
#s(Word,PopAvgAgeLog_residIncome.z)              ***
#s(Word,PopCntLog.z)                             ***
#s(Word,YearRec.z)                               ***
#---
#Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
#
#R-sq.(adj) =  0.372   Deviance explained = 32.4%
#REML score =  97450  Scale est. = 1         n = 69259


### version 1.7-17, summary(model)
#Family: binomial
#Link function: logit
#
#Formula:
#NoStandard3 ~ te(GeoX, GeoY, WordFreqLog.z, by = OldSpeakerContrast,
#    d = c(2, 1)) + OldSpeakerContrast + PopCntLog.z + s(Word,
#    bs = "re") + s(Placename, bs = "re") + s(Word, PopAvgIncomeLog.z,
#    bs = "re") + s(Word, PopAvgAgeLog_residIncome.z, bs = "re") +
#    s(Word, PopCntLog.z, bs = "re") + s(Word, YearRec.z, bs = "re")
#
#Parametric coefficients:
#                    Estimate Std. Error z value Pr(>|z|)
#(Intercept)         -0.98150    0.16742  -5.863 4.56e-09 ***
#OldSpeakerContrast1  0.65497    0.30268   2.164   0.0305 *
#PopCntLog.z         -0.11724    0.01064 -11.016  < 2e-16 ***
#---
#Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
#
#Approximate significance of smooth terms:
#                                                   edf Ref.df  Chi.sq p-value
#te(GeoX,GeoY,WordFreqLog.z):OldSpeakerContrast0  53.62  68.48   221.4  <2e-16
#te(GeoX,GeoY,WordFreqLog.z):OldSpeakerContrast1  58.64  74.14   324.9  <2e-16
#s(Word)                                         166.41 168.00 14351.7  <2e-16
#s(Placename)                                    157.79 209.00  1160.0  <2e-16
#s(Word,PopAvgIncomeLog.z)                       137.84 170.00  1062.1  <2e-16
#s(Word,PopAvgAgeLog_residIncome.z)              121.15 170.00   686.7  <2e-16
#s(Word,PopCntLog.z)                              97.10 169.00   434.0  <2e-16
#s(Word,YearRec.z)                               128.98 170.00   867.4  <2e-16
#
#te(GeoX,GeoY,WordFreqLog.z):OldSpeakerContrast0 ***
#te(GeoX,GeoY,WordFreqLog.z):OldSpeakerContrast1 ***
#s(Word)                                         ***
#s(Placename)                                    ***
#s(Word,PopAvgIncomeLog.z)                       ***
#s(Word,PopAvgAgeLog_residIncome.z)              ***
#s(Word,PopCntLog.z)                             ***
#s(Word,YearRec.z)                               ***
#---
#Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
#
#R-sq.(adj) =  0.372   Deviance explained = 32.4%
#REML score =  97449  Scale est. = 1         n = 69259


### mgcv 1.7-17, summary(model,freq=F)
#Family: binomial
#Link function: logit
#
#Formula:
#NoStandard3 ~ te(GeoX, GeoY, WordFreqLog.z, by = OldSpeakerContrast,
#    d = c(2, 1)) + OldSpeakerContrast + PopCntLog.z + s(Word,
#    bs = "re") + s(Placename, bs = "re") + s(Word, PopAvgIncomeLog.z,
#    bs = "re") + s(Word, PopAvgAgeLog_residIncome.z, bs = "re") +
#    s(Word, PopCntLog.z, bs = "re") + s(Word, YearRec.z, bs = "re")
#
#Parametric coefficients:
#                    Estimate Std. Error z value Pr(>|z|)
#(Intercept)         -0.98150    0.52485  -1.870   0.0615 .
#OldSpeakerContrast1  0.65497    0.68628   0.954   0.3399
#PopCntLog.z         -0.11724    0.02726  -4.300 1.71e-05 ***
#---
#Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
#
#Approximate significance of smooth terms:
#                                                   edf Ref.df  Chi.sq p-value
#te(GeoX,GeoY,WordFreqLog.z):OldSpeakerContrast0  53.62  68.48   221.4  <2e-16
#te(GeoX,GeoY,WordFreqLog.z):OldSpeakerContrast1  58.64  74.14   324.9  <2e-16
#s(Word)                                         166.41 168.00 14351.7  <2e-16
#s(Placename)                                    157.79 209.00  1160.0  <2e-16
#s(Word,PopAvgIncomeLog.z)                       137.84 170.00  1062.1  <2e-16
#s(Word,PopAvgAgeLog_residIncome.z)              121.15 170.00   686.7  <2e-16
#s(Word,PopCntLog.z)                              97.10 169.00   434.0  <2e-16
#s(Word,YearRec.z)                               128.98 170.00   867.4  <2e-16
#
#te(GeoX,GeoY,WordFreqLog.z):OldSpeakerContrast0 ***
#te(GeoX,GeoY,WordFreqLog.z):OldSpeakerContrast1 ***
#s(Word)                                         ***
#s(Placename)                                    ***
#s(Word,PopAvgIncomeLog.z)                       ***
#s(Word,PopAvgAgeLog_residIncome.z)              ***
#s(Word,PopCntLog.z)                             ***
#s(Word,YearRec.z)                               ***
#---
#Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
#
#R-sq.(adj) =  0.372   Deviance explained = 32.4%
#REML score =  97449  Scale est. = 1         n = 69259




On Tue, Jun 5, 2012 at 11:06 AM, Martijn Wieling <wieling at gmail.com> wrote:
> Dear useRs, Simon,
>
> @Simon: I'll send the data offline.
> The command summary(...,freq=F) yields the same result as before. Why
> did the default change from freq=F in version 1.7-13 to freq=T in
> version 1.7-17? Especially since the original default appeared to be
> better.
>
> I noticed that some of my commands were slightly off in the R-example
> of my previous e-mail (cut-paste errors...). They should be:
>
> modelLMER <- lmer(RefPMIdistMeanLog.c ~ SpYearBirth.z*IsAragon +
> SpIsMale + (1|Key), data=wrddst)
> print(modelLMER,cor=F)
>
> # using version 1.7-13, default = "REML"
> modelBAMold <- bam(RefPMIdistMeanLog.c ~ SpYearBirth.z*IsAragon +
> SpIsMale + s(Key,bs="re"), data=wrddst)
> summary(modelBAMold)
>
> # using version 1.7-17, explicitly stating method="REML"
> modelBAMnew <- bam(RefPMIdistMeanLog.c ~ SpYearBirth.z*IsAragon +
> SpIsMale + s(Key,bs="re"), data=wrddst, method="REML")
> summary(modelBAMnew)
> summary(modelBAMnew,freq=F) # same results as modelBAMold
>
> # using version 1.7-17, default: fREML - contains a bug
> modelBAMfREML <- bam(RefPMIdistMeanLog.c ~ SpYearBirth.z*IsAragon +
> SpIsMale + s(Key,bs="re"), data=wrddst)
> summary(modelBAMfREML)
>
> With kind regards,
> Martijn Wieling
>
> --
> *******************************************
> Martijn Wieling
> http://www.martijnwieling.nl
> wieling at gmail.com
> +31(0)614108622
> *******************************************
> University of Groningen
> http://www.rug.nl/staff/m.b.wieling
> *******************************************
>
>
> On Sun, Jun 3, 2012 at 7:45 PM, Martijn Wieling <wieling at gmail.com> wrote:
>> Dear useRs,
>>
>> I've ran some additional analyses (see below), which strongly suggest
>> the standard errors of the bam (and gam) function are much too low in
>> mgcv version 1.7-17, at least when including an s(X,bs="re") term.
>> Until this issue has been clarified, it's perhaps best to use an older
>> version of mgcv (unfortunately, however, in earlier versions the
>> p-value calculation of s(X,bs="re") is not correct). All analyses were
>> conducted in R 2.15.0.
>>
>> My approach was the following: I created a mixed-effects regression
>> model with a single random intercept and only linear predictors. In my
>> view, the results using lmer (lme4) should be comparable to those of
>> bam and gam (mgcv). This was the case when using an older version of
>> mgcv (version 1.7-13), but this is not the case anymore in version
>> 1.7-17. In version 1.7-17, the standard errors and p-values are much
>> lower and very similar to those of a linear model (which does not take
>> the random-effects structure into account). The R-code and results are
>> shown below. (The results using gam are not shown, but show the same
>> pattern.)
>>
>> Furthermore, note that the differences in standard errors become less
>> severe (but still noticeable) when less data is involved (e.g., using
>> only 500 rows as opposed to >100.000). Finally, when not including an
>> s(X,bs="re") term, but another non-random-effect smooth, the standard
>> errors do not appear to be structurally lower (only for some
>> variables, but not by a great deal - see also below).
>>
>> With kind regards,
>> Martijn Wieling
>> University of Groningen
>>
>> #### lme4 model (most recent version of lme4)
>> modelLMER <- lmer(RefPMIdistMeanLog.c ~ SpYearBirth.z*IsAragon +
>> SpIsMale + (1|Key), data=wrddst)
>> #                        Estimate Std. Error t value
>> #SpYearBirth.z          -0.012084   0.004577  -2.640
>> #IsAragon                0.138959   0.010040  13.840
>> #SpIsMale               -0.003087   0.008290  -0.372
>> #SpYearBirth.z:IsAragon  0.015429   0.010159   1.519
>>
>>
>> #### mgcv 1.7-13, default (method = "REML") - almost identical to modelLMER
>> modelBAMold <- bam(RefPMIdistMeanLog.c ~ SpYearBirth.z*IsAragon +
>> SpIsMale + s(Key,bs="re"), data=wrddst)
>> #                        Estimate Std. Error t value Pr(>|t|)
>> #SpYearBirth.z          -0.012084   0.004578  -2.640  0.00829 **
>> #IsAragon                0.138959   0.010042  13.838  < 2e-16 ***
>> #SpIsMale               -0.003087   0.008292  -0.372  0.70968
>> #SpYearBirth.z:IsAragon  0.015429   0.010160   1.519  0.12886
>>
>>
>> #### mgcv 1.7-17, method = "REML" - standard errors greatly reduced
>> # (comparable to standard errors of LM without random intercept)
>> modelBAMnew <- bam(RefPMIdistMeanLog.c ~ SpYearBirth.z*IsAragon +
>> SpIsMale + s(Key,bs="re"), data=wrddst); print(testje,cor=F)
>> #                        Estimate Std. Error t value Pr(>|t|)
>> #SpYearBirth.z          -0.012084   0.001159 -10.428  < 2e-16 ***
>> #IsAragon                0.138959   0.002551  54.472  < 2e-16 ***
>> #SpIsMale               -0.003087   0.002098  -1.471    0.141
>> #SpYearBirth.z:IsAragon  0.015429   0.002587   5.965 2.45e-09 ***
>>
>> #### lm results, standard errors comparable to mgcv 1.7-17
>> modelLM <- lm(RefPMIdistMeanLog.c ~ SpYearBirth.z*IsAragon + SpIsMale,
>> data=wrddst)
>> #                        Estimate Std. Error t value Pr(>|t|)
>> #(Intercept)            -0.025779   0.001653 -15.595  < 2e-16 ***
>> #SpYearBirth.z          -0.011906   0.001182 -10.070  < 2e-16 ***
>> #IsAragon                0.139323   0.002603  53.531  < 2e-16 ***
>> #SpIsMale               -0.003076   0.002140  -1.437    0.151
>> #SpYearBirth.z:IsAragon  0.015252   0.002639   5.780 7.49e-09 ***
>>
>>
>> #### mgcv 1.7-17, default (method = "fREML") - completely different
>> from previous models
>> modelBAMfREML <- bam(RefPMIdistMeanLog.c ~ SpYearBirth.z*IsAragon +
>> SpIsMale + s(Key,bs="re"), data=wrddst); print(testje,cor=F)
>> #                        Estimate Std. Error t value Pr(>|t|)
>> #(Intercept)            -0.025391   0.106897  -0.238    0.812
>> #SpYearBirth.z          -0.012084   0.076300  -0.158    0.874
>> #IsAragon                0.138959   0.166697   0.834    0.405
>> #SpIsMale               -0.003087   0.138291  -0.022    0.982
>> #SpYearBirth.z:IsAragon  0.015429   0.168260   0.092    0.927
>> #
>> #Approximate significance of smooth terms:
>> #          edf Ref.df     F p-value
>> #s(Key) -38.95    310 15.67  <2e-16 ***
>>
>>
>> #### differences w.r.t. standard smooths
>> #### mgcv version 1.7-13
>> m2old <- bam(RefPMIdistMeanLog.c ~ s(GeoX,GeoY) +
>> SpYearBirth.z*IsAragon + SpIsMale, data=wrddst, method="REML")
>> ## RESULTS
>> #Family: gaussian
>> #Link function: identity
>> #
>> #Formula:
>> #RefPMIdistMeanLog.c ~ s(GeoX, GeoY) + SpYearBirth.z * IsAragon +
>> #    SpIsMale
>> #
>> #Parametric coefficients:
>> #                        Estimate Std. Error t value Pr(>|t|)
>> #(Intercept)            -0.001386   0.004982  -0.278   0.7809
>> #SpYearBirth.z          -0.012950   0.001167 -11.097  < 2e-16 ***
>> #IsAragon                0.020532   0.023608   0.870   0.3845
>> #SpIsMale               -0.004788   0.002219  -2.158   0.0309 *
>> #SpYearBirth.z:IsAragon  0.015611   0.002600   6.005 1.92e-09 ***
>> #---
>> #Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> #
>> #Approximate significance of smooth terms:
>> #               edf Ref.df     F p-value
>> #s(GeoX,GeoY) 27.11  28.14 126.2  <2e-16 ***
>> #---
>> #Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> #
>> #R-sq.(adj) =  0.0555   Deviance explained = 5.58%
>> #REML score =  39232  Scale est. = 0.11734   n = 112608
>>
>>
>> #### mgcv version 1.7-17
>> m2new <- bam(RefPMIdistMeanLog.c ~ s(GeoX,GeoY) +
>> SpYearBirth.z*IsAragon + SpIsMale, data=wrddst, method="REML")
>> #Family: gaussian
>> #Link function: identity
>> #
>> #Formula:
>> #RefPMIdistMeanLog.c ~ s(GeoX, GeoY) + SpYearBirth.z * IsAragon +
>> #    SpIsMale
>> #
>> #Parametric coefficients:
>> #                        Estimate Std. Error t value Pr(>|t|)
>> #(Intercept)            -0.001388   0.003938  -0.352   0.7245
>> #SpYearBirth.z          -0.012950   0.001167 -11.098  < 2e-16 ***
>> #IsAragon                0.020543   0.018055   1.138   0.2552
>> #SpIsMale               -0.004788   0.002215  -2.161   0.0307 *
>> #SpYearBirth.z:IsAragon  0.015611   0.002600   6.005 1.92e-09 ***
>> #---
>> #Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> #
>> #Approximate significance of smooth terms:
>> #               edf Ref.df     F p-value
>> #s(GeoX,GeoY) 27.11  28.14 126.2  <2e-16 ***
>> #---
>> #Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> #
>> #R-sq.(adj) =  0.0555   Deviance explained = 5.58%
>> #REML score =  39232  Scale est. = 0.11734   n = 112608
>>
>>
>> On Sat, Jun 2, 2012 at 6:25 PM, Martijn Wieling <wieling at gmail.com> wrote:
>>> Dear useRs,
>>>
>>> I reran an analysis with bam (mgcv, version 1.7-17) originally
>>> conducted using an older version of bam (mgcv, version 1.7-11) and
>>> this resulted in the same estimates, but much lower standard errors
>>> (in some cases 20 times as low) and lower p-values. This obviously
>>> results in a larger set of significant predictors. Is this result
>>> expected given the improvements in the new version? Or this a bug and
>>> are the p-values of bam in mgcv 1.7-17 too low? The summaries of both
>>> versions are shown below to enable a comparison.
>>>
>>> In addition, applying the default method="fREML" (mgcv version 1.7-17)
>>> on the same dataset yields only non-significant results, while all
>>> results are highly significant using method="REML". Furthermore, it
>>> also results in large negative (e.g., -8757) edf values linked to
>>> s(X,bs="RE") terms. Is this correct, or is this a bug? The summary of
>>> the model using method="fREML" is also shown below.
>>>
>>> I hope someone can shed some light on this.
>>>
>>> With kind regards,
>>> Martijn Wieling,
>>> University of Groningen
>>>
>>> #################################
>>> ### mgcv version 1.7-11
>>> #################################
>>>
>>> Family: gaussian
>>> Link function: identity
>>>
>>> Formula:
>>> RefPMIdistMeanLog.c ~ s(GeoX, GeoY) + RefVratio.z + IsSemiwordOrDemonstrative +
>>>    RefSoundCnt.z + SpYearBirth.z * IsAragon + PopCntLog_residGeo.z +
>>>    s(Word, bs = "re") + s(Key, bs = "re")
>>>
>>> Parametric coefficients:
>>>                           Estimate Std. Error t value Pr(>|t|)
>>> (Intercept)               -0.099757   0.020234  -4.930 8.23e-07 ***
>>> RefVratio.z                0.105705   0.013328   7.931 2.19e-15 ***
>>> IsSemiwordOrDemonstrative  0.289828   0.046413   6.245 4.27e-10 ***
>>> RefSoundCnt.z              0.119981   0.021202   5.659 1.53e-08 ***
>>> SpYearBirth.z             -0.011396   0.002407  -4.734 2.21e-06 ***
>>> IsAragon                   0.055678   0.033137   1.680  0.09291 .
>>> PopCntLog_residGeo.z      -0.006504   0.003279  -1.984  0.04731 *
>>> SpYearBirth.z:IsAragon     0.015871   0.005459   2.907  0.00365 **
>>> ---
>>> Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
>>>
>>> Approximate significance of smooth terms:
>>>                edf Ref.df      F p-value
>>> s(GeoX,GeoY)  24.01  24.21  31.16  <2e-16 ***
>>> s(Word)      352.29 347.00 501.57  <2e-16 ***
>>> s(Key)       269.75 289.25  10.76  <2e-16 ***
>>> ---
>>> Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
>>>
>>> R-sq.(adj) =  0.693   Deviance explained = 69.4%
>>> REML score = -22476  Scale est. = 0.038177  n = 112608
>>>
>>>
>>> #################################
>>> ### mgcv version 1.7-17, much lower p-values and standard errors than
>>> version 1.7-11
>>> #################################
>>>
>>> Family: gaussian
>>> Link function: identity
>>>
>>> Formula:
>>> RefPMIdistMeanLog.c ~ s(GeoX, GeoY) + RefVratio.z + IsSemiwordOrDemonstrative +
>>>    RefSoundCnt.z + SpYearBirth.z * IsAragon + PopCntLog_residGeo.z +
>>>    s(Word, bs = "re") + s(Key, bs = "re")
>>>
>>> Parametric coefficients:
>>>                            Estimate Std. Error t value Pr(>|t|)
>>> (Intercept)               -0.0997566  0.0014139 -70.552  < 2e-16 ***
>>> RefVratio.z                0.1057049  0.0006565 161.010  < 2e-16 ***
>>> IsSemiwordOrDemonstrative  0.2898285  0.0020388 142.155  < 2e-16 ***
>>> RefSoundCnt.z              0.1199813  0.0009381 127.905  < 2e-16 ***
>>> SpYearBirth.z             -0.0113956  0.0006508 -17.509  < 2e-16 ***
>>> IsAragon                   0.0556777  0.0057143   9.744  < 2e-16 ***
>>> PopCntLog_residGeo.z      -0.0065037  0.0007938  -8.193 2.58e-16 ***
>>> SpYearBirth.z:IsAragon     0.0158712  0.0014829  10.703  < 2e-16 ***
>>> ---
>>> Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
>>>
>>> Approximate significance of smooth terms:
>>>                edf Ref.df       F p-value
>>> s(GeoX,GeoY)  24.01  24.21   31.15  <2e-16 ***
>>> s(Word)      352.29 347.00  587.26  <2e-16 ***
>>> s(Key)       269.75 313.00 4246.76  <2e-16 ***
>>> ---
>>> Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
>>>
>>> R-sq.(adj) =  0.693   Deviance explained = 69.4%
>>> REML score = -22476  Scale est. = 0.038177  n = 112608
>>>
>>>
>>> #################################
>>> ### mgcv version 1.7-17, default: method="fREML", all p-values
>>> non-significant and negative edf's of s(X,bs="re")
>>> #################################
>>>
>>> Family: gaussian
>>> Link function: identity
>>>
>>> Formula:
>>> RefPMIdistMeanLog.c ~ s(GeoX, GeoY) + RefVratio.z + IsSemiwordOrDemonstrative +
>>>    RefSoundCnt.z + SpYearBirth.z * IsAragon + PopCntLog_residGeo.z +
>>>    s(Word, bs = "re") + s(Key, bs = "re")
>>>
>>> Parametric coefficients:
>>>                           Estimate Std. Error t value Pr(>|t|)
>>> (Intercept)               -0.099757   1.730235  -0.058    0.954
>>> RefVratio.z                0.105705   1.145329   0.092    0.926
>>> IsSemiwordOrDemonstrative  0.289828   4.167237   0.070    0.945
>>> RefSoundCnt.z              0.119981   1.901158   0.063    0.950
>>> SpYearBirth.z             -0.011396   0.034236  -0.333    0.739
>>> IsAragon                   0.055678   0.298629   0.186    0.852
>>> PopCntLog_residGeo.z      -0.006504   0.041981  -0.155    0.877
>>> SpYearBirth.z:IsAragon     0.015871   0.077142   0.206    0.837
>>>
>>> Approximate significance of smooth terms:
>>>               edf Ref.df       F p-value
>>> s(GeoX,GeoY) -1376      1   7.823 0.00516 **
>>> s(Word)      -8298    347 577.910 < 2e-16 ***
>>> s(Key)       -1421    316  13.512 < 2e-16 ***
>>> ---
>>> Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
>>>
>>> R-sq.(adj) =  0.741   Deviance explained = 69.4%
>>> fREML score = -22476  Scale est. = 0.038177  n = 112608



More information about the R-help mailing list