[R] mgcv (bam) very large standard error difference between versions 1.7-11 and 1.7-17, bug?

Simon Wood s.wood at bath.ac.uk
Mon Jun 11 13:36:43 CEST 2012


Hi Martijn,

The negative edf from fREML was a matrix indexing bug triggered by the 
"re" term (other fREML discrepancies followed from this). Fixed for 
1.7-18. Thanks for reporting this, and the supporting offline info!

best,
Simon
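
For anyone wanting to run the kind of check discussed in this thread on their own model, a minimal sketch follows. It fits the same model with method="REML" and method="fREML", then compares standard errors and edf using summary(..., freq=FALSE). The data are simulated and the variable names (x, z, word, y) are hypothetical stand-ins, not the data from the original report.

```r
library(mgcv)  # recommended package shipped with standard R installations

set.seed(1)
n <- 2000
n_words <- 40

# Simulated stand-in data; all variable names here are hypothetical.
dat <- data.frame(
  x    = runif(n),
  z    = rnorm(n),
  word = factor(sample(seq_len(n_words), n, replace = TRUE))
)
word_eff <- rnorm(n_words, sd = 0.5)  # simulated random intercept per word
dat$y <- sin(2 * pi * dat$x) + 0.3 * dat$z +
  word_eff[as.integer(dat$word)] + rnorm(n, sd = 0.3)

form <- y ~ s(x) + z + s(word, bs = "re")

# Fit the same model with both smoothing-parameter estimation methods.
m_reml  <- bam(form, data = dat, method = "REML")
m_freml <- bam(form, data = dat, method = "fREML")

# freq = FALSE requests the Bayesian covariance matrix for parametric terms.
s_reml  <- summary(m_reml,  freq = FALSE)
s_freml <- summary(m_freml, freq = FALSE)

# Side-by-side standard errors of the parametric coefficients.
se_tab <- cbind(
  REML  = s_reml$p.table[, "Std. Error"],
  fREML = s_freml$p.table[, "Std. Error"]
)
print(se_tab)

# Side-by-side effective degrees of freedom of the smooth terms.
edf_tab <- cbind(REML = s_reml$edf, fREML = s_freml$edf)
rownames(edf_tab) <- rownames(s_reml$s.table)
print(edf_tab)

# Sanity checks: edf should be positive, and the methods should roughly agree.
edf_ok   <- all(edf_tab > 0)
se_ratio <- se_tab[, "fREML"] / se_tab[, "REML"]
```

A negative entry in edf_tab, or an se_ratio far from 1, would be the kind of symptom reported below and a reason to check the installed mgcv version.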

On 02/06/12 23:17, Simon Wood wrote:
> the fREML results must be a bug. I'd modified the bam covariance and edf
> code in 1.7-17 and it looks like I must have messed something up. Any
> chance you could send me the data off line?
>
> Also can you try summary(...,freq=FALSE) under 1.7-17 and see if you get
> the same results as before?
>
> On 06/02/2012 05:25 PM, Martijn Wieling wrote:
>> Dear useRs,
>>
>> I reran an analysis with bam (mgcv, version 1.7-17) originally
>> conducted using an older version of bam (mgcv, version 1.7-11) and
>> this resulted in the same estimates, but much lower standard errors
>> (in some cases 20 times as low) and lower p-values. This obviously
>> results in a larger set of significant predictors. Is this result
>> expected given the improvements in the new version? Or is this a bug and
>> are the p-values of bam in mgcv 1.7-17 too low? The summaries of both
>> versions are shown below to enable a comparison.
>>
>> In addition, applying the default method="fREML" (mgcv version 1.7-17)
>> on the same dataset yields only non-significant results, while all
>> results are highly significant using method="REML". Furthermore, it
>> also results in large negative (e.g., -8757) edf values linked to
>> s(X,bs="re") terms. Is this correct, or is this a bug? The summary of
>> the model using method="fREML" is also shown below.
>>
>> I hope someone can shed some light on this.
>>
>> With kind regards,
>> Martijn Wieling,
>> University of Groningen
>>
>> #################################
>> ### mgcv version 1.7-11
>> #################################
>>
>> Family: gaussian
>> Link function: identity
>>
>> Formula:
>> RefPMIdistMeanLog.c ~ s(GeoX, GeoY) + RefVratio.z +
>> IsSemiwordOrDemonstrative +
>> RefSoundCnt.z + SpYearBirth.z * IsAragon + PopCntLog_residGeo.z +
>> s(Word, bs = "re") + s(Key, bs = "re")
>>
>> Parametric coefficients:
>>                            Estimate Std. Error t value Pr(>|t|)
>> (Intercept)               -0.099757   0.020234  -4.930 8.23e-07 ***
>> RefVratio.z                0.105705   0.013328   7.931 2.19e-15 ***
>> IsSemiwordOrDemonstrative  0.289828   0.046413   6.245 4.27e-10 ***
>> RefSoundCnt.z              0.119981   0.021202   5.659 1.53e-08 ***
>> SpYearBirth.z             -0.011396   0.002407  -4.734 2.21e-06 ***
>> IsAragon                   0.055678   0.033137   1.680  0.09291 .
>> PopCntLog_residGeo.z      -0.006504   0.003279  -1.984  0.04731 *
>> SpYearBirth.z:IsAragon     0.015871   0.005459   2.907  0.00365 **
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> Approximate significance of smooth terms:
>>                 edf Ref.df      F p-value
>> s(GeoX,GeoY)  24.01  24.21  31.16  <2e-16 ***
>> s(Word)      352.29 347.00 501.57  <2e-16 ***
>> s(Key)       269.75 289.25  10.76  <2e-16 ***
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> R-sq.(adj) = 0.693 Deviance explained = 69.4%
>> REML score = -22476 Scale est. = 0.038177 n = 112608
>>
>>
>> #################################
>> ### mgcv version 1.7-17, much lower p-values and standard errors than
>> ### version 1.7-11
>> #################################
>>
>> Family: gaussian
>> Link function: identity
>>
>> Formula:
>> RefPMIdistMeanLog.c ~ s(GeoX, GeoY) + RefVratio.z +
>> IsSemiwordOrDemonstrative +
>> RefSoundCnt.z + SpYearBirth.z * IsAragon + PopCntLog_residGeo.z +
>> s(Word, bs = "re") + s(Key, bs = "re")
>>
>> Parametric coefficients:
>>                             Estimate Std. Error t value Pr(>|t|)
>> (Intercept)               -0.0997566  0.0014139 -70.552  < 2e-16 ***
>> RefVratio.z                0.1057049  0.0006565 161.010  < 2e-16 ***
>> IsSemiwordOrDemonstrative  0.2898285  0.0020388 142.155  < 2e-16 ***
>> RefSoundCnt.z              0.1199813  0.0009381 127.905  < 2e-16 ***
>> SpYearBirth.z             -0.0113956  0.0006508 -17.509  < 2e-16 ***
>> IsAragon                   0.0556777  0.0057143   9.744  < 2e-16 ***
>> PopCntLog_residGeo.z      -0.0065037  0.0007938  -8.193 2.58e-16 ***
>> SpYearBirth.z:IsAragon     0.0158712  0.0014829  10.703  < 2e-16 ***
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> Approximate significance of smooth terms:
>>                 edf Ref.df       F p-value
>> s(GeoX,GeoY)  24.01  24.21   31.15  <2e-16 ***
>> s(Word)      352.29 347.00  587.26  <2e-16 ***
>> s(Key)       269.75 313.00 4246.76  <2e-16 ***
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> R-sq.(adj) = 0.693 Deviance explained = 69.4%
>> REML score = -22476 Scale est. = 0.038177 n = 112608
>>
>>
>> #################################
>> ### mgcv version 1.7-17, default: method="fREML", all p-values
>> ### non-significant and negative edf's of s(X,bs="re")
>> #################################
>>
>> Family: gaussian
>> Link function: identity
>>
>> Formula:
>> RefPMIdistMeanLog.c ~ s(GeoX, GeoY) + RefVratio.z +
>> IsSemiwordOrDemonstrative +
>> RefSoundCnt.z + SpYearBirth.z * IsAragon + PopCntLog_residGeo.z +
>> s(Word, bs = "re") + s(Key, bs = "re")
>>
>> Parametric coefficients:
>>                            Estimate Std. Error t value Pr(>|t|)
>> (Intercept)               -0.099757   1.730235  -0.058    0.954
>> RefVratio.z                0.105705   1.145329   0.092    0.926
>> IsSemiwordOrDemonstrative  0.289828   4.167237   0.070    0.945
>> RefSoundCnt.z              0.119981   1.901158   0.063    0.950
>> SpYearBirth.z             -0.011396   0.034236  -0.333    0.739
>> IsAragon                   0.055678   0.298629   0.186    0.852
>> PopCntLog_residGeo.z      -0.006504   0.041981  -0.155    0.877
>> SpYearBirth.z:IsAragon     0.015871   0.077142   0.206    0.837
>>
>> Approximate significance of smooth terms:
>>                edf Ref.df       F p-value
>> s(GeoX,GeoY) -1376      1   7.823 0.00516 **
>> s(Word)      -8298    347 577.910 < 2e-16 ***
>> s(Key)       -1421    316  13.512 < 2e-16 ***
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> R-sq.(adj) = 0.741 Deviance explained = 69.4%
>> fREML score = -22476 Scale est. = 0.038177 n = 112608
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


-- 
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283


