[R] mgcv, should include an intercept for the 'by' varying coefficient model, which is unconstrained

Xing Zhao zhaoxing at uw.edu
Tue Mar 18 00:39:02 CET 2014


Dear Dr. Wood and other mgcv experts


?gam.models says that a numeric "by" variable is generally not
subject to an identifiability constraint. I ran the example from
?gam.models with and without an intercept and found some differences
between the two fits (code below).

I think the problem might become serious when several varying
coefficient terms are specified in one model, such as gam(y ~
s(x0,by=x1) + s(x0,by=x2) + s(x0,by=x3),data=dat). In this case, none
of the three terms is constrained, as they generally will not meet
the three conditions under which a constraint is applied.

I can still fit the model as gam(y ~ s(x0,by=x1) + s(x0,by=x2) +
s(x0,by=x3),data=dat), but is it safe? Is there a better way to
implement the model?
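
One way to probe the question is to fit the three-term model with and without an intercept on simulated data and check whether the model matrix has full column rank. The sketch below is only an illustration under stated assumptions: the data frame d, the coefficient functions, and the noise level are all hypothetical and do not come from ?gam.models.

```r
library(mgcv)

set.seed(1)
n <- 400

## Hypothetical data: x0 is the smoothing covariate, x1..x3 are numeric
## "by" variables. The true coefficient functions are assumptions made
## purely for illustration.
d <- data.frame(x0 = runif(n), x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- sin(2 * pi * d$x0) * d$x1 + d$x0^2 * d$x2 +
  exp(-d$x0) * d$x3 + rnorm(n, sd = 0.5)

## Three varying coefficient terms, with and without an intercept
m  <- gam(y ~ s(x0, by = x1) + s(x0, by = x2) + s(x0, by = x3), data = d)
m0 <- gam(y ~ s(x0, by = x1) + s(x0, by = x2) + s(x0, by = x3) - 1, data = d)

## If the rank equals the number of columns, the intercept is not
## confounded with the three smooth terms for this particular data set
X <- model.matrix(m)
c(rank = qr(X)$rank, columns = ncol(X))

## Intercept estimate with its standard error, and an AIC comparison
summary(m)$p.table
AIC(m, m0)
```

The rank check matters most when the "by" variables are linearly related to a constant, for example when they sum to one for every observation. In that case the intercept can be reproduced by the smooth terms themselves.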

Thank you for your help
Best,
Xing


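library(mgcv)
set.seed(10)
## simulate data from y = f(x2)*x1 + error
dat <- gamSim(3, n = 400)

## varying coefficient model with an intercept
b <- gam(y ~ s(x2, by = x1), data = dat)
## the same model without an intercept
b1 <- gam(y ~ s(x2, by = x1) - 1, data = dat)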

> range(fitted(b)-fitted(b1))
[1] -0.13027648  0.08117196
> summary(dat$f-fitted(b))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-0.5265  0.2628  1.2290  1.7710  2.6280  8.8580
> summary(dat$f-fitted(b1))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-0.4618  0.2785  1.2250  1.7390  2.5370  8.7310
> summary(dat$y-fitted(b))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-6.23500 -1.32700 -0.06752  0.00000  1.54900  7.01800
> summary(dat$y-fitted(b1))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-6.26700 -1.40300 -0.09908 -0.03199  1.51900  6.96700
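
The large mean differences in summary(dat$f-fitted(b)) suggest that dat$f holds the coefficient function f(x2) rather than the mean of y. A minimal sketch of a like-for-like comparison follows, assuming the true mean is dat$f * dat$x1, as the simulation comment indicates. The helper rmse() is a hypothetical name introduced here for illustration.

```r
library(mgcv)

set.seed(10)
dat <- gamSim(3, n = 400)

b  <- gam(y ~ s(x2, by = x1), data = dat)      # with intercept
b1 <- gam(y ~ s(x2, by = x1) - 1, data = dat)  # without intercept

## Assumption: dat$f is f(x2), so the true mean of y is f(x2) * x1
mu <- dat$f * dat$x1

## Hypothetical helper: root mean squared error against the assumed truth
rmse <- function(fit) sqrt(mean((mu - fitted(fit))^2))
c(with_intercept = rmse(b), without_intercept = rmse(b1))

## Intercept estimate and standard error in the model that includes it
summary(b)$p.table

## The two fits differ by exactly one coefficient, the intercept
length(coef(b)) - length(coef(b1))
```

If the intercept estimate is small relative to its standard error, the two fits should be close, which would be consistent with the narrow range of fitted(b)-fitted(b1) reported above.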


