[R] multiple comparisons for GAMs

Tue Aug 7 22:20:04 CEST 2012

On Aug 7, 2012, at 18:48 , Bird_Girl wrote:

> I have looked into using glht (‘multcomp’ package) to do multiple comparisons
> for a model fit with GAM {mgcv} but after reading the description of the
> ‘multcomp’ package, I believe this method only applies to parametric models
> and linear hypotheses.  When I ran the code glht(model,linfct….) I got an
> error message (see below), but I am not sure if that was the result of my
> data being input in the incorrect format or the result of the test not
> working on the type of model I used.  Does anyone know if there is an
> equivalent procedure for non-parametric models, or one that can be used to
> test for significant differences in curves modeled using GAMs? 
> 
> I have included a highly simplified, mini dataset that approximates the
> shape of the relationships exhibited by my own much larger data set, as well
> as the code I used.  In a nutshell, I would like to test whether the shape
> of the relationship between light and log_abundance at a height of 200m is
> significantly different from the relationship between light and
> log_abundance at a height of 400m.
> 
> 
>> require (lattice)
>> require (mgcv)
>> require(multcomp)
>> species=read.csv("Book1.csv", header=TRUE, sep=",", quote="\"",fill=TRUE)
>> attach(species)
>> names(species)
>> species
>  height light log_abundance
> 1    200m  0.15          0.20
> 2    200m  0.23          0.28
> 3    200m  0.38          0.30
> 4    200m  0.41          0.47
> 5    200m  0.52          0.48
> 6    200m  0.63          0.42
> 7    200m  0.71          0.37
> 8    200m  0.85          0.35
> 9    200m  0.96          0.40
> 10   200m  1.05          0.45
> 11   200m  1.16          0.37
> 12   200m  1.23          0.30
> 13   200m  1.39          0.26
> 14   200m  1.47          0.15
> 15   400m  0.12          0.10
> 16   400m  0.25          0.09
> 17   400m  0.36          0.12
> 18   400m  0.42          0.14
> 19   400m  0.60          0.24
> 20   400m  0.65          0.28
> 21   400m  0.74          0.37
> 22   400m  0.86          0.40
> 23   400m  0.93          0.35
> 24   400m  1.06          0.25
> 25   400m  1.15          0.15
> 26   400m  1.24          0.18
> 27   400m  1.37          0.40
> 28   400m  1.48          0.57
> 
>> coplot(log_abundance~light|height)
>> model1=gam(log_abundance~s(light)+height+te(light,by=height,k=6))
>> plot(model1,trans=function(x)exp(x)/(1+exp(x)),shade=T,pages=1)
>> summary (model1)
>> glht(model1,linfct=mcp(height="Tukey"))
> 
> Error in linfct[[nm]] %*% C : 
>  requires numeric/complex matrix/vector arguments

Thanks for the example. The direct cause of the error is that model.matrix.gam doesn't return the contrasts that you need for the symbolic trickery of mcp(height="Tukey") so the C term in the above is NULL. 

You can do it by hand:

> L <- matrix(c(0,1,rep(0,19)),nrow=1)
> summary(glht(model1,linfct=L))

	 Simultaneous Tests for General Linear Hypotheses

Fit: gam(formula = log_abundance ~ s(light) + height + te(light, by = height, 
    k = 6))

Linear Hypotheses:
       Estimate Std. Error z value Pr(>|z|)    
1 == 0 -0.08557    0.01622  -5.275 1.33e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
(Adjusted p values reported -- single-step method)

Now, I suspect that wasn't actually what you wanted... It sounds rather like you want to test the significance of the te() part. For two levels, I suppose this can be done with

> model2 <- update(model1, ~ s(light) + height)
> anova(model1,model2, test="F")
Analysis of Deviance Table

Model 1: log_abundance ~ s(light) + height + te(light, by = height, k = 6)
Model 2: log_abundance ~ s(light) + height
  Resid. Df Resid. Dev      Df Deviance      F    Pr(>F)    
1    17.123   0.031427                                      
2    23.624   0.282765 -6.5011 -0.25134 21.064 3.885e-07 ***

(These F tests with smooth terms are a bit elusive. GAM experts please step up and tell me if I'm off my rocker here!) 

However, the multiple comparisons come in when you want to do this with pairs of seven height levels, and that's not something that falls within the glht() framework. The best idea that I can come up with is to generate the 21 comparisons "manually" (e.g. by using version of "height" in which pairs of levels have been fused in the te() term) and then apply a generic p-value adjustment to the pairwise comparisons).

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com