[R] mgcv::gam(): NA parametric coefficient in a model with two categorical variables + model interpretation

Mon May 23 15:32:06 CEST 2016

Q1: It looks like the model is not fully identifiable given the data, and 
as a result igcCAT.ideo has been set to zero. There is no sensible test 
to conduct on such a term, hence the NAs in the test statistic and p-value 
fields.
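
As a rough illustration of what "not fully identifiable given the data" can look like for purely parametric terms, here is a minimal base-R sketch. The toy data frame and the factor names f1 and f2 are hypothetical, and whether this kind of confounding is what happened in the poster's data is not known. In the toy data, level "B" of f1 only ever occurs together with level "q" of f2, so two columns of the design matrix coincide and one coefficient cannot be estimated separately.

```r
# Hypothetical toy data: f1 == "B" exactly when f2 == "q",
# so the dummy columns f1B and f2q are identical.
toy <- data.frame(
  f1 = factor(c("A", "A", "B", "C", "C", "D", "D")),
  f2 = factor(c("p", "r", "q", "r", "s", "p", "s"))
)

# Cross-tabulation makes the confounded cell pattern visible.
print(table(toy$f1, toy$f2))

# Design matrix for the two main effects (treatment contrasts).
X <- model.matrix(~ f1 + f2, data = toy)

n_coef <- ncol(X)      # number of parametric coefficients
x_rank <- qr(X)$rank   # numerical rank of the design matrix

# A rank below the number of columns means some coefficient is not
# identifiable from these data.
cat("columns:", n_coef, " rank:", x_rank, "\n")
```

The `Rank: 652/655` line in the quoted summary reports a shortfall of the same kind for the full model, which also includes the smooth terms.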

Q2: A separate (centred) smooth is estimated for each level of igc. If 
you want a baseline (igcCAT.pseudo) smooth, and difference smooths for 
the rest of the levels of igc, then you need to set igc to be an ordered 
factor (no `by' smooth is then generated for its first level), and use 
something like
~ igc + s(ctrial) + s(ctrial,by=igc)
- see the section on `by' variables in ?gam.models.
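
A minimal sketch of the ordered-factor setup, using simulated data rather than the poster's. The data frame dat, the column igc_ord and the simulated response are assumptions made for illustration, and gam() is used instead of bam() only because the toy data set is small. The shape terms and the per-subject smooths from the original model are left out to keep the example short.

```r
library(mgcv)

set.seed(1)
n <- 400
lev <- c("CAT.pseudo", "CAT.ideo", "PA.pseudo", "PA.ideo")

# Simulated stand-in for the real data (hypothetical values).
dat <- data.frame(
  ctrial = runif(n),
  igc    = factor(sample(lev, n, replace = TRUE), levels = lev)
)
dat$acc <- rbinom(n, 1, plogis(1 + sin(2 * pi * dat$ctrial)))

# Ordered copy of igc with CAT.pseudo as the first (baseline) level.
dat$igc_ord <- factor(dat$igc, levels = lev, ordered = TRUE)

# Optional and not part of the advice above: ordered factors default to
# polynomial contrasts, so treatment contrasts are set here to keep the
# parametric coefficients readable as differences from the baseline.
contrasts(dat$igc_ord) <- "contr.treatment"

m <- gam(acc ~ igc_ord + s(ctrial) + s(ctrial, by = igc_ord),
         data = dat, family = binomial, method = "REML")

# One baseline smooth plus one difference smooth per non-baseline level.
labs <- sapply(m$smooth, function(s) s$label)
print(labs)
```

In this parameterisation s(ctrial) plays the role of the baseline (CAT.pseudo) curve, and each by-smooth describes how the curve for another level departs from it.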

best,
Simon

On 22/05/16 23:29, Fotis Fotiadis wrote:
> Hello all
>
> I am using a gam model for my data.
>
> m2.4<-bam(acc~ 1 + igc + s(ctrial, by=igc) + shape + s(ctrial, by=shape) +
> s(ctrial, sbj, bs = "fs", m = 1) , data=data, family=binomial)
>
> igc codes condition and there are four levels (CAT.pseudo,
> CAT.ideo,PA.pseudo, PA.ideo), and shape is a factor (that cannot be
> considered random effect) with four levels too (rand21, rand22, rand23,
> rand30).
>
> Here is the summary of the model
>> summary(m2.4)
> Family: binomial
> Link function: logit
>
> Formula:
> acc ~ 1 + igc + s(ctrial, by = igc) + shape + s(ctrial, by = shape) +
>      s(ctrial, sbj, bs = "fs", m = 1)
>
> Parametric coefficients:
>               Estimate Std. Error z value Pr(>|z|)
> (Intercept)    3.5321     0.1930  18.302  < 2e-16 ***
> igcCAT.ideo    0.0000     0.0000      NA       NA
> igcPA.ideo    -0.3650     0.2441  -1.495   0.1348
> igcPA.pseudo  -0.2708     0.2574  -1.052   0.2928
> shaperand22   -0.1390     0.1548  -0.898   0.3693
> shaperand23    0.3046     0.1670   1.823   0.0682 .
> shaperand30   -0.5839     0.1163  -5.020 5.16e-07 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Approximate significance of smooth terms:
>                              edf  Ref.df   Chi.sq  p-value
> s(ctrial):igcCAT.pseudo   3.902   4.853   74.787 1.07e-14 ***
> s(ctrial):igcCAT.ideo     2.293   2.702   13.794 0.001750 **
> s(ctrial):igcPA.ideo      1.000   1.000   11.391 0.000738 ***
> s(ctrial):igcPA.pseudo    3.158   3.815   20.411 0.000413 ***
> s(ctrial):shaperand21     2.556   3.316   31.387 1.46e-06 ***
> s(ctrial):shaperand22     1.000   1.000    0.898 0.343381
> s(ctrial):shaperand23     2.304   2.850    6.144 0.118531
> s(ctrial):shaperand30     4.952   5.947   27.806 0.000144 ***
> s(ctrial,sbj)           221.476 574.000 1502.779  < 2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Rank: 652/655
> R-sq.(adj) =  0.405   Deviance explained = 43.9%
> fREML =  24003  Scale est. = 1         n = 18417
>
>
> I am not sure how this model works, but I guess it creates four smooths,
> one for each level of condition, and four smooths, one for each level of
> shape.
>
> There is also the intercept of the model, set at the reference level of
> condition (CAT.pseudo) and at the reference level of shape (rand21). Each
> parametric term represents the difference of each level of each of the two
> factors from the intercept.
>
> I have two questions
>
> Q1:
> Does anyone know why I get NA results in the second line of the parametric
> terms?
>
> Q2:
> The term igcCAT.ideo denotes the difference in the intercept between
> (A): condition=igcCAT.ideo,  and
> (B): (condition=igcCATpseudo ) &(shape=rand21).
> But what is the value (level) of shape for (A)?
> Is it the reference level? Or is it, perhaps, the "grand mean" of the shape
> variable?
>
>
> Thank you in advance for your time,
> Fotis
>
>

-- 
Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK
+44 (0)117 33 18273     http://www.maths.bris.ac.uk/~sw15190