[R-sig-ME] gamm4 model formulation clarification
Mike Lawrence
Mike.Lawrence at dal.ca
Mon Aug 2 19:44:24 CEST 2010
On Sat, Jul 31, 2010 at 9:42 PM, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:
> Also, I'm just now thinking that neither approach really gets at what
> I want to test; what I want to test is whether the model is improved by
> (1) letting the smooth vary as a function of design, (2) letting the
> smooth vary as a function of ddB, and (3) letting the smooth vary as a
> function of both design & ddB. That is, is there a design*soa
> interaction, a ddB*soa interaction and/or a design*ddB*soa
> interaction? In the case of support for any of these interactions, I'd
> also be interested in pinpointing the timeframe over which the
> interactions take place.
After playing a bit, I wonder if anyone has any input on the following
"solution" I'm currently working with. First, a change in variable
names to simplify things: A & B are both 2-level factor variables, C
is a continuous numeric variable that needs smoothing, D is the
dependent variable, and E is the random effect (in my case, human
participants in an experiment).
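For anyone who wants to run the steps below, a minimal synthetic data set with this structure might look like the following (the level names, sizes, and effect shape are all made up for illustration):

```r
set.seed(1)
# A, B: 2-level factors; C: continuous covariate to be smoothed;
# E: random-effect grouping (e.g. participants); D: dependent variable
my_data = expand.grid(
  A = factor(c('a1', 'a2'))
  , B = factor(c('b1', 'b2'))
  , C = seq(0, 1, length.out = 20)
  , E = factor(paste0('s', 1:8))
)
my_data$D = sin(my_data$C*2*pi) + rnorm(nrow(my_data), sd = .5)
```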
To assess the A*C interaction:
1) turn B into a numeric centered on zero (-.5, +.5):
my_data$B2 = as.numeric(my_data$B) - 1.5
2) fit a gamm, including the A*B2 interaction and smoothing C
independently per level of A:
library(gamm4) # provides gamm4(), built on mgcv & lme4
AC_fit = gamm4(
  formula = D ~ A*B2 + s(C, by = A)
  , random = ~ (1|E)
  , data = my_data
)
3) obtain predictions & 95% CIs from the model for the 2 levels of A
across the range of C:
AC_pred = expand.grid(
  A = factor(levels(my_data$A), levels = levels(my_data$A))
  , B2 = 0
  , C = seq(min(my_data$C), max(my_data$C), length.out = 1e3)
)
# gamm4 returns a list with $mer and $gam components; predict on $gam
from_predict = predict(AC_fit$gam, newdata = AC_pred, se.fit = TRUE)
AC_pred$D = from_predict$fit
AC_pred$se = from_predict$se.fit
AC_pred$lo_95ci = AC_pred$D - AC_pred$se*1.96
AC_pred$hi_95ci = AC_pred$D + AC_pred$se*1.96
4) You could plot the "pred" data now, plotting D as a function of C
and split by levels of A, but this will include the main effects of
both A and C, obscuring visualization of the interaction.
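If you do want that raw-prediction plot, a sketch might look like the below (using ggplot2's geom_* shorthands; the AC_pred built here is a synthetic stand-in for the one from step 3):

```r
library(ggplot2)

# Synthetic stand-in for the AC_pred data frame from step 3
AC_pred = expand.grid(
  A = factor(c('a1', 'a2'))
  , C = seq(0, 1, length.out = 50)
)
AC_pred$D = sin(AC_pred$C*2*pi) + as.numeric(AC_pred$A)
AC_pred$lo_95ci = AC_pred$D - .2*1.96
AC_pred$hi_95ci = AC_pred$D + .2*1.96

# predictions per level of A, each with its own CI ribbon
p = ggplot(data = AC_pred) +
  geom_ribbon(
    mapping = aes(x = C, ymin = lo_95ci, ymax = hi_95ci, group = A)
    , alpha = .5
  ) +
  geom_line(mapping = aes(x = C, y = D, linetype = A))
```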
5) Compute the A*C interaction by computing the difference between the
levels of A at each value of C:
library(plyr) # provides ddply()
AC_effect = ddply(
  .data = AC_pred
  , .variables = .(C)
  , .fun = function(x){
    a1 = x$A == levels(x$A)[1]
    a2 = x$A == levels(x$A)[2]
    data.frame(
      A_effect = x$D[a1] - x$D[a2]
      , lo_95ci = min(
        x$hi_95ci[a1] - x$lo_95ci[a2]
        , x$lo_95ci[a1] - x$hi_95ci[a2]
      )
      , hi_95ci = max(
        x$hi_95ci[a1] - x$lo_95ci[a2]
        , x$lo_95ci[a1] - x$hi_95ci[a2]
      )
    )
  }
)
6) Now you can visualize the A*C interaction; any area where the CI
ribbon does not include zero may be considered a region where there is
a significant A*C interaction.
library(ggplot2)
ggplot(data = AC_effect) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_ribbon(
    mapping = aes(x = C, ymin = lo_95ci, ymax = hi_95ci)
    , alpha = .5
  ) +
  geom_line(mapping = aes(x = C, y = A_effect))
Similar steps can be applied to the inspection of a B*C interaction.
The A*B*C interaction would be assessed by:
1) create dummy variable, F, representing the A*B combinations:
my_data$F = factor(paste(my_data$A,my_data$B))
2) fit a gamm that smooths C within each combination of A*B:
ABC_fit = gamm4(
  formula = D ~ A*B + s(C, by = F)
  , random = ~ (1|E)
  , data = my_data
)
3) obtain the prediction & CI for D in each combination of A*B and
across the range of C:
ABC_pred = expand.grid(
  A = factor(levels(my_data$A), levels = levels(my_data$A))
  , B = factor(levels(my_data$B), levels = levels(my_data$B))
  , C = seq(min(my_data$C), max(my_data$C), length.out = 1e3)
)
ABC_pred$F = factor(paste(ABC_pred$A, ABC_pred$B), levels = levels(my_data$F))
from_predict = predict(ABC_fit$gam, newdata = ABC_pred, se.fit = TRUE)
ABC_pred$D = from_predict$fit
ABC_pred$se = from_predict$se.fit
ABC_pred$lo_95ci = ABC_pred$D - ABC_pred$se*1.96
ABC_pred$hi_95ci = ABC_pred$D + ABC_pred$se*1.96
4) Similar to step 5 above, compute the difference between levels of A
at each value of C *and* each level of B.
5) Using the differences computed in step 4, now compute the
difference between levels of B at each value of C.
6) Now we have another single function and confidence ribbon that can
be plotted; any region where the ribbon excludes zero may be
considered a region with a significant A*B*C interaction.
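To make steps 4 and 5 concrete, here is a sketch of the double differencing, using the same CI propagation as step 5 of the A*C case (with two levels per factor, the min/max reduce to lo1 - hi2 and hi1 - lo2). The ABC_pred built here is a synthetic stand-in for the one produced in step 3:

```r
library(plyr) # for ddply(), as in step 5 of the A*C case
set.seed(1)

# Synthetic stand-in for the ABC_pred data frame from step 3
ABC_pred = expand.grid(
  A = factor(c('a1', 'a2'))
  , B = factor(c('b1', 'b2'))
  , C = seq(0, 1, length.out = 10)
)
ABC_pred$D = rnorm(nrow(ABC_pred))
ABC_pred$se = runif(nrow(ABC_pred), .1, .2)
ABC_pred$lo_95ci = ABC_pred$D - ABC_pred$se*1.96
ABC_pred$hi_95ci = ABC_pred$D + ABC_pred$se*1.96

# Step 4: difference between the levels of A at each value of C
# and each level of B, propagating the CI bounds
AB_diff = ddply(
  .data = ABC_pred
  , .variables = .(B, C)
  , .fun = function(x){
    a1 = x$A == levels(x$A)[1]
    a2 = x$A == levels(x$A)[2]
    data.frame(
      A_effect = x$D[a1] - x$D[a2]
      , lo_95ci = x$lo_95ci[a1] - x$hi_95ci[a2]
      , hi_95ci = x$hi_95ci[a1] - x$lo_95ci[a2]
    )
  }
)

# Step 5: difference of those differences between the levels of B
# at each value of C; the result is one curve plus a ribbon over C
ABC_effect = ddply(
  .data = AB_diff
  , .variables = .(C)
  , .fun = function(x){
    b1 = x$B == levels(x$B)[1]
    b2 = x$B == levels(x$B)[2]
    data.frame(
      AB_effect = x$A_effect[b1] - x$A_effect[b2]
      , lo_95ci = x$lo_95ci[b1] - x$hi_95ci[b2]
      , hi_95ci = x$hi_95ci[b1] - x$lo_95ci[b2]
    )
  }
)
```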
Does anyone have any thoughts on the validity of this approach? Also,
this is clearly applicable only to cases where A and B each have two
levels, so any thoughts on a more generalizable alternative approach
would be appreciated.
Mike
--
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University
Looking to arrange a meeting? Check my public calendar:
http://tr.im/mikes_public_calendar
~ Certainty is folly... I think. ~