[R-sig-ME] gamm4 model formulation clarification

Mike Lawrence Mike.Lawrence at dal.ca
Tue Aug 3 14:45:04 CEST 2010


Oops. It seems that my logic in computing the CI for the difference
scores in the "solution" I posted was seriously flawed (see:
http://stats.stackexchange.com/questions/1169/ci-for-a-difference-based-on-independent-cis).
So, predict.gam(..., se.fit=T) produces predicted values and SEs for
those values; any suggestions on how to compute a proper CI for the
difference between two such predicted values, given their SEs?

On Mon, Aug 2, 2010 at 2:44 PM, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:
> On Sat, Jul 31, 2010 at 9:42 PM, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:
>> Also, I'm just now thinking that neither approach really gets at what
>> I want to test; what I want to test is whether the model is improved by
>> (1) letting the smooth vary as a function of design, (2) letting the
>> smooth vary as a function of ddB, and (3) letting the smooth vary as a
>> function of both design & ddB. That is, is there a design*soa
>> interaction, a ddB*soa interaction and/or a design*ddB*soa
>> interaction? In the case of support for any of these interactions, I'd
>> also be interested in pinpointing the timeframe over which the
>> interactions take place.
>
> After playing a bit, I wonder if anyone has any input on the following
> "solution" I'm currently working with. First, a change in variable
> names to simplify things: A & B are both 2-level factor variables, C
> is a continuous numeric variable that needs smoothing, D is the
> dependent variable, and E is the random effect (in my case, human
> participants in an experiment).
>
> To assess the A*C interaction:
>
> 1) turn B into a numeric centered on zero (-.5, .5):
> my_data$B2 = as.numeric(my_data$B)-1.5
>
> 2) fit a gamm, including the A*B2 interaction and smoothing C
> independently per level of A:
> AC_fit = gamm4(
>    formula = D ~ A*B2 + s(C,by=A)
>    , random = ~ (1|E)
>    , data = my_data
> )
>
> 3) Obtain predictions & 95% CIs from the model for the 2 levels of A
> across the range of C:
> AC_pred = expand.grid(
>    A = factor(levels(my_data$A),levels=levels(my_data$A))
>    , B2 = 0
>    , C = seq(min(my_data$C),max(my_data$C),length.out=1e3)
> )
> # gamm4 returns a list; the mgcv model to predict from is in $gam
> from_predict = predict(AC_fit$gam,newdata=AC_pred,se.fit=T)
> AC_pred$D = from_predict$fit
> AC_pred$se = from_predict$se.fit
> AC_pred$lo_95ci = AC_pred$D - AC_pred$se*1.96
> AC_pred$hi_95ci = AC_pred$D + AC_pred$se*1.96
>
> 4) You could plot the "pred" data now, plotting D as a function of C
> and split by levels of A, but this will include the main effects of
> both A and C, obscuring visualization of the interaction.
>
> 5) Compute the A*C interaction by computing the difference between the
> levels of A at each value of C:
> library(plyr)
> AC_effect = ddply(
>    .data = AC_pred
>    , .variables = .(C)
>    , .fun = function(x){
>        to_return = data.frame(
>            A_effect = x$D[x$A==levels(x$A)[1]] - x$D[x$A==levels(x$A)[2]]
>            , lo_95ci = min(
>                x$hi_95ci[x$A==levels(x$A)[1]] - x$lo_95ci[x$A==levels(x$A)[2]]
>                , x$lo_95ci[x$A==levels(x$A)[1]] - x$hi_95ci[x$A==levels(x$A)[2]]
>            )
>            , hi_95ci = max(
>                x$hi_95ci[x$A==levels(x$A)[1]] - x$lo_95ci[x$A==levels(x$A)[2]]
>                , x$lo_95ci[x$A==levels(x$A)[1]] - x$hi_95ci[x$A==levels(x$A)[2]]
>            )
>        )
>        return(to_return)
>    }
> )
>
> 6) Now you can visualize the A*C interaction; any region where the CI
> ribbon excludes zero may be considered a region with a significant A*C
> interaction.
> ggplot(data=AC_effect)+
> geom_hline(
>    yintercept = 0
>    , linetype = 2
> )+
> geom_ribbon(
>    mapping = aes(
>        x = C
>        , ymin = lo_95ci
>        , ymax = hi_95ci
>    )
>    , alpha = .5
> )+
> geom_line(
>    mapping = aes(
>        x = C
>        , y = A_effect
>    )
> )
>
>
> Similar steps can be applied to the inspection of a B*C interaction.
>
> The A*B*C interaction would be assessed by:
>
> 1) create a dummy variable, F, representing the A*B combinations (note
> that "F" shadows R's shorthand for FALSE, so another name may be safer):
> my_data$F = factor(paste(my_data$A,my_data$B))
>
> 2) fit a gam that smooths C in each combination of A*B:
> ABC_fit = gamm4(
>    formula = D ~ A*B + s(C,by=F)
>    , random = ~ (1|E)
>    , data = my_data
> )
>
> 3) obtain the prediction & CI for D in each combination of A*B and
> across the range of C:
> ABC_pred = expand.grid(
>    A = factor(levels(my_data$A),levels=levels(my_data$A))
>    , B = factor(levels(my_data$B),levels=levels(my_data$B))
>    , C = seq(min(my_data$C),max(my_data$C),length.out=1e3)
> )
> ABC_pred$F = factor(paste(ABC_pred$A,ABC_pred$B),levels=levels(my_data$F))
> from_predict = predict(ABC_fit$gam,newdata=ABC_pred,se.fit=T)
> ABC_pred$D = from_predict$fit
> ABC_pred$se = from_predict$se.fit
> ABC_pred$lo_95ci = ABC_pred$D - ABC_pred$se*1.96
> ABC_pred$hi_95ci = ABC_pred$D + ABC_pred$se*1.96
>
> 4) Similar to step 5 above, compute the difference between levels of A
> at each value of C *and* each level of B.
>
> 5) Using the differences computed in step 4, now compute the
> difference between levels of B at each value of C.
>
> 6) Now we have another single function and confidence ribbon that can
> be plotted; any region where the ribbon excludes zero may be
> considered a region with a significant A*B*C interaction.
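> Steps 4-5 above can be sketched roughly as follows (a hedged sketch,
> with a made-up toy stand-in for the ABC_pred frame of step 3; SEs are
> propagated as sqrt(se1^2 + se2^2), which treats the predictions as
> independent and so, per the caveat at the top of this thread, is only
> an approximation):

```r
library(plyr)

## toy stand-in for the ABC_pred frame built in step 3 (made-up numbers)
set.seed(1)
ABC_pred <- expand.grid(
    A = factor(c("a1", "a2"))
    , B = factor(c("b1", "b2"))
    , C = c(0, 0.5, 1)
)
ABC_pred$D  <- rnorm(nrow(ABC_pred))
ABC_pred$se <- runif(nrow(ABC_pred), 0.1, 0.2)

ABC_effect <- ddply(ABC_pred, .(C), function(x) {
    a <- levels(x$A)
    ## step 4: the A effect (difference between levels of A) within each B
    d  <- x$D[x$A == a[1]] - x$D[x$A == a[2]]              # one value per B
    se <- sqrt(x$se[x$A == a[1]]^2 + x$se[x$A == a[2]]^2)
    ## step 5: difference of those A effects across the levels of B
    dd    <- d[1] - d[2]
    dd_se <- sqrt(se[1]^2 + se[2]^2)
    data.frame(
        ABC_effect = dd
        , lo_95ci = dd - 1.96 * dd_se
        , hi_95ci = dd + 1.96 * dd_se
    )
})
```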
>
>
> Does anyone have any thoughts on the validity of this approach? Also,
> this is clearly only applicable to the cases where A and B have only 2
> levels each, so any thoughts on an alternative more generalizable
> approach would be appreciated.
>
> Mike
>
> --
> Mike Lawrence
> Graduate Student
> Department of Psychology
> Dalhousie University
>
> Looking to arrange a meeting? Check my public calendar:
> http://tr.im/mikes_public_calendar
>
> ~ Certainty is folly... I think. ~
>






