[R-sig-ME] gamm4 model formulation clarification
Mike Lawrence
Mike.Lawrence at dal.ca
Mon Aug 2 19:44:24 CEST 2010
On Sat, Jul 31, 2010 at 9:42 PM, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:
> Also, I'm just now thinking that neither approach really gets at what
> I want to test; what I want to test is whether the model is improved by
> (1) letting the smooth vary as a function of design, (2) letting the
> smooth vary as a function of ddB, and (3) letting the smooth vary as a
> function of both design & ddB. That is, is there a design*soa
> interaction, a ddB*soa interaction and/or a design*ddB*soa
> interaction? In the case of support for any of these interactions, I'd
> also be interested in pinpointing the timeframe over which the
> interactions take place.
After playing a bit, I wonder if anyone has any input on the following
"solution" I'm currently working with. First, a change in variable
names to simplify things: A & B are both 2-level factor variables, C
is a continuous numeric variable that needs smoothing, D is the
dependent variable, and E is the random effect (in my case, human
participants in an experiment).
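For anyone who wants to run the steps below, a minimal synthetic data set with this structure might look like the following (the level names, sizes, and effect shape are all made up for illustration):

```r
set.seed(1)
# A, B: 2-level factors; C: continuous covariate to be smoothed;
# E: random-effect grouping (e.g. participants); D: dependent variable
my_data = expand.grid(
  A = factor(c('a1', 'a2'))
  , B = factor(c('b1', 'b2'))
  , C = seq(0, 1, length.out = 20)
  , E = factor(paste0('s', 1:8))
)
my_data$D = sin(my_data$C*2*pi) + rnorm(nrow(my_data), sd = .5)
```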
To assess the A*C interaction:
1) turn B into a numeric centered on zero (-.5, +.5):
my_data$B2 = as.numeric(my_data$B) - 1.5
2) fit a gamm, including the A*B2 interaction and smoothing C
independently per level of A:
library(gamm4) # provides gamm4(), built on mgcv & lme4
AC_fit = gamm4(
  formula = D ~ A*B2 + s(C, by = A)
  , random = ~ (1|E)
  , data = my_data
)
3) obtain predictions & 95% CIs from the model for the 2 levels of A
across the range of C:
AC_pred = expand.grid(
  A = factor(levels(my_data$A), levels = levels(my_data$A))
  , B2 = 0
  , C = seq(min(my_data$C), max(my_data$C), length.out = 1e3)
)
# gamm4 returns a list with $mer and $gam components; predict on $gam
from_predict = predict(AC_fit$gam, newdata = AC_pred, se.fit = TRUE)
AC_pred$D = from_predict$fit
AC_pred$se = from_predict$se.fit
AC_pred$lo_95ci = AC_pred$D - AC_pred$se*1.96
AC_pred$hi_95ci = AC_pred$D + AC_pred$se*1.96
4) You could plot the "pred" data now, plotting D as a function of C
and split by levels of A, but this will include the main effects of
both A and C, obscuring visualization of the interaction.
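If you do want that raw-prediction plot, a sketch might look like the below (using ggplot2's geom_* shorthands; the AC_pred built here is a synthetic stand-in for the one from step 3):

```r
library(ggplot2)

# Synthetic stand-in for the AC_pred data frame from step 3
AC_pred = expand.grid(
  A = factor(c('a1', 'a2'))
  , C = seq(0, 1, length.out = 50)
)
AC_pred$D = sin(AC_pred$C*2*pi) + as.numeric(AC_pred$A)
AC_pred$lo_95ci = AC_pred$D - .2*1.96
AC_pred$hi_95ci = AC_pred$D + .2*1.96

# predictions per level of A, each with its own CI ribbon
p = ggplot(data = AC_pred) +
  geom_ribbon(
    mapping = aes(x = C, ymin = lo_95ci, ymax = hi_95ci, group = A)
    , alpha = .5
  ) +
  geom_line(mapping = aes(x = C, y = D, linetype = A))
```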
5) Compute the A*C interaction by computing the difference between the
levels of A at each value of C:
library(plyr) # provides ddply()
AC_effect = ddply(
  .data = AC_pred
  , .variables = .(C)
  , .fun = function(x){
    a1 = x$A == levels(x$A)[1]
    a2 = x$A == levels(x$A)[2]
    data.frame(
      A_effect = x$D[a1] - x$D[a2]
      , lo_95ci = min(
        x$hi_95ci[a1] - x$lo_95ci[a2]
        , x$lo_95ci[a1] - x$hi_95ci[a2]
      )
      , hi_95ci = max(
        x$hi_95ci[a1] - x$lo_95ci[a2]
        , x$lo_95ci[a1] - x$hi_95ci[a2]
      )
    )
  }
)
6) Now you can visualize the A*C interaction; any area where the CI
ribbon does not include zero may be considered a region where there is
a significant A*C interaction.
library(ggplot2)
ggplot(data = AC_effect) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_ribbon(
    mapping = aes(x = C, ymin = lo_95ci, ymax = hi_95ci)
    , alpha = .5
  ) +
  geom_line(mapping = aes(x = C, y = A_effect))
Similar steps can be applied to the inspection of a B*C interaction.
The A*B*C interaction would be assessed by:
1) create dummy variable, F, representing the A*B combinations:
my_data$F = factor(paste(my_data$A,my_data$B))
2) fit a gamm that smooths C within each combination of A*B:
ABC_fit = gamm4(
  formula = D ~ A*B + s(C, by = F)
  , random = ~ (1|E)
  , data = my_data
)
3) obtain the prediction & CI for D in each combination of A*B and
across the range of C:
ABC_pred = expand.grid(
  A = factor(levels(my_data$A), levels = levels(my_data$A))
  , B = factor(levels(my_data$B), levels = levels(my_data$B))
  , C = seq(min(my_data$C), max(my_data$C), length.out = 1e3)
)
ABC_pred$F = factor(paste(ABC_pred$A, ABC_pred$B), levels = levels(my_data$F))
from_predict = predict(ABC_fit$gam, newdata = ABC_pred, se.fit = TRUE)
ABC_pred$D = from_predict$fit
ABC_pred$se = from_predict$se.fit
ABC_pred$lo_95ci = ABC_pred$D - ABC_pred$se*1.96
ABC_pred$hi_95ci = ABC_pred$D + ABC_pred$se*1.96
4) Similar to step 5 above, compute the difference between levels of A
at each value of C *and* each level of B.
5) Using the differences computed in step 4, now compute the
difference between levels of B at each value of C.
6) Now we have another single function and confidence ribbon that can
be plotted; any region where the ribbon excludes zero may be
considered a region with a significant A*B*C interaction.
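To make steps 4 and 5 concrete, here is a sketch of the double differencing, using the same CI propagation as step 5 of the A*C case (with two levels per factor, the min/max reduce to lo1 - hi2 and hi1 - lo2). The ABC_pred built here is a synthetic stand-in for the one produced in step 3:

```r
library(plyr) # for ddply(), as in step 5 of the A*C case
set.seed(1)

# Synthetic stand-in for the ABC_pred data frame from step 3
ABC_pred = expand.grid(
  A = factor(c('a1', 'a2'))
  , B = factor(c('b1', 'b2'))
  , C = seq(0, 1, length.out = 10)
)
ABC_pred$D = rnorm(nrow(ABC_pred))
ABC_pred$se = runif(nrow(ABC_pred), .1, .2)
ABC_pred$lo_95ci = ABC_pred$D - ABC_pred$se*1.96
ABC_pred$hi_95ci = ABC_pred$D + ABC_pred$se*1.96

# Step 4: difference between the levels of A at each value of C
# and each level of B, propagating the CI bounds
AB_diff = ddply(
  .data = ABC_pred
  , .variables = .(B, C)
  , .fun = function(x){
    a1 = x$A == levels(x$A)[1]
    a2 = x$A == levels(x$A)[2]
    data.frame(
      A_effect = x$D[a1] - x$D[a2]
      , lo_95ci = x$lo_95ci[a1] - x$hi_95ci[a2]
      , hi_95ci = x$hi_95ci[a1] - x$lo_95ci[a2]
    )
  }
)

# Step 5: difference of those differences between the levels of B
# at each value of C; the result is one curve plus a ribbon over C
ABC_effect = ddply(
  .data = AB_diff
  , .variables = .(C)
  , .fun = function(x){
    b1 = x$B == levels(x$B)[1]
    b2 = x$B == levels(x$B)[2]
    data.frame(
      AB_effect = x$A_effect[b1] - x$A_effect[b2]
      , lo_95ci = x$lo_95ci[b1] - x$hi_95ci[b2]
      , hi_95ci = x$hi_95ci[b1] - x$lo_95ci[b2]
    )
  }
)
```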
Does anyone have any thoughts on the validity of this approach? Also,
this is clearly applicable only to cases where A and B each have two
levels, so any thoughts on a more generalizable alternative approach
would be appreciated.
Mike
--
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University
Looking to arrange a meeting? Check my public calendar:
http://tr.im/mikes_public_calendar
~ Certainty is folly... I think. ~