[R-sig-ME] Characterizing correlation between binomial effects
Mike Lawrence
mike.lwrnc at gmail.com
Thu Sep 6 00:41:06 CEST 2012
I have data from multiple human participants who each completed two
tasks. In each task there were two conditions, and in each condition
there are multiple trials that yield a binomial (1/0) outcome. I would
like to evaluate whether there is a correlation between Ss' condition
effect in task 1 and their condition effect in task 2. Traditionally
in my field, this would be achieved by collapsing the raw
trial-by-trial data to proportions in each task then correlate the
resulting scores:
task1_scores = ddply(
.data = task1_trials
, .variables = .(participant)
, .fun = function(x){
data.frame( value = mean( x$response[x$condition=='A'] ) -
mean( x$response[x$condition=='B'] ) )
}
)
task2_scores = ddply(
.data = task2_trials
, .variables = .(participant)
, .fun = function(x){
data.frame( value = mean( x$response[x$condition=='A'] ) -
mean( x$response[x$condition=='B'] ) )
}
)
cor(task1_scores$value,task2_scores$value)
However, this clearly ignores the binomial nature of the raw data,
with the consequence (assuming the monte carlo simulation that I ran
last night is correct) that type-I error rates will be inflated for a
NHST test on the final correlation (at least, when the raw proportions
are far from 50%, which is typically the case for data I encounter).
Does anyone have any input on how to address the "is there a
correlation between the condition effects in the tasks" question in a
generalized linear mixed effects modelling framework?
One idea I had was to combine the two task data frames to a single
data frame, code task as a variable, and fit a full glmm model:
fit = glmer(
family = binomial
, data = both_tasks
, formula = response ~ task*condition + (1+task*condition|participant)
)
then obtain the model's predictions for each participant's data:
r = as.matrix(ranef(fit)$participant)
temp = expand.grid(task=c('1','2'),condition=c('A','B'),response=0)
from_terms = terms(response~task*condition)
mm = model.matrix(from_terms,temp)
preds = expand.grid(ask=c('1','2'),condition=c('A','B'),participant=1:nrow(r),response=0)
for(i in 1:nrow(r)){
preds$response[preds$participant==i] = as.numeric(mm %*% r[i,])
}
then use these predictions to compute each participant's condition
effect in each task and correlate the resulting scores:
scores = ddply(
.data = preds
, .variables = .(participant,task)
, .fun = function(x){
data.frame( value = diff(x$response) )
}
)
cor( scores$value[scores$task=='1'] , scores$value[scores$task=='2'] )
But it appears that this approach also suffers from a type-I error
inflation (again, according to a simulation I ran last night).
Any suggestions?
Mike
More information about the R-sig-mixed-models
mailing list