[R-sig-ME] Characterizing correlation between binomial effects

Thu Sep 6 00:41:06 CEST 2012

I have data from multiple human participants who each completed two
tasks. In each task there were two conditions, and in each condition
there are multiple trials that yield a binomial (1/0) outcome. I would
like to evaluate whether there is a correlation between Ss' condition
effect in task 1 and their condition effect in task 2. Traditionally
in my field, this would be achieved by collapsing the raw
trial-by-trial data to proportions in each task then correlate the
resulting scores:

    task1_scores = ddply(
        .data = task1_trials
        , .variables = .(participant)
        , .fun = function(x){
            data.frame( value = mean( x$response[x$condition=='A'] ) -
mean( x$response[x$condition=='B'] ) )
        }
    )
    task2_scores = ddply(
        .data = task2_trials
        , .variables = .(participant)
        , .fun = function(x){
            data.frame( value = mean( x$response[x$condition=='A'] ) -
mean( x$response[x$condition=='B'] ) )
        }
    )
    cor(task1_scores$value,task2_scores$value)

However, this clearly ignores the binomial nature of the raw data,
with the consequence (assuming the monte carlo simulation that I ran
last night is correct) that type-I error rates will be inflated for a
NHST test on the final correlation (at least, when the raw proportions
are far from 50%, which is typically the case for data I encounter).

Does anyone have any input on how to address the "is there a
correlation between the condition effects in the tasks" question in a
generalized linear mixed effects modelling framework?

One idea I had was to combine the two task data frames to a single
data frame, code task as a variable, and fit a full glmm model:

    fit = glmer(
        family = binomial
        , data = both_tasks
        , formula = response ~ task*condition + (1+task*condition|participant)
    )

then obtain the model's predictions for each participant's data:

    r = as.matrix(ranef(fit)$participant)
    temp = expand.grid(task=c('1','2'),condition=c('A','B'),response=0)
    from_terms = terms(response~task*condition)
    mm = model.matrix(from_terms,temp)
    preds = expand.grid(ask=c('1','2'),condition=c('A','B'),participant=1:nrow(r),response=0)
    for(i in 1:nrow(r)){
        preds$response[preds$participant==i] = as.numeric(mm %*% r[i,])
    }

then use these predictions to compute each participant's condition
effect in each task and correlate the resulting scores:

    scores = ddply(
        .data = preds
        , .variables = .(participant,task)
        , .fun = function(x){
            data.frame( value = diff(x$response) )
        }
    )
    cor( scores$value[scores$task=='1'] , scores$value[scores$task=='2'] )

But it appears that this approach also suffers from a type-I error
inflation (again, according to a simulation I ran last night).

Any suggestions?

Mike