[R-sig-ME] Characterizing correlation between binomial effects
Mike Lawrence
mike.lwrnc at gmail.com
Thu Sep 6 17:09:42 CEST 2012
Reinhold Kliegl provided me with the solution, which I've coded into
an example available here:
https://gist.github.com/3657148
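
For readers who can't follow the link, here is a minimal sketch of one
standard way to get at this quantity directly (this is an assumption about
the general approach, not a copy of the gist): recode the data so that each
task's condition effect is its own predictor, give both predictors random
slopes over participants, and read the correlation between those two slopes
out of the fitted random-effects covariance matrix. Variable names match the
question quoted below.

library(lme4)

# within-task condition contrasts: c1 carries the A-vs-B effect on task-1
# trials only, c2 on task-2 trials only (each is 0 on the other task)
both_tasks$c1 = with(both_tasks,
    ifelse(task=='1', ifelse(condition=='A', .5, -.5), 0))
both_tasks$c2 = with(both_tasks,
    ifelse(task=='2', ifelse(condition=='A', .5, -.5), 0))

fit2 = glmer(
    response ~ task + c1 + c2 + (1 + task + c1 + c2 | participant)
    , data = both_tasks
    , family = binomial
)

# the participant-level correlation between the two condition effects is
# estimated directly as part of the random-effects covariance matrix
VarCorr(fit2)
attr(VarCorr(fit2)$participant, "correlation")["c1", "c2"]
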
On Wed, Sep 5, 2012 at 7:41 PM, Mike Lawrence <mike.lwrnc at gmail.com> wrote:
> I have data from multiple human participants who each completed two
> tasks. Each task had two conditions, and each condition comprised
> multiple trials yielding a binomial (1/0) outcome. I would like to
> evaluate whether there is a correlation between participants' condition
> effect in task 1 and their condition effect in task 2. Traditionally
> in my field, this would be done by collapsing the raw trial-by-trial
> data to per-condition proportions in each task and then correlating
> the resulting difference scores:
>
> library(plyr)
>
> task1_scores = ddply(
>     .data = task1_trials
>     , .variables = .(participant)
>     , .fun = function(x){
>         # condition effect: mean response (proportion) in A minus in B
>         data.frame( value = mean(x$response[x$condition=='A']) - mean(x$response[x$condition=='B']) )
>     }
> )
> task2_scores = ddply(
>     .data = task2_trials
>     , .variables = .(participant)
>     , .fun = function(x){
>         data.frame( value = mean(x$response[x$condition=='A']) - mean(x$response[x$condition=='B']) )
>     }
> )
> cor(task1_scores$value,task2_scores$value)
>
> However, this clearly ignores the binomial nature of the raw data,
> with the consequence (assuming the Monte Carlo simulation I ran last
> night is correct) that type-I error rates will be inflated for a
> null-hypothesis significance test on the final correlation (at least
> when the raw proportions are far from 50%, which is typically the case
> for the data I encounter).
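
(For concreteness, a minimal sketch of this kind of null simulation follows;
every setting below is a hypothetical choice, not the original simulation.
Each simulated participant has no true condition effect in either task, only
a baseline far from 50% that is shared across tasks, and the question is how
often the correlation test rejects.)

# null simulation sketch: no true condition effects, shared extreme baselines
set.seed(1)
n_subj = 30; n_trials = 40; n_sims = 1000
p_vals = replicate(n_sims, {
    alpha = rnorm(n_subj, mean = 2, sd = 1)   # per-participant baseline log-odds
    p = plogis(alpha)
    d1 = rbinom(n_subj, n_trials, p)/n_trials - rbinom(n_subj, n_trials, p)/n_trials
    d2 = rbinom(n_subj, n_trials, p)/n_trials - rbinom(n_subj, n_trials, p)/n_trials
    cor.test(d1, d2)$p.value
})
mean(p_vals < .05)   # compare to the nominal .05
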
>
> Does anyone have any input on how to address the "is there a
> correlation between the condition effects in the tasks" question in a
> generalized linear mixed effects modelling framework?
>
> One idea I had was to combine the two task data frames into a single
> data frame, code task as a variable, and fit a full GLMM:
>
> library(lme4)
>
> fit = glmer(
>     family = binomial
>     , data = both_tasks
>     , formula = response ~ task*condition + (1+task*condition|participant)
> )
>
> then obtain each participant's predicted deviation (random effects only,
> on the log-odds scale) for each task/condition cell:
>
> r = as.matrix(ranef(fit)$participant)
> temp = expand.grid(task=c('1','2'),condition=c('A','B'),response=0)
> from_terms = terms(response~task*condition)
> mm = model.matrix(from_terms,temp)
> preds = expand.grid(task=c('1','2'),condition=c('A','B'),participant=1:nrow(r),response=0)
> for(i in 1:nrow(r)){
>     # participant i's random-effect deviation for each of the 4 cells
>     preds$response[preds$participant==i] = as.numeric(mm %*% r[i,])
> }
>
> then use these predictions to compute each participant's condition
> effect in each task and correlate the resulting scores:
>
> scores = ddply(
>     .data = preds
>     , .variables = .(participant,task)
>     , .fun = function(x){
>         # within each (participant, task): condition B minus condition A
>         data.frame( value = diff(x$response) )
>     }
> )
> cor( scores$value[scores$task=='1'] , scores$value[scores$task=='2'] )
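
(A shorter route to the same per-participant effects, assuming the default
treatment contrasts and the factor levels used above, is coef(), which adds
the fixed effects to each participant's conditional modes; since the added
constants are the same for every participant, the correlation is unchanged.)

cf = coef(fit)$participant
eff_task1 = cf[['conditionB']]                              # B minus A effect in task 1 (log-odds)
eff_task2 = cf[['conditionB']] + cf[['task2:conditionB']]   # B minus A effect in task 2
cor(eff_task1, eff_task2)
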
>
> But it appears that this approach also suffers from a type-I error
> inflation (again, according to a simulation I ran last night).
>
> Any suggestions?
>
> Mike