[R] Repeated measures logistic regression

Sun Feb 25 20:58:09 CET 2007

Dear all,

I'm struggling to find the best (set of?) function(s) to do repeated  
measures logistic regression on some data from a psychology experiment.

An artificial version of the data I've got is as follows.  Firstly,  
each participant filled in a questionnaire, the result of which is a  
score.

 > questionnaire
    ID Score
1   1     6
2   2     5
3   3     6
4   4     2
...

Secondly, each participant did a task which required a series of  
button-pushes.  The response is binary.  The factors CondA and CondB  
describe the structure of the stimulus:

 > experiment
     ID CondA CondB Response
1    1    a1    b1        1
2    1    a2    b2        0
3    1    a3    b1        0
4    1    a4    b2        0
5    1    a1    b1        1
6    1    a2    b2        0
7    1    a3    b1        0
8    1    a4    b2        0
9    2    a1    b1        1
10   2    a2    b2        0
11   2    a3    b1        0
12   2    a4    b2        0
13   2    a1    b1        1
14   2    a2    b2        0
15   2    a3    b1        0
16   2    a4    b2        0

I would like to model how someone's score on the questionnaire  
relates to the responses they give in the button-pushing.  I'm  
particularly interested in interactions between the type of the  
stimulus and the score.

I combined the experiment and the questionnaire data frames with  
merge(), so now there is an additional column.

 > exp.q
     ID Score CondA CondB Response
1    1     6    a1    b1        1
2    1     6    a2    b2        0
3    1     6    a3    b1        0
4    1     6    a4    b2        0
5    1     6    a1    b1        1
6    1     6    a2    b2        0
7    1     6    a3    b1        0
8    1     6    a4    b2        0
9    2     5    a1    b1        1
10   2     5    a2    b2        0
11   2     5    a3    b1        0
12   2     5    a4    b2        0
...
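
For concreteness, a minimal sketch of that merge using only the artificial rows shown above. The data frames are rebuilt by hand here, and the argument order (questionnaire first) is just one way to get the column order printed above; it is not necessarily how the real data were combined.

```r
# Rebuild the artificial tables from the listings above.
questionnaire <- data.frame(ID = 1:4, Score = c(6, 5, 6, 2))

experiment <- data.frame(
  ID       = rep(1:2, each = 8),
  CondA    = rep(c("a1", "a2", "a3", "a4"), times = 4),
  CondB    = rep(c("b1", "b2"), times = 8),
  Response = rep(c(1, 0, 0, 0), times = 4)
)

# merge() joins on the shared ID column; each participant's Score is
# repeated on every one of their trial rows. Participants with no
# trials (IDs 3 and 4 here) are dropped by the default inner join.
exp.q <- merge(questionnaire, experiment, by = "ID")

head(exp.q)
```

The repeated Score is the usual "long" layout: one row per trial, with subject-level covariates copied down each subject's rows.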

Eventually, via glm, glmmPQL, and a few others, I ended up with  
lmer.  My questions follow.  I suspect (or hope) that I need to be  
pointed towards the relevant literature.  I own Faraway's "Extending  
the Linear Model with R" and Crawley's "Statistics: An Introduction  
using R".

1. Is the way I've combined the tables okay?  I'm concerned that the  
repetition of the score is Bad but can't think of any other way to  
code things.

2. Is lmer the most appropriate function to use?

3. If so, does the following call capture what I'm trying to model?

model1 = lmer(Response ~ CondA * CondB * Score + (1|ID),
               data =exp.q,
               family = binomial)

I just want to tell lmer, "Look, this set of responses all comes from  
the same person: tell me the within-subject stuff that's going on and  
how that's affected by their score!"

4. Is there any way to do stepwise model simplification?  In the real  
data I have, there are several more predictors, including more than  
one questionnaire score and subscores.  I have specific hypotheses  
about what could be going on, so I can live with manual editing of  
the formulae, but it's nice for exploratory purposes to do stepwise  
simplification.

5. What's the best way to discover and report the relative  
contribution of each predictor?  I'm after an analogue of  
standardized betas (though I recently learned that they're thoroughly  
evil).

6. Is there any way to get a p-value for goodness of fit?
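
To make questions 3 and 4 concrete, here is a rough sketch of the kind of fit and manual simplification I have in mind. It runs on simulated data (the 40 subjects, the effect sizes, and the centred ScoreC column are all invented for illustration), and it assumes a recent lme4 in which glmer() is the entry point for binomial mixed models. I may well have the specification wrong.

```r
library(lme4)

set.seed(1)

# Simulated stand-in for exp.q: 40 subjects, 4 levels of CondA,
# 6 repetitions each. All numbers below are invented.
n_subj <- 40
sim <- expand.grid(ID = 1:n_subj,
                   CondA = c("a1", "a2", "a3", "a4"),
                   Rep = 1:6)

score    <- sample(1:7, n_subj, replace = TRUE)  # one score per subject
subj_eff <- rnorm(n_subj, sd = 0.8)              # random intercepts

sim$ScoreC <- score[sim$ID] - mean(score)        # centred score
eta <- -0.5 + 0.2 * sim$ScoreC +
  0.5 * (sim$CondA == "a1") + subj_eff[sim$ID]
sim$Response <- rbinom(nrow(sim), 1, plogis(eta))
sim$ID <- factor(sim$ID)

# Random intercept per subject; fixed effects for stimulus type,
# score, and their interaction.
full <- glmer(Response ~ CondA * ScoreC + (1 | ID),
              data = sim, family = binomial)

# One manual simplification step: drop the interaction and compare
# the nested models with a likelihood ratio test.
reduced <- glmer(Response ~ CondA + ScoreC + (1 | ID),
                 data = sim, family = binomial)

print(anova(reduced, full))
```

Each simplification step would then be a refit with one term removed, followed by the same comparison.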

Many thanks for any help,

Andy

--
Andy Fugard, Postgraduate Research Student
Psychology (Room F15), The University of Edinburgh,
   7 George Square, Edinburgh EH8 9JZ, UK
Mobile: +44 (0)78 123 87190   http://www.possibly.me.uk