[R-sig-ME] conceptualizing items + subject analysis

Jonathan Baron baron at psych.upenn.edu
Thu Jun 13 15:00:51 CEST 2013


On 06/12/13 13:37, Bob Wiley wrote:
> Hello,
> 
> Let me start by briefly explaining my dataset and the question I'm trying
> to answer:
> Subjects saw pairs of letters and responded either "same" or "different"
> (e.g. they saw "L L" and responded "same" or "L R" and responded
> "different") and I have measures for both accuracy and response time.
> 
> I want to predict their reaction time on correct "different" responses
> based off of several factors:
> pixels - the pixel overlap of the two letters
> alphabet - their proximity in the alphabet
> identity - whether or not they have the same name, like "r R" or "g G"
> 
> Each pair has been coded with a value on all of these measures, for example:
> "r R" gets a 0.13 on pixels, a 0 on alphabet, and a 1 on identity,
> while
> "E F" gets 0.64 on pixels, a 1 on alphabet, and a 0 on identity
> 
> So of course I have this large data set of 24*990 observations (less
> incorrect responses). But I've gotten confused now about how best to
> predict the reaction time on these factors, accounting for the random
> effects... I was thinking the R model would be:
> 
> fit = lmer(rt ~ pixels + alphabet + identity + (1|Subject) + (1|Pair))
> 
> The model does run, and in fact all factors come out significant. But I
> have two concerns:
> 1. That entering the data/defining the model this way is not correct.

Why not?

The only problem I can see is a substantive one, and I'm not sure
about it. Because you are looking at "different" responses only, Ee
would be "yes" but EE or ee would be "no". Thus, you have
same/different confounded to some extent with
same-case/different-case. Similarly, same-case/different-case might be
confounded with pixels. Subjects might carry out (simultaneous)
comparisons of same-different visually (V), same-different in identity
(I), and same-different in case (C). The results of C would determine
the relevance of I or V. Thus, you might want to look at what happens
if you include a term for C, and terms for its interaction with V and
I. (Not sure this is exactly right, but something like this.)
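In lmer syntax, that extended model might look something like the sketch below. This is only an illustration: "case" is a hypothetical variable coding same-case vs. different-case pairs, and "d" is an assumed name for the data frame; neither appears in the original message.

```r
library(lme4)

# Sketch only: 'case' is a hypothetical predictor coding same-case (0)
# vs. different-case (1) pairs. The case:pixels and case:identity
# interactions let the case comparison (C) moderate the visual (V)
# and identity (I) effects.
fit2 <- lmer(rt ~ pixels + alphabet + identity + case +
               case:pixels + case:identity +
               (1 | Subject) + (1 | Pair),
             data = d)
```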

Also, I'm not sure how to handle possible individual
differences. Subjects may differ in the relative speeds of V, I, and
C, and thus in the efficiency of different strategies. If you thought
that subjects might reasonably differ in the direction of various
effects, then you might want to include random slopes. But if your
hypotheses are correctly one-tailed (and I think they are), then I
don't see that you need to do this. With data like these, you could
also test individual subjects. (See
http://www.sas.upenn.edu/~baron/papers/sinica.pdf.)
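For completeness, if random slopes did seem warranted, the model could be extended along these lines (again a sketch, reusing the assumed data frame name "d"):

```r
library(lme4)

# Sketch: allow the pixels, alphabet, and identity effects to vary
# across subjects. (Replace '|' with '||' to drop the random-effect
# correlations if the full model fails to converge.)
fit3 <- lmer(rt ~ pixels + alphabet + identity +
               (1 + pixels + alphabet + identity | Subject) +
               (1 | Pair),
             data = d)
```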

And you might want to include a term for the order of a given trial in
the sequence of trials. Subjects get faster throughout an
experiment. So including such a term can reduce the variance otherwise
attributed to "error" and make the other comparisons more
sensitive. In my experience, if you use the log of RT as the dependent
variable (usually a good idea anyway, unless you are testing additive
factors), then the effect of order is close to linear.

> 2. That I need to determine some goodness of fit for the model and
> contribution of these factors, the MCMC p-value not being sufficient to
> defend the inclusion of all of these variables

Here I'm afraid I can't help because I don't understand the
problem. There are lots of outputs aside from MCMC p-values. And I
don't understand why a low p-value is not sufficient for inclusion.

> Any advice or pointing out of my mistakes will be much appreciated. Thank
> you!
> 
> Bob, JHU
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Editor: Judgment and Decision Making (http://journal.sjdm.org)
