[R-sig-ME] lmer-model - model ok? (control condition, random effects, log transformation)

Tue Apr 4 13:36:47 CEST 2017

Dear all,

I'd be very grateful for brief feedback on whether my lmer model is 
sound as specified. Thank you very much in advance!

I'm modelling log10-transformed reaction times (RT), i.e. participants' 
reading times for sentences. The sentences are constructed in pairs: one 
test sentence and one control sentence (as similar as possible to the 
test sentence). The TEST sentences fall into 4 conditions, i.e. each 
belongs to one of four KINDS of sentence.

Sentences are matched for length (number of letters) and frequency 
(mean word frequency across the sentence). Matching across ALL material 
was attempted but is not fully possible, so sentences are matched as 
closely as possible within each test/control pair.

First, I treated the material as 5 conditions: 1 control + 4 test 
conditions -> "cond5" (with contr.treatment, i.e. control = reference 
level). "(cond2|subject)" means: a separate by-subject random intercept 
for the test and the control condition (equivalently, by-subject random 
intercepts plus by-subject random slopes for cond2). (Additional random 
slopes don't improve the fit and aren't that important to me.)
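For readers less familiar with treatment coding, here is a minimal base-R sketch of what "control as reference level" means for cond5. The level names kindA to kindD are hypothetical placeholders for the four test kinds; the real factor levels in the data may differ.

```r
# Hypothetical level names: one control level plus four test kinds
cond5 <- factor(c("control", "kindA", "kindB", "kindC", "kindD"))

# Make "control" the reference (baseline) level and use treatment coding
cond5 <- relevel(cond5, ref = "control")
contrasts(cond5) <- contr.treatment(levels(cond5))

# Each column compares one test kind with control; the control row is all zeros
print(contrasts(cond5))

# Fixed-effects design: the intercept corresponds to the control level, and
# each cond5 column codes one test kind relative to control
X <- model.matrix(~ cond5)
print(X)
```

Under this coding, each cond5 coefficient in m1 is a test-kind-minus-control difference on the logRT scale, all measured against one pooled control level.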

Of the models that converge, the one with the best fit (AIC/BIC) is:

m1 <- lmer(logRT ~ cond5 + rat3 + (cond2|subject) + 
(length:frequency|item_ID), data = data, REML = FALSE)

But I keep going back and forth on whether "cond5" is actually 
appropriate, or whether I need to compare the sentences ONLY pairwise, 
i.e. each of the 4 conditions (or "kinds") only as test vs. control. 
So I fitted another model with a nested fixed effect. "cond4/cond2" 
means: test vs. control (cond2) nested within each of the 4 conditions 
("kinds", cond4), with contr.sum, i.e. sum contrasts.

The fit (AIC/BIC) is only slightly worse (910 vs. 905 for m1):

m2 <- lmer(logRT ~ cond4/cond2 + rat3 + (cond2|subject) + 
(length:frequency|item_ID), data = data, REML = FALSE)
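A minimal base-R sketch of what the nested term in m2 expands to, assuming cond4 is coded for control sentences as well as test sentences (as the nested formula implies). The factors below are hypothetical stand-ins for the real cond4 and cond2, crossed once to give one row per cell. In R, `cond4/cond2` is shorthand for `cond4 + cond4:cond2`, so the model gets one test-vs.-control column per kind, and 8 cell means in total (each kind has its own control mean), whereas cond5 pools all control sentences into one level.

```r
# Hypothetical stand-ins: 4 kinds crossed with control/test, one row per cell
cells <- expand.grid(
  cond4 = factor(c("kindA", "kindB", "kindC", "kindD")),
  cond2 = factor(c("control", "test"))
)

# The nesting operator: a/b expands to a + a:b
print(attr(terms(~ cond4/cond2), "term.labels"))

# Sum contrasts, as in the post; within each kind, control = +1, test = -1
X <- model.matrix(~ cond4/cond2, data = cells,
                  contrasts.arg = list(cond4 = "contr.sum", cond2 = "contr.sum"))

# 1 intercept + 3 cond4 columns + 4 nested test-vs-control columns = 8
print(colnames(X))
```

With this coding, each nested coefficient corresponds to half the control-minus-test difference within one kind, which is the pairwise comparison described above.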

The two models give the same results, and their fit is nearly identical. 
My question is: is one model more legitimate than the other, given how 
the matching was done (see above)? 
And is there anything objectionable about how I specified the random 
effects? (The model doesn't necessarily converge when I change them.)

And I'm not 100% sure whether I have to use log-transformed reaction 
times as the response variable. The raw reaction times have very large 
variance and are right-skewed. The log-transformed distribution is closer 
to normal, but still not normal. The model fit seems much better for 
logRT, but the results differ slightly between logRT and raw RT.
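One point that may help when comparing the two scales: an effect on the log10 scale is multiplicative on the raw scale. The sketch below uses hypothetical reading times (not data from this study) to show that a fixed log10 difference back-transforms to a ratio, and therefore to different millisecond differences at different baselines.

```r
# Hypothetical reading times in ms, for illustration only (not data from the study)
rt_control <- 800
rt_test    <- 1000

# A difference on the log10 scale ...
diff_log10 <- log10(rt_test) - log10(rt_control)

# ... back-transforms to a ratio, not to a difference in ms
ratio <- 10^diff_log10
print(ratio)  # 1.25, i.e. test is 25% slower than control

# The same log10 difference at a faster baseline corresponds to fewer ms
rt_control_fast <- 400
rt_test_fast    <- rt_control_fast * ratio
print(rt_test_fast - rt_control_fast)  # 100 ms, versus 200 ms above
```

So an effect that is constant on the log scale is not constant in ms (and vice versa), which is one possible reason why estimates from the logRT and raw-RT models differ somewhat.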

Again, thank you very much; I'd really appreciate your feedback. I hope 
the information above is sufficient and not confusing.

Best

Diana

-- 
Diana Michl, M.A.
PhD candidate
International Experimental
and Clinical Linguistics
Universität Potsdam
www.ling.uni-potsdam.de/staff/dmichl
