[R-sig-ME] Using lme4 with very limited number of observations makes sense?
Jonathan Baron
baron at psych.upenn.edu
Sun May 3 19:50:23 CEST 2015
I can't resist replying to this, even though I won't have much to say
about lmer().
It seems to me that the big problem is the small number of
subjects. And the main issues are the differences between the two
groups. In order to find anything with such a small number, you would
need huge effects. Moreover, the patients will differ among themselves
as a function of the severity of their disease at the time, adding
further variance. This means that you should expect unequal variances
between the two groups. My hunch is that this is not a big problem
when you have large samples but could be a problem here even with
lmer(). [This is where I am not an expert.] Ordinary t-tests can
correct for unequal variance (and do so, in R, by default). You might
be able to remove this unequal variance if you include a measure of
disease severity (with the control group at 0). (And it might also
help to get more controls, especially if you can deal with the unequal
variance.)
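
As a rough illustration of the unequal-variance point, here is a minimal sketch in base R. The numbers are simulated and are not the poster's data; the group means and standard deviations are assumptions chosen only to make the patient group more variable. It shows that t.test() uses the Welch correction unless var.equal = TRUE is requested.

```r
# Simulated per-subject means (NOT real data): 5 controls, 5 patients.
set.seed(1)
controls <- rnorm(5, mean = 10, sd = 1)  # fairly homogeneous group
patients <- rnorm(5, mean = 8,  sd = 3)  # more spread, e.g. from varying severity

# Default call: var.equal = FALSE, i.e. the Welch test.
welch <- t.test(patients, controls)

# Classical pooled-variance test, for comparison only.
student <- t.test(patients, controls, var.equal = TRUE)

print(welch$method)       # reports the Welch variant
print(welch$parameter)    # Welch-Satterthwaite df, at most n1 + n2 - 2
print(student$parameter)  # pooled df = n1 + n2 - 2 = 8
```

With five subjects per group the Welch degrees of freedom can drop well below 8, which is one way to see how little information there is for a group comparison.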
My thought is that IF the effects of disease are large enough so that
you can get any kind of "significant" result with such a small sample,
THEN they would also be large enough to see them graphically. If I had
these data, the first thing I would do, and maybe the last, would be
to plot graphs of the results for each of the four sub-conditions
(defined by voicing and context). Each point would be the mean for one
subject of one of the four conditions, and I would use different
colors for the two groups. And/or the horizontal axis could be some
measure of disease severity, with the control group at 0. If there
were a group difference, it would be significant by an eyeball
test. It would be so obvious that it hits you between the eyeballs. If
it isn't that obvious, then I suspect that nothing will help.
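
A sketch of the kind of plot described above, in base R. The data frame is simulated to match the layout in the original post (10 subjects, 2 voicing conditions, 2 contexts, 7 repetitions); the column names, the subject labels, and every number are assumptions made for illustration only.

```r
# Simulated data in the layout the poster describes (NOT real measurements).
set.seed(2)
d <- expand.grid(repetition = 1:7,
                 condition  = c("voiced", "unvoiced"),
                 context    = c("singleton", "geminate"),
                 subject    = sprintf("S%02d", 1:10))
d$group <- ifelse(as.integer(d$subject) <= 5, "control", "patient")

# Per-subject offsets plus an arbitrary group difference and residual noise.
subj_shift <- rnorm(10, sd = 1)
d$lip_displacement <- rnorm(nrow(d),
                            mean = ifelse(d$group == "patient", 8, 10),
                            sd = 1) + subj_shift[as.integer(d$subject)]

# One mean per subject per sub-condition: 10 subjects x 4 cells = 40 points.
means <- aggregate(lip_displacement ~ subject + group + condition + context,
                   data = d, FUN = mean)
means$cell <- interaction(means$condition, means$context, sep = " / ")

# Four columns of points, one per sub-condition, coloured by group.
x <- jitter(as.integer(means$cell), amount = 0.1)
plot(x, means$lip_displacement,
     col = ifelse(means$group == "patient", "red", "blue"), pch = 19,
     xaxt = "n", xlab = "condition / context",
     ylab = "mean lip displacement (simulated units)")
axis(1, at = 1:4, labels = levels(means$cell))
legend("topright", legend = c("control", "patient"),
       col = c("blue", "red"), pch = 19)
```

If a severity score were available, it could replace the horizontal axis, with all controls placed at 0; no such variable is simulated here.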
I don't much see the point of using lmer() here as opposed to
"ordinary least squares" regression or t-tests*, since your main
interest is the subjects. The one advantage of lmer() would (I think)
be to find effects of the manipulations (context and voicing), but
probably you already know that.
*You can test interactions with t-tests by computing differences for
each subject.
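
A minimal sketch of the footnote's idea in base R, assuming one mean per subject per cell is already available. The data are simulated with no true effects, and all object names are hypothetical.

```r
# Simulated per-subject cell means (NOT real data): 10 subjects x 4 cells.
set.seed(3)
cells <- expand.grid(condition = c("voiced", "unvoiced"),
                     context   = c("singleton", "geminate"),
                     subject   = 1:10)
cells$y <- rnorm(nrow(cells), mean = 10, sd = 1)
cells$cell <- paste(cells$condition, cells$context, sep = ".")

# 10 x 4 matrix: one row per subject, one column per cell.
m <- tapply(cells$y, list(cells$subject, cells$cell), mean)
group <- ifelse(as.integer(rownames(m)) <= 5, "control", "patient")

# Per-subject voicing effect, averaged over the two contexts.
voicing_effect <- (m[, "voiced.singleton"] + m[, "voiced.geminate"]) / 2 -
                  (m[, "unvoiced.singleton"] + m[, "unvoiced.geminate"]) / 2

# Group x voicing interaction: does the voicing effect differ between groups?
int_group <- t.test(voicing_effect ~ group)

# Voicing x context interaction: per-subject difference of differences,
# tested against zero with a one-sample t-test.
dd <- (m[, "voiced.geminate"]  - m[, "unvoiced.geminate"]) -
      (m[, "voiced.singleton"] - m[, "unvoiced.singleton"])
int_within <- t.test(dd)

print(int_group)
print(int_within)
```

The three-way interaction would follow the same pattern, comparing dd between the two groups with t.test(dd ~ group).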
Jon
On 04/28/15 15:38, Massimiliano Mario Iraci wrote:
>Dear all,
>
>my name is Massimiliano Iraci, I am a PhD student at the University of
>Salento (Lecce, Italy) and University of Cologne (Germany). My PhD is on
>Phonetics and Phonology and I am working on the kinematics of speech in
>Parkinson's Disease.
>
>I am very fresh with statistics and recently I am even switching from SPSS
>to R, especially working with linear mixed models (lme4 package).
>Unfortunately I am having some doubts about the use of this model with my
>data.
>
>In my field, data acquisition is very complicated because of the
>instruments, so I always end up with few data for few subjects. So, in
>order to increase the power of my data, I usually record several (5-7)
>repetitions of each item of interest.
>
>So, for instance, if I want to focus on the displacement of the lower lip
>during the production of a bilabial speech gesture, I consider the
>voiced/unvoiced condition, in 2 contexts (singleton/geminate). Thus I will
>have 7 repetitions of the same item x 2 conditions (voiced/unvoiced) x 2
>contexts (singleton/geminate) x 10 subjects (5 pathological + 5 controls).
>I fit the model as follows:
>lip_displacement ~ PATvsCTR * condition * context + (1|repetitions) +
>(1+condition|subject) + (1+context|subject)
>
>I must highlight that:
>- I don't have always 7 repetitions for any item: some subjects were able
>to produce 5, some 6, some 7 differing from item to item (so generally
>number of "same items" range from 5 to 7);
>- 'repetitions' is a variable reporting the cardinal number associated to
>the chronological order of the repetitions recorded (so ranging from 1 to
>7)
>
>This fit very often generates several errors and warnings ("large
>eigenvalue ratio"; "degenerate Hessian with 1 negative eigenvalues"; etc.)
>and if plotting the distribution of fitted/residuals I see stripes clearly
>because of the repetitions.
>
>Finally the questions are:
>- could the repetitions be a problem for the model? Could it be better to
>work with an average of the repetitions in order to have only 1 value for
>each item?
>- if the previous is true, does it make sense to compare such a limited
>number of values in such a limited number of subjects with this model?
>
>I am sorry for my limited knowledge in statistics. I would be really grateful
>if you could help me to shed light on the problem. Thank you very much in
>advance for your help.
>
>I look forward to hearing from you.
>Kind regards,
>
>Massimiliano
>
>
>
>===============================================================
>
>Massimiliano Mario Iraci
>PhD student
>
>CRIL (Interdisciplinary Center for Research on Language) &
>DReAM (Laboratory of Research Applied to Medicine)
>University of Salento & Local Health Service (ASL Lecce)
>c/o Vito Fazzi Hospital
>Piazza Filippo Muratore - 73100 - Lecce (Italy)
>
>web: http://www.cril.unisalento.it/en/staff_details.php?id=123
>tel: 0039 - 0832 335008
>
>_______________________________________________
>R-sig-mixed-models at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Editor: Judgment and Decision Making (http://journal.sjdm.org)