[R] Linear modelling confusion.
Rolf Turner
r.turner at auckland.ac.nz
Thu Aug 30 02:44:43 CEST 2007
I'm puzzled by a phenomenon that I have just encountered, and was
hoping that someone
out there might be able to give me some insight. It's not an R
problem as such, but .....
well, there are R aspects.
I have a rather messy data set, with a response, a fixed effect, and
3 random effects:
school, class, and student. The classes are nested within schools;
the students are
nested within schools; students are *not* nested within classes. The
fixed effect
is ``time'', with 6 levels. There are 1428 observations.
The ``design'' (the data are from an observational study) is vastly
imbalanced; there
are brazillions of empty cells.
I tried fitting two models:
(1) y ~ time + school + class%in%school + student%in%school
(2) y ~ time + cls.in.scl + std.in.scl
where I formed the factors ``cls.in.scl'' and ``std.in.scl'' by using
the interaction()
function:
cls.in.scl <- interaction(class,school,sep=".in.")
std.in.scl <- interaction(student,school,sep=".in.")
This was one way that I could think of to fit the model with the
school factor dropped out
(given that the levels of ``class'' and ``student'' were nested
within the levels of ``school'').
Now my puzzlement: The model matrix for (2) is singular; that for
(1) is not.
I can believe that the imbalance/existence of empty cells could
easily make the model
singular --- but since students and classes are nested within
schools, I don't see how
including a school factor could eliminate the singularity. I have
been trying to concoct
a low dimensional toy example of this phenomenon, and am not able to
do so.
Can anyone provide me with some enlightenment?
I might remark that I have tried fitting the same two models, to the
same data, in
Minitab, and got the same sort of result --- model (1) ran to
completion; model (2) came
to a shuddering halt with a complaint about ``rank deficiency ... no
further analysis will
be done''.
I mention this because it convinces me that the problem is not with
my syntax for expressing
the models --- I have never really managed to grok R's linear
modelling syntax, but I feel
comfortable/confident with Minitab's syntax. Minitab would write
model (1) as
(1) y ~ time + school + class(school) + student(school)
While I'm on this topic --- a totally side-issue: I fitted the two
models with lm(), i.e. completely
ignoring the fact that school, class, and student are random effects
(whence any correct
inference from the fits is impossible). That's by-the-by, since my
real concern at the moment
is the singularity issue.
But can any kind soul please tell me how I might *properly* fit the
model in R? What package
and function should I use, and what (very explicitly, in
monosyllables, for I am a bear of very
little brain and long words bother me) is the correct syntax for
expressing the models for the
aforesaid function?
Yes, I have looked at lme4 and lmer(), and I have (tried to) read
Pinheiro and Bates, but I
couldn't get it. As I said, I am a bear of very little brain ....
Thanks for taking the time to read this long-winded cri de coeur. If
anyone is actually
*interested* I would be very happy to make the data available to her
or him.
cheers,
Rolf Turner
######################################################################
Attention:\ This e-mail message is privileged and confidenti...{{dropped}}
More information about the R-help
mailing list