# [R] Linear modelling confusion.

Rolf Turner r.turner at auckland.ac.nz
Thu Aug 30 02:44:43 CEST 2007

```I'm puzzled by a phenomenon that I have just encountered, and was
hoping that someone
out there might be able to give me some insight.  It's not an R
problem as such, but .....
well, there are R aspects.

I have a rather messy data set, with a response, a fixed effect, and
3 random effects:
school, class, and student.  The classes are nested within schools;
the students are
nested within schools; students are *not* nested within classes.  The
fixed effect
is ``time'', with 6 levels.  There are 1428 observations.

The ``design'' (the data are from an observational study) is vastly
imbalanced; there
are brazillions of empty cells.

I tried fitting two models:

(1) y ~ time + school + class%in%school + student%in%school

(2) y ~ time + cls.in.scl + std.in.scl

where I formed the factors ``cls.in.scl'' and ``std.in.scl'' by using
the interaction()
function:

cls.in.scl <- interaction(class,school,sep=".in.")
std.in.scl <- interaction(student,school,sep=".in.")

This was one way that I could think of to fit the model with the
school factor dropped out
(given that the levels of  ``class'' and ``student'' were nested
within the levels of ``school'').

Now my puzzlement:  The model matrix for (2) is singular; that for
(1) is not.

I can believe that the imbalance/existence of empty cells could
easily make the model
singular --- but since students and classes are nested within
schools, I don't see how
including a school factor could eliminate the singularity.  I have
been trying to concoct
a low dimensional toy example of this phenomenon, and am not able to
do so.

Can anyone provide me with some enlightenment?

I might remark that I have tried fitting the same two models, to the
same data, in
Minitab, and got the same sort of result --- model (1) ran to
completion; model (2) came
to a shuddering halt with a complaint about ``rank deficiency ... no
further analysis will
be done''.

I mention this because it convinces me that the problem is not with
my syntax for expressing
the models --- I have never really managed to grok R's linear
modelling syntax, but I feel
comfortable/confident with Minitab's syntax.  Minitab would write
model (1) as

(1) y ~ time + school + class(school) + student(school)

While I'm on this topic --- a totally side-issue:  I fitted the two
models with lm(), i.e. completely
ignoring the fact that school, class, and student are random effects
(whence any correct
inference from the fits is impossible).  That's by-the-by, since my
real concern at the moment
is the singularity issue.

But can any kind soul please tell me how I might *properly* fit the
model in R?  What package
and function should I use, and what (very explicitly, in
monosyllables, for I am a bear of very
little brain and long words bother me) is the correct syntax for
expressing the models for the
aforesaid function?

Yes, I have looked at lme4 and lmer(), and I have (tried to) read
Pinheiro and Bates, but I
couldn't get it.  As I said, I am a bear of very little brain ....

Thanks for taking the time to read this long-winded cri de coeur.  If
anyone is actually
*interested* I would be very happy to make the data available to her
or him.

cheers,

Rolf Turner

######################################################################
Attention:\ This e-mail message is privileged and confidenti...{{dropped}}

```

More information about the R-help mailing list