[Rd] For wishlist: sanity checks for subsets in lm, glm (PR#

Peter Dalgaard BSA p.dalgaard@biostat.ku.dk
12 Apr 2000 13:18:37 +0200

Martyn Plummer <plummer@iarc.fr> writes:

> On 12-Apr-00 Peter Dalgaard BSA wrote:
> > Might be a good idea. Mind you, Splus 3.4 does exactly the same thing,
> > and I'm a little worried that the uniqueness assumption might kill
> > some bootstrapping applications:
> > 
> >  glm(y ~ x, data=test.data, subset=sample(seq(along=y),replace=T))
> Splus 5.1 doesn't do this, because it preserves logical vectors in
> data frames (but then perhaps comparisons with S4 or SPlus 5.x are
> irrelevant?)

They're sometimes relevant because they show what has been considered
a bug in 3.x... The convert-to-factor conventions in 3.x (and R) are
quite a bit of a pain in my opinion, but I'm afraid we're stuck with
them at least for the near future (we couldn't make an API change as
pervasive as that without bumping the major version number). 
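
To illustrate why the conversion hurts, here is a minimal sketch. It
assumes the problem case is a logical vector that has been turned into
a factor before being used as an index; the vectors x and keep are
made up for the example. Indexing with a factor uses its integer
codes, not the TRUE/FALSE values it was built from:

```r
x    <- c(10, 20, 30, 40)
keep <- c(TRUE, FALSE, TRUE, FALSE)   # hypothetical subset variable

x[keep]                # logical index: picks elements 1 and 3

f <- factor(keep)      # levels are "FALSE", "TRUE"; codes are 2 1 2 1
as.integer(f)          # the codes that are actually used for indexing
x[f]                   # same as x[c(2, 1, 2, 1)], not x[keep]
```

The factor index silently selects elements 2 and 1 repeatedly, which
is the kind of result a sanity check on subset would be meant to catch.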

> I knew there would be a good reason not to implement this. My feeling
> is that high level modelling functions should protect the user as
> much as possible. If you want more flexibility, you can always
> program around it.

Maybe, but... Getting people to program around S/R differences in
their add-on packages hasn't always been unproblematic.

What would also catch your case would be to disallow factors as subset
variables, and I can't think of any situation where subsetting with a
factor would occur naturally. Does 5.1 allow subsetting with factors?
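
A rough sketch of what such a check might look like, purely as an
illustration: check_subset is a hypothetical helper, not anything that
exists in lm or glm, and the data are made up. It rejects factors but
leaves repeated indices alone, so bootstrap-style resampling still
goes through:

```r
# Hypothetical helper; not part of lm() or glm().
check_subset <- function(subset) {
  if (is.factor(subset))
    stop("'subset' must not be a factor; use a logical or numeric vector")
  subset
}

y   <- c(1.2, 0.7, 3.1, 2.4)
idx <- sample(seq_along(y), replace = TRUE)  # bootstrap resample, repeats allowed

y[check_subset(idx)]      # passes: integer index, duplicates are fine
y[check_subset(y > 1)]    # passes: logical index

# A factor is refused; capture the message instead of aborting.
res <- tryCatch(check_subset(factor(y > 1)),
                error = function(e) conditionMessage(e))
res
```

Whether the test belongs in the modelling functions themselves or in
the model-frame code they share is left open here.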

