[Rd] Guidelines for S3 regression models

Stephen Milborrow milbo at sonic.net
Tue Jun 30 13:36:53 CEST 2015

Given how much documentation is available on R coding in general, it is
surprising how little is available specifically on writing model code.
Researchers who come up with a new regression method, and who want to
write an S3 model for that method, currently have to go all the way back
to Venables and Ripley's book "S Programming".

> On 26.06.2015 14:09, Stephen Milborrow wrote:
> > Once we have built a regression model, we typically want to use the
> > model for further processing, such as making predictions from the model
> > or plotting the residuals.  Unfortunately, for many packages on CRAN
> > this can be difficult.
> >
> > For example, some models don't have a residuals method and don't save
> > the call or data, so you can't tell how to generate the residuals
> > from the model object itself.
> >
> > A common snag is that for some models the new data for predict() has to
> > be a matrix; for others it has to be a data.frame.  This places an
> > unnecessary burden on the user when both data.frames and matrices can
> > easily be supported by predict.
> >
> > To mitigate such issues, I'm going out on a limb and presenting some
> > guidelines for writers of S3 regression model functions (this document
> > is currently part of the plotmo package):
> > http://www.milbo.org/doc/modguide.pdf
>
> On 26.06.2015 16:41, Achim Zeileis wrote:
> I think this is a nice and useful starting point. It's probably not
> comprehensive (yet) but will surely help.
>
> You could add something more about writing the formula interface and the
> correct processing of model.frame, terms, model.response, model.matrix,
> model.weights, model.offset. Especially for models with linear predictors
> the latter two can be very useful and are often not hard to implement. In
> case the model has multiple parts or multiple responses, the "Formula"
> package (and its vignette) might also be helpful.
>
> As for the S3 methods, I would omit coefficients, fitted.values, and resid
> from the list. These dispatch to coef, fitted, and residuals anyway. For
> inference it would also be very useful to add nobs(), df.residual(),
> vcov(), and logLik() and/or deviance() where applicable. An overview which
> lists some (but not all) useful methods is in Table 1 of
> vignette("betareg", package = "betareg").
>
> For coef() and vcov() it is useful/important that the names and dimension
> match. Then Wald tests can be easily computed in functions like
> car::linearHypothesis(), car::deltaMethod(), lmtest::waldtest(), or
> lmtest::coeftest().
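A minimal sketch of the inference methods listed above, written for a hypothetical class "myfit" (a plain least-squares fit with a matrix interface, invented here for illustration; neither the class nor the helper wald() comes from plotmo or any CRAN package). What it shows is the last point: because the names and dimensions of coef() and vcov() match, a Wald test can select coefficients by name using only those two methods. In practice car::linearHypothesis() or lmtest::coeftest() would do this job.

```r
## Hypothetical least-squares fitter with a matrix interface ("myfit" is
## an illustrative class, not from any package)
myfit <- function(x, y) {
    x <- cbind("(Intercept)" = 1, as.matrix(x))
    xtx.inv <- solve(crossprod(x))
    beta <- drop(xtx.inv %*% crossprod(x, y))
    names(beta) <- colnames(x)
    fitted <- drop(x %*% beta)
    structure(list(coefficients = beta, fitted.values = fitted,
                   residuals = y - fitted, xtx.inv = xtx.inv,
                   call = match.call()),
              class = "myfit")
}

## coef(), fitted() and residuals() need no methods: the default methods
## find the standard components "coefficients", "fitted.values", "residuals"
nobs.myfit <- function(object, ...) length(residuals(object))
df.residual.myfit <- function(object, ...) nobs(object) - length(coef(object))
deviance.myfit <- function(object, ...) sum(residuals(object)^2)

vcov.myfit <- function(object, ...) {
    s2 <- deviance(object) / df.residual(object)
    v <- s2 * object$xtx.inv
    ## Row and column names must match names(coef()) for Wald tests
    dimnames(v) <- list(names(coef(object)), names(coef(object)))
    v
}

logLik.myfit <- function(object, ...) {
    n <- nobs(object)
    ## Gaussian log-likelihood with sigma^2 at its ML estimate RSS/n
    val <- -n / 2 * (log(2 * pi) + log(deviance(object) / n) + 1)
    structure(val, df = length(coef(object)) + 1, nobs = n, class = "logLik")
}

## Hypothetical helper: Wald chi-squared test that the named coefficients
## are jointly zero, computed from coef() and vcov() alone
wald <- function(object, which) {
    b <- coef(object)[which]
    V <- vcov(object)[which, which, drop = FALSE]
    stat <- drop(t(b) %*% solve(V, b))
    c(statistic = stat,
      p.value = pchisq(stat, df = length(b), lower.tail = FALSE))
}

## Usage with made-up data
x <- cbind(x1 = 1:10, x2 = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3))
y <- 2 + 0.5 * x[, "x1"] - x[, "x2"] +
     c(0.3, -0.2, 0.1, -0.4, 0.2, 0, -0.1, 0.3, -0.3, 0.1)
f <- myfit(x, y)
wald(f, c("x1", "x2"))
AIC(f)   # works because logLik() carries a "df" attribute
```

Leaving out coef(), fitted() and residuals() methods is deliberate: with the standard component names the defaults already work, which also makes coefficients(), fitted.values() and resid() work.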

Thanks for these, I'll update the document.
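For the formula interface, and for the matrix-versus-data.frame problem with predict(), a minimal sketch follows. It assumes a hypothetical least-squares model "mymod" (invented for illustration) and uses the model.frame() idiom found in stats::lm(): the fit keeps the call, terms, factor levels and residuals, so residuals() and predict() work from the model object alone, and predict() accepts new data either as a data.frame or as a matrix with named columns. To keep predict() short, offsets are assumed to be given only as offset() terms in the formula.

```r
## Hypothetical least-squares model "mymod" with a formula interface, using
## the model.frame() idiom from stats::lm() (illustrative, not from a package)
mymod <- function(formula, data, weights, na.action, ...) {
    cl <- match.call()
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data", "weights", "na.action"), names(mf), 0L)
    mf <- mf[c(1L, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1L]] <- quote(stats::model.frame)
    mf <- eval(mf, parent.frame())
    mt <- attr(mf, "terms")
    y <- model.response(mf, "numeric")
    x <- model.matrix(mt, mf)
    w <- model.weights(mf)          # NULL if no weights were given
    off <- model.offset(mf)         # offset() terms in the formula, or NULL
    if (is.null(w)) w <- rep(1, NROW(x))
    fit <- lm.wfit(x, y, w, offset = off)
    eta <- drop(x %*% fit$coefficients)
    if (!is.null(off)) eta <- eta + off
    ## Keep the call, terms and factor levels so that residuals() and
    ## predict() work from the model object alone
    structure(list(coefficients = fit$coefficients, fitted.values = eta,
                   residuals = y - eta, weights = w, call = cl, terms = mt,
                   xlevels = .getXlevels(mt, mf),
                   contrasts = attr(x, "contrasts")),
              class = "mymod")
}

predict.mymod <- function(object, newdata, ...) {
    if (missing(newdata) || is.null(newdata))
        return(fitted(object))
    ## Accept a matrix as well as a data.frame; the matrix needs column
    ## names matching the variables in the formula
    if (is.matrix(newdata))
        newdata <- as.data.frame(newdata)
    tt <- delete.response(terms(object))
    mf <- model.frame(tt, newdata, xlev = object$xlevels)
    x <- model.matrix(tt, mf, contrasts.arg = object$contrasts)
    eta <- drop(x %*% coef(object))
    off <- model.offset(mf)
    if (!is.null(off)) eta <- eta + off
    eta
}

## Usage with made-up data
d <- data.frame(x1 = 1:10, x2 = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3))
d$y <- 2 + 0.5 * d$x1 - d$x2 +
       c(0.3, -0.2, 0.1, -0.4, 0.2, 0, -0.1, 0.3, -0.3, 0.1)
m <- mymod(y ~ x1 + x2, data = d)
nd <- d[1:3, c("x1", "x2")]
p_df  <- predict(m, nd)               # new data as a data.frame
p_mat <- predict(m, as.matrix(nd))    # same new data as a matrix
all.equal(unname(p_df), unname(p_mat))
```

Converting a matrix with as.data.frame() is the cheap way to support both input types. It relies on the matrix having column names that match the variables in the formula.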

Stephen Milborrow
