[Rd] patch to model matrix to improve error messages
Ben Bolker
bbo|ker @end|ng |rom gm@||@com
Wed Feb 19 17:18:39 CET 2025
I posted this to r-bugzilla already
<https://bugs.r-project.org/show_bug.cgi?id=18860>, but I would be
interested to see if anyone here has comments (fewer people track
r-bugzilla closely ...)
The issue: if a variable reference in a model.frame() formula is
missing from the data but matches a function found elsewhere in the
environment (e.g. `count()` in `dplyr`), the user gets the opaque error
message "object is not a matrix".
If the variable reference is missing from the data but matches a list
or list-like object (e.g., a data frame), then we get the more
useful error "invalid type (TYPE) for variable 'VARIABLE'".
The issue is that model.frame() tries to compute the number of rows of
the data before it tests the type of each column. If we switch the order
of operations so that the columns are tested first, we get a much more
useful error message.
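
To make the ordering argument concrete, here is a toy model in plain C.
It is only a sketch and is not the code in model.c: ToyVar, ToyType,
toy_nrows(), check_rows_first() and check_types_first() are all
hypothetical names made up for illustration, and the only assumption
carried over from R is that the row-count helper fails on a function
while a list has a usable length but an unsupported type.

```c
#include <stdio.h>

/* Hypothetical stand-ins for R object types (not R's SEXP API). */
typedef enum { TOY_NUMERIC, TOY_LIST, TOY_CLOSURE } ToyType;

typedef struct {
    const char *name;  /* variable name as written in the formula */
    ToyType type;      /* what the name resolved to */
    int length;        /* number of elements; ignored for closures */
} ToyVar;

static const char *toy_type_name(ToyType t)
{
    switch (t) {
    case TOY_NUMERIC: return "double";
    case TOY_LIST:    return "list";
    default:          return "closure";
    }
}

/* Toy row count: returns -1 where the real helper would signal
   "object is not a matrix" (assumed here to happen for functions). */
static int toy_nrows(const ToyVar *v)
{
    return v->type == TOY_CLOSURE ? -1 : v->length;
}

/* Only plain vectors are acceptable model-frame columns in this toy. */
static int toy_type_ok(const ToyVar *v)
{
    return v->type == TOY_NUMERIC;
}

/* Current order: row count of the first variable, then column types. */
static void check_rows_first(const ToyVar *vars, int n, char *msg, size_t len)
{
    msg[0] = '\0';
    if (n == 0) return;
    if (toy_nrows(&vars[0]) < 0) {
        snprintf(msg, len, "object is not a matrix");
        return;
    }
    for (int i = 0; i < n; i++) {
        if (!toy_type_ok(&vars[i])) {
            snprintf(msg, len, "invalid type (%s) for variable '%s'",
                     toy_type_name(vars[i].type), vars[i].name);
            return;
        }
    }
}

/* Proposed order: column types first, then the row count. */
static void check_types_first(const ToyVar *vars, int n, char *msg, size_t len)
{
    msg[0] = '\0';
    for (int i = 0; i < n; i++) {
        if (!toy_type_ok(&vars[i])) {
            snprintf(msg, len, "invalid type (%s) for variable '%s'",
                     toy_type_name(vars[i].type), vars[i].name);
            return;
        }
    }
    if (n > 0 && toy_nrows(&vars[0]) < 0)
        snprintf(msg, len, "object is not a matrix");
}
```

With a first variable of type TOY_CLOSURE named "count", the toy
check_rows_first() stops at the row count and reports only "object is
not a matrix", while check_types_first() names both the offending
variable and its type. A TOY_LIST variable gives the "invalid type"
message under either order, which matches the two cases described above.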
This change seems harmless, but of course I wouldn't be surprised if
someone can come up with a reasonable scenario where it causes problems ...
More details and examples are in the bug report linked above. The
relevant bits of source code are here:
model.c row/column checking:
https://github.com/r-devel/r-svn/blob/63369bfed9330b9461ec1d8b90d7251c0118508f/src/library/stats/src/model.c#L138-L152
nrows() function:
https://github.com/r-devel/r-svn/blob/63369bfed9330b9461ec1d8b90d7251c0118508f/src/main/util.c#L81-L94
cheers
Ben Bolker