[Rd] model.weights and model.offset: request for adjustment
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Thu Feb 3 12:14:06 CET 2022
>>>>> Ben Bolker
>>>>> on Tue, 1 Feb 2022 21:21:46 -0500 writes:
> The model.weights() and model.offset() functions from the 'stats'
> package index possibly-missing elements of a data frame via $, e.g.
> x$"(offset)"
> x$"(weights)"
> This returns NULL without comment when x is a data frame:
> x <- data.frame(a=1)
> x$"(offset)" ## NULL
> x$"(weights)" ## NULL
> However, when x is a tibble we get a warning as well:
> x <- tibble::as_tibble(x)
> x$"(offset)"
> ## NULL
> ## Warning message:
> ## Unknown or uninitialised column: `(offset)`.
> I know it's not R-core's responsibility to manage forward
> compatibility with tibbles, but [[-indexing would seem to be
> better practice here in any case.
Yes, I would agree: we should use [[ instead of $ here
in order to force exact matching, just as a matter of principle.
Importantly, mf[["(weights)"]] will also return NULL without a
warning for a model/data frame, and it seems it does so for
tibbles as well.
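A minimal sketch of the difference, assuming a small made-up data frame `d` (the names `d`, `y`, `x`, `w` and `abc` are illustrative and not from the original message):

```r
# Hypothetical toy data; column names are illustrative only.
d <- data.frame(y = c(1.2, 2.3, 3.1, 4.8), x = 1:4, w = c(1, 2, 1, 2))

# model.frame() stores the weights in a column literally named "(weights)".
mf <- model.frame(y ~ x, data = d, weights = w)

wts <- mf[["(weights)"]]   # exact match: the weights column
off <- mf[["(offset)"]]    # no such column: NULL, and no warning

# `$` matches names partially on a data frame; `[[` is exact by default.
df <- data.frame(abc = 1)
partial <- df$a            # partial match picks up column `abc`
exact   <- df[["a"]]       # exact matching finds nothing: NULL
```

With exact matching, a lookup of "(weights)" cannot accidentally pick up a longer column name that merely starts with the same characters.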
> Might a patch be accepted ... ?
That would not be necessary.
There's one remaining problem however:
`$` access is clearly faster than `[[` for small data frames
(because `$` is a primitive function doing everything in C,
whereas `[[` calls the R-level data frame method).
It is faster in both cases, i.e., when there *is* such a column and
when there is none (and NULL is returned); e.g., for the first case:
  > system.time(for(i in 1:20000) df[["a"]])
     user  system elapsed
    0.064   0.000   0.065
  > system.time(for(i in 1:20000) df$a)
     user  system elapsed
    0.009   0.000   0.009
So that's probably the reason why `$` has been preferred?
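The comparison can be reproduced with a sketch like the following. It assumes `df` is a one-column data frame like the one in Ben's example (the message does not show how `df` was created), and the timings will of course differ by machine and R version:

```r
# Assumed setup: a small one-column data frame.
df <- data.frame(a = 1)
n <- 20000

# Case 1: the column exists.
t_bracket <- system.time(for (i in seq_len(n)) df[["a"]])
t_dollar  <- system.time(for (i in seq_len(n)) df$a)

# Case 2: the column is missing, so both forms return NULL.
t_bracket_missing <- system.time(for (i in seq_len(n)) df[["(offset)"]])
t_dollar_missing  <- system.time(for (i in seq_len(n)) df$"(offset)")

# Show user, system and elapsed times side by side.
print(rbind(t_bracket, t_dollar, t_bracket_missing, t_dollar_missing)[, 1:3])
```

Both forms return the same value in each case; only the dispatch path, and hence the per-call overhead, differs.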
Martin
> cheers
> Ben Bolker