[Rd] model.weights and model.offset: request for adjustment
tim@t@yior m@iii@g oii hidde@eieph@@ts@co@uk
Thu Feb 3 12:30:17 CET 2022
> On 03/02/2022 11:14 Martin Maechler <maechler using stat.math.ethz.ch> wrote:
>
>
> >>>>> Ben Bolker
> >>>>> on Tue, 1 Feb 2022 21:21:46 -0500 writes:
>
> > The model.weights() and model.offset() functions from the 'stats'
> > package index possibly-missing elements of a data frame via $, e.g.
>
> > x$"(offset)"
> > x$"(weights)"
>
> > This returns NULL without comment when x is a data frame:
>
> > x <- data.frame(a=1)
> > x$"(offset)" ## NULL
> > x$"(weights)" ## NULL
>
> > However, when x is a tibble we get a warning as well:
>
> > x <- tibble::as_tibble(x)
> > x$"(offset)"
> > ## NULL
> > ## Warning message:
> > ## Unknown or uninitialised column: `(offset)`.
>
> > I know it's not R-core's responsibility to manage forward
> > compatibility with tibbles, but in this case [[-indexing would seem to
> > be better practice in any case.
>
> Yes, I would agree: we should use [[ instead of $ here
> in order to force exact matching just as principle
>
> Importantly, because also mf[["(weights)"]]
> will return NULL without a warning for a model/data frame, and
> it seems it does so also for tibbles.
>
> > Might a patch be accepted ... ?
>
> That would not be necessary.
>
> There's one remaining problem however:
> `$` access is clearly faster than `[[` for small data frames
> (because `$` is a primitive function doing everything in C,
> whereas `[[` calls the R level data frame method ).
>
> Faster in both cases, i.e., when there *is* a column and when there
> is none (and NULL is returned), e.g., for the first case
>
> > system.time(for(i in 1:20000) df[["a"]])
> user system elapsed
> 0.064 0.000 0.065
> > system.time(for(i in 1:20000) df$a)
> user system elapsed
> 0.009 0.000 0.009
>
> So that's probably been the reason why `$` has been preferred?
Would .subset2(df, "a") be preferable?
R> df <- mtcars
R> system.time(for(i in 1:20000) df[["hp"]])
user system elapsed
0.078 0.000 0.078
R> system.time(for(i in 1:20000) df$hp)
user system elapsed
0.011 0.000 0.010
R> system.time(for(i in 1:20000) .subset2(df,"hp"))
user system elapsed
0.004 0.000 0.004
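
To make the semantics behind the timings concrete, here is a minimal sketch comparing the three accessors on a plain data frame. The column name "alpha" and the helper model_weights_sketch() are made up for illustration; the helper is not the actual 'stats' implementation, only one way a .subset2()-based accessor might look.

```r
# Illustrative data frame; the column name "alpha" is hypothetical.
x <- data.frame(alpha = 1)

# Missing column: all three accessors return NULL on a plain data frame.
res_dollar  <- x$"(weights)"
res_bracket <- x[["(weights)"]]
res_subset2 <- .subset2(x, "(weights)")

# Partial name: `$` matches partially, `[[` and .subset2() match exactly.
partial_dollar  <- x$al
partial_bracket <- x[["al"]]
partial_subset2 <- .subset2(x, "al")

# Hypothetical accessor (an assumption, not the code in 'stats'):
# exact matching, no method dispatch, NULL when the column is absent.
model_weights_sketch <- function(mf) .subset2(mf, "(weights)")

# A model frame built with weights stores them in a "(weights)" column.
mf <- model.frame(mpg ~ hp, data = mtcars, weights = wt)
w  <- model_weights_sketch(mf)
```

Because .subset2() skips method dispatch, a class-specific `$` or `[[` method is never called, so the result depends only on the underlying list structure of the object.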
Tim