[Rd] Deprecating partial matching in $.data.frame

Wed Mar 20 17:54:49 CET 2013

On Wednesday 20 March 2013 at 17:16 +0100, peter dalgaard wrote:
> On Mar 20, 2013, at 16:23 , Hadley Wickham wrote:
> 
> > On Wed, Mar 20, 2013 at 7:28 AM, peter dalgaard <pdalgd at gmail.com> wrote:
> >> Allowing partial matching on $-extraction has always been a source
> >> of accidents. Recently, someone who shall remain nameless tried
> >> names(mydata) <- "d^2" followed by mydata$d^2.
> >> 
> >> As variables in a data frame are generally considered similar to
> >> variables in, say, the global environment, it seems strange that
> >> foo$bar can give you the content of foo$bartender.
> >> 
> >> In R-devel (i.e., *not* R-3.0.0 beta, but 3.1.0-to-be) partial
> >> matches now give a warning.
> > 
> > Just for data frames, or also for lists?
> 
> Just for data frames, at least for now. For lists, there are just too
> many uses of chisq.test()$exp etc. (I nearly wrote t.test()$p, but
> that doesn't actually work!)

I also think this is a very good idea, but special-casing data frames is
going to create some confusion in that already complex area. Wouldn't it
make more sense to aim at fixing both lists and data frames in the same
R release?
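
For concreteness, here is a minimal sketch of the behaviour in question,
for both a data frame and a list. The objects foo, mydata, ct and tt are
made up for illustration, and suppressWarnings() is only there because
whether $ warns depends on the R version and options in use.

```r
# Hypothetical data frame: "bar" is not a column, but "bartender" is.
foo <- data.frame(bartender = c("Al", "Bo"), stringsAsFactors = FALSE)

# $ falls back to partial matching, so foo$bar finds bartender.
partial <- suppressWarnings(foo$bar)

# [[ ]] is exact by default and returns NULL for a missing column.
exact <- foo[["bar"]]

# The "d^2" accident: mydata$d^2 parses as (mydata$d)^2, and mydata$d
# partially matches the column named "d^2".
mydata <- data.frame(x = 1:3)
names(mydata) <- "d^2"
squared <- suppressWarnings(mydata$d^2)

# Lists behave the same way: exp matches expected, but p is ambiguous
# between p.value and parameter, so t.test()$p gives NULL.
ct <- suppressWarnings(chisq.test(matrix(c(12, 5, 7, 9), 2)))
tt <- t.test(c(5.1, 4.9, 5.6, 5.8, 6.0))
same_expected <- identical(ct$exp, ct$expected)
ambiguous <- is.null(tt$p)
```

If partial matching is really intended, foo[["bar", exact = FALSE]] says
so explicitly and works the same for lists and data frames.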

In a first phase, R CMD check could report errors when partial matching
is detected, but normal R use would not warn: this would leave some time
for package maintainers to fix their code (I guess R CMD check could
enable the warnings as an option while running package tests if
detecting them from static code parsing is not possible). Then, in a
second phase, warnings would be enabled by default for lists and data
frames.
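
The opt-in phase could build on the existing warnPartialMatchDollar
option, which already makes $ warn on partial matches for lists when set
to TRUE. A rough sketch follows; the helper name warns_on_partial is
made up, and whether R CMD check would set such an option while running
package tests is only my assumption.

```r
# Hypothetical helper: evaluate an expression with the existing
# warnPartialMatchDollar option switched on, and report whether it warned.
warns_on_partial <- function(expr) {
  old <- options(warnPartialMatchDollar = TRUE)
  on.exit(options(old))  # restore the previous setting on exit
  # expr is a promise, so it is only evaluated here, with the option set.
  tryCatch({ expr; FALSE }, warning = function(w) TRUE)
}

res <- list(expected = 1:3)
partial_warns <- warns_on_partial(res$exp)       # partial match
exact_warns   <- warns_on_partial(res$expected)  # exact match
```

Package tests run this way would flag res$exp but leave res$expected
alone, without changing anything for normal interactive use.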

My two cents

> > 
> > I think this is a fantastic change, but I do worry a little that it is
> > going to generate warnings for a _lot_ of existing code.
> 
> We'll see about that, but I expect it not to be all that bad. In
> general purpose code, you need to have a situation where the data
> frame has known column names, and the one that you want is
> sufficiently awkward to type.  The p-value column in anova is about
> the only realistic scenario that I can come up with. The ones in,
> e.g., summary.lm are in a matrix, not a data frame.  
> 
> > 
> > Hadley
> > 
> > -- 
> > Chief Scientist, RStudio
> > http://had.co.nz/
>