[R-pkg-devel] Absent variables and tibble

Thierry Onkelinx thierry.onkelinx at inbo.be
Mon Jun 27 17:40:02 CEST 2016


Dear Russell,

The assertthat package (by Hadley) provides a has_name() function that
tests whether an object has an element with a given name:

> library(assertthat)
> x <- data.frame(y = NA)
> has_name(x, "y")
[1] TRUE
> has_name(x, "x")
[1] FALSE
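
If you would rather not add a dependency, the same check can be sketched in
base R. This is only a minimal illustration: the helper name has_var() is
hypothetical (it is not part of assertthat or tibble), and it relies only on
names() and %in%, so it should not depend on how "$" or "[[" behave for a
given class.

```r
# Hypothetical helper (not from any package): TRUE if `x` has an element or
# column called `name`. Uses exact matching on names(x) only.
has_var <- function(x, name) {
  name %in% names(x)
}

x <- data.frame(y = NA)
has_var(x, "y")   # TRUE
has_var(x, "x")   # FALSE

# A list element can exist and still be NULL; is.null(l$value) cannot tell
# that apart from a missing element, while the name-based check can.
l <- list(value = NULL)
has_var(l, "value")   # TRUE
is.null(l$value)      # TRUE
```

Compared with is.null(x$var), a name-based test also avoids the partial
matching that "$" performs on lists and data frames.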

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2016-06-27 17:05 GMT+02:00 Lenth, Russell V <russell-lenth at uiowa.edu>:

> Thanks, Hadley. I do understand why you'd want more careful checking.
>
> If you're going to provide a variable-existence function, may I suggest a
> short name like 'has'? I.e., has(x, var) returns TRUE if x has var in it.
>
> Thanks
>
> Russ
>
> > On Jun 27, 2016, at 9:47 AM, Hadley Wickham <h.wickham at gmail.com> wrote:
> >
> > On Mon, Jun 27, 2016 at 9:03 AM, Duncan Murdoch
> > <murdoch.duncan at gmail.com> wrote:
> >> On 27/06/2016 9:22 AM, Lenth, Russell V wrote:
> >>>
> >>> My package 'lsmeans' is now suddenly broken because of a new
> >>> provision in the 'tibble' package (loaded by 'dplyr' 0.5.0), whereby
> >>> the "[[" and "$" methods for 'tbl_df' objects - as documented - throw
> >>> an error if a variable is not found.
> >>>
> >>> The problem is that my code uses tests like this:
> >>>
> >>>        if (is.null (x$var)) {...}
> >>>
> >>> to see whether 'x' has a variable 'var'. Obviously, I can work around
> >>> this using
> >>>
> >>>        if (!("var" %in% names(x))) {...}
> >>>
> >>> but (a) I like the first version better, in terms of the code being
> >>> understandable; and (b) isn't there a long history whereby we can
> >>> expect a NULL result when accessing an absent member of a list (and
> >>> hence a data.frame)? (c) the code base for 'lsmeans' has about 50
> >>> instances of such tests.
> >>>
> >>> Anyway, I wonder if a lot of other package developers test for absent
> >>> variables in that first way; if so, they too are in for a rude
> >>> awakening if their users provide a tbl_df instead of a data.frame. And
> >>> what is considered the best practice for testing absence of a list
> >>> member? Apparently, not either of the above; and because of (c), I
> >>> want to do these many tedious corrections only once.
> >>>
> >>> Thanks for any light you can shed.
> >>
> >>
> >> This is why CRAN asks that people test reverse dependencies.
> >
> > Which we did do - the problem is that this is actually caused by a
> > recursive reverse dependency (lsmeans -> dplyr -> tibble), and we
> > didn't correctly anticipate how much pain this would cause.
> >
> >> I think the most defensive thing you can do is to write a small function
> >>
> >> name_missing <- function(x, name)
> >>    !(name %in% names(x))
> >>
> >> and use name_missing(x, "var") in your tests.  (Pick your own name to
> >> make your code understandable if you don't like my choice.)
> >>
> >> You could suggest to the tibble maintainers that they add a function
> >> like this.
> >
> > We're definitely going to add this.
> >
> > And I think we'll make df[["var"]] return NULL too, so at least
> > there's one easy way to opt out.
> >
> > The motivation for this change was that returning NULL + recycling
> > rules means it's very easy for errors to silently propagate. But I
> > think this approach might be somewhat too aggressive - I hadn't
> > considered that people use `is.null()` to check for missing columns.
> >
> > We'll try and get an update to tibble out soon after useR.  Thoughts
> > on what we should do are greatly appreciated.
> >
> > Hadley
> >
> > --
> > http://hadley.nz
>
> ______________________________________________
> R-package-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
