[R-pkg-devel] Absent variables and tibble

Duncan Murdoch murdoch.duncan at gmail.com
Mon Jun 27 19:50:28 CEST 2016


On 27/06/2016 1:09 PM, Hadley Wickham wrote:
> The other thing you need to be aware of if you're using the other
> approach is partial matching:
>
> df <- data.frame(xyz = 1)
> is.null(df$x)
> #> [1] FALSE
>
> Duncan - I think that argues for including a has_name() (hasName() ?)
> function in base R. Is that something you'd consider?

Yes, I'd consider it.  I think hasName() would be more consistent with 
other has*() functions in the R sources.

I guess the implementation should be defined to be equivalent to

hasName <- function(x, name)
   name %in% names(x)

though it would make sense to make a faster internal implementation; 
!is.null(df$x) is quite a bit faster than "x" %in% names(df).
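
For illustration, here is a minimal sketch of how the exact-match
definition above differs from the is.null() test when partial matching
kicks in. The name has_name_sketch is hypothetical, chosen only so it
doesn't clash with whatever might end up in base R, and the example
assumes a plain data.frame with default options (not a tbl_df).

```r
# Sketch only: has_name_sketch is a hypothetical helper, not a base R function.
has_name_sketch <- function(x, name) {
  name %in% names(x)
}

df <- data.frame(xyz = 1)

# `$` on a data.frame partially matches "x" to the column "xyz",
# so the is.null() test reports a column that isn't really there.
partial <- !is.null(df$x)                  # TRUE

# Exact matching against names() is not fooled by the prefix.
exact_x   <- has_name_sketch(df, "x")      # FALSE
exact_xyz <- has_name_sketch(df, "xyz")    # TRUE

# `[[` with a character index matches exactly by default,
# so it returns NULL for "x" on a plain data.frame.
bracket <- !is.null(df[["x"]])             # FALSE
```

So is.null(x[["var"]]) avoids the partial-matching trap on a data.frame,
but only the names()-based test is independent of how a class chooses to
implement `$` and `[[`.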

Duncan Murdoch


>
> Hadley
>
> On Mon, Jun 27, 2016 at 10:05 AM, Lenth, Russell V
> <russell-lenth at uiowa.edu> wrote:
> > Thanks, Hadley. I do understand why you'd want more careful checking.
> >
> > If you're going to provide a variable-existing function, may I suggest a short name like 'has'? I.e., has(x, var) returns TRUE if x has var in it.
> >
> > Thanks
> >
> > Russ
> >
> >> On Jun 27, 2016, at 9:47 AM, Hadley Wickham <h.wickham at gmail.com> wrote:
> >>
> >> On Mon, Jun 27, 2016 at 9:03 AM, Duncan Murdoch
> >> <murdoch.duncan at gmail.com> wrote:
> >>> On 27/06/2016 9:22 AM, Lenth, Russell V wrote:
> >>>>
> >>>> My package 'lsmeans' is now suddenly broken because of a new provision in
> >>>> the 'tibble' package (loaded by 'dplyr' 0.5.0), whereby the "[[" and "$"
> >>>> methods for 'tbl_df' objects - as documented - throw an error if a variable
> >>>> is not found.
> >>>>
> >>>> The problem is that my code uses tests like this:
> >>>>
> >>>>        if (is.null (x$var)) {...}
> >>>>
> >>>> to see whether 'x' has a variable 'var'. Obviously, I can work around this
> >>>> using
> >>>>
> >>>>        if (!("var" %in% names(x))) {...}
> >>>>
> >>>> but (a) I like the first version better, in terms of the code being
> >>>> understandable; and (b) isn't there a long history whereby we can expect a
> >>>> NULL result when accessing an absent member of a list (and hence a
> >>>> data.frame)? (c) the code base for 'lsmeans' has about 50 instances of such
> >>>> tests.
> >>>>
> >>>> Anyway, I wonder if a lot of other package developers test for absent
> >>>> variables in that first way; if so, they too are in for a rude awakening if
> >>>> their users provide a tbl_df instead of a data.frame. And what is considered
> >>>> the best practice for testing absence of a list member? Apparently, not
> >>>> either of the above; and because of (c), I want to do these many tedious
> >>>> corrections only once.
> >>>>
> >>>> Thanks for any light you can shed.
> >>>
> >>>
> >>> This is why CRAN asks that people test reverse dependencies.
> >>
> >> Which we did do - the problem is that this is actually caused by a
> >> recursive reverse dependency (lsmeans -> dplyr -> tibble), and we
> >> didn't correctly anticipate how much pain this would cause.
> >>
> >>> I think the most defensive thing you can do is to write a small function
> >>>
> >>> name_missing <- function(x, name)
> >>>    !(name %in% names(x))
> >>>
> >>> and use name_missing(x, "var") in your tests.  (Pick your own name to make
> >>> your code understandable if you don't like my choice.)
> >>>
> >>> You could suggest to the tibble maintainers that they add a function like
> >>> this.
> >>
> >> We're definitely going to add this.
> >>
> >> And I think we'll make df[["var"]] return NULL too, so at least
> >> there's one easy way to opt out.
> >>
> >> The motivation for this change was that returning NULL + recycling
> >> rules means it's very easy for errors to silently propagate. But I
> >> think this approach might be somewhat too aggressive - I hadn't
> >> considered that people use `is.null()` to check for missing columns.
> >>
> >> We'll try and get an update to tibble out soon after useR.  Thoughts
> >> on what we should do are greatly appreciated.
> >>
> >> Hadley
> >>
> >> --
> >> http://hadley.nz
>
>
>


