[R-pkg-devel] Absent variables and tibble
Lenth, Russell V
russell-lenth at uiowa.edu
Mon Jun 27 17:05:37 CEST 2016
Thanks, Hadley. I do understand why you'd want more careful checking.
If you're going to provide a function that tests whether a variable exists, may I suggest a short name like 'has'? I.e., has(x, var) returns TRUE if x has var in it.
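
For concreteness, here is a minimal sketch of what such a function might look like. The name 'has' and its signature are only my suggestion (hypothetical, not an existing tibble function), and the body is simply the negation of the name_missing() helper quoted below. It looks only at names(x), so it never calls "$" or "[[" on the object:

```r
# Hypothetical helper (not part of tibble): TRUE if 'x' has an element
# or column named 'var'. It inspects names(x) only, so it does not depend
# on how `$` or `[[` behave for the class of 'x'.
has <- function(x, var) {
  var %in% names(x)
}

df <- data.frame(a = 1:3, b = c("u", "v", "w"))

has(df, "a")            # TRUE
has(df, "z")            # FALSE
has(list(a = 1), "a")   # TRUE, plain lists work the same way
```

A test such as if (is.null(x$var)) {...} would then become if (!has(x, "var")) {...}.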
Thanks
Russ
> On Jun 27, 2016, at 9:47 AM, Hadley Wickham <h.wickham at gmail.com> wrote:
>
> On Mon, Jun 27, 2016 at 9:03 AM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
>> On 27/06/2016 9:22 AM, Lenth, Russell V wrote:
>>>
>>> My package 'lsmeans' is now suddenly broken because of a new provision in
>>> the 'tibble' package (loaded by 'dplyr' 0.5.0), whereby the "[[" and "$"
>>> methods for 'tbl_df' objects - as documented - throw an error if a variable
>>> is not found.
>>>
>>> The problem is that my code uses tests like this:
>>>
>>> if (is.null (x$var)) {...}
>>>
>>> to see whether 'x' has a variable 'var'. Obviously, I can work around this
>>> using
>>>
>>> if (!("var" %in% names(x))) {...}
>>>
>>> but (a) I like the first version better, in terms of the code being
>>> understandable; and (b) isn't there a long history whereby we can expect a
>>> NULL result when accessing an absent member of a list (and hence a
>>> data.frame)? (c) the code base for 'lsmeans' has about 50 instances of such
>>> tests.
>>>
>>> Anyway, I wonder if a lot of other package developers test for absent
>>> variables in that first way; if so, they too are in for a rude awakening if
>>> their users provide a tbl_df instead of a data.frame. And what is considered
>>> the best practice for testing absence of a list member? Apparently, not
>>> either of the above; and because of (c), I want to do these many tedious
>>> corrections only once.
>>>
>>> Thanks for any light you can shed.
>>
>>
>> This is why CRAN asks that people test reverse dependencies.
>
> Which we did do - the problem is that this is actually caused by a
> recursive reverse dependency (lsmeans -> dplyr -> tibble), and we
> didn't correctly anticipate how much pain this would cause.
>
>> I think the most defensive thing you can do is to write a small function
>>
>> name_missing <- function(x, name)
>> !(name %in% names(x))
>>
>> and use name_missing(x, "var") in your tests. (Pick your own name to make
>> your code understandable if you don't like my choice.)
>>
>> You could suggest to the tibble maintainers that they add a function like
>> this.
>
> We're definitely going to add this.
>
> And I think we'll make df[["var"]] return NULL too, so at least
> there's one easy way to opt out.
>
> The motivation for this change was that returning NULL + recycling
> rules means it's very easy for errors to silently propagate. But I
> think this approach might be somewhat too aggressive - I hadn't
> considered that people use `is.null()` to check for missing columns.
>
> We'll try and get an update to tibble out soon after useR. Thoughts
> on what we should do are greatly appreciated.
>
> Hadley
>
> --
> http://hadley.nz