[R-pkg-devel] Absent variables and tibble

Mon Jun 27 16:46:45 CEST 2016

On Mon, Jun 27, 2016 at 9:03 AM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
> On 27/06/2016 9:22 AM, Lenth, Russell V wrote:
>>
>> My package 'lsmeans' is now suddenly broken because of a new provision in
>> the 'tibble' package (loaded by 'dplyr' 0.5.0), whereby the "[[" and "$"
>> methods for 'tbl_df' objects - as documented - throw an error if a variable
>> is not found.
>>
>> The problem is that my code uses tests like this:
>>
>>         if (is.null (x$var)) {...}
>>
>> to see whether 'x' has a variable 'var'. Obviously, I can work around this
>> using
>>
>>         if (!("var" %in% names(x))) {...}
>>
>> but (a) I like the first version better, in terms of the code being
>> understandable; and (b) isn't there a long history whereby we can expect a
>> NULL result when accessing an absent member of a list (and hence a
>> data.frame)? (c) the code base for 'lsmeans' has about 50 instances of such
>> tests.
>>
>> Anyway, I wonder if a lot of other package developers test for absent
>> variables in that first way; if so, they too are in for a rude awakening if
>> their users provide a tbl_df instead of a data.frame. And what is considered
>> the best practice for testing absence of a list member? Apparently, not
>> either of the above; and because of (c), I want to do these many tedious
>> corrections only once.
>>
>> Thanks for any light you can shed.
>
>
> This is why CRAN asks that people test reverse dependencies.

Which we did do - the problem is that this is actually caused by a
recursive reverse dependency (lsmeans -> dplyr -> tibble), and we
didn't correctly anticipate how much pain this would cause.

> I think the most defensive thing you can do is to write a small function
>
> name_missing <- function(x, name)
>     !(name %in% names(x))
>
> and use name_missing(x, "var") in your tests.  (Pick your own name to make
> your code understandable if you don't like my choice.)
>
> You could suggest to the tibble maintainers that they add a function like
> this.

We're definitely going to add this.

And I think we'll make df[["var"]] return NULL too, so at least
there's one easy way to opt out.
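
For comparison, here is a minimal sketch of how the three tests discussed in this thread behave on a plain base R list (no tibble involved). The object `x` and its element name are made up for illustration; `name_missing()` is the helper quoted above. It suggests one more reason to prefer exact-name tests: `$` does partial matching on lists, while `[[` matches exactly by default.

```r
# Helper from the quoted suggestion above
name_missing <- function(x, name) !(name %in% names(x))

# Hypothetical object: it has "variable" but no element named "var"
x <- list(variable = 1:3)

is.null(x$var)          # FALSE: `$` partially matches "variable"
is.null(x[["var"]])     # TRUE: `[[` uses exact matching by default
name_missing(x, "var")  # TRUE: compares against the full names only
```

So even without tibble, is.null(x$var) can give a misleading answer when another name starts with the same letters.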

The motivation for this change was that returning NULL, combined with
R's recycling rules, makes it very easy for errors to propagate
silently. But I think this approach might be somewhat too aggressive -
I hadn't considered that people use `is.null()` to check for missing
columns.
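
To illustrate the kind of silent propagation meant here, a small sketch with a base R data.frame. The data and the column names are invented for the example; the point is only that a mistyped or absent column yields NULL, which downstream code accepts without complaint.

```r
df <- data.frame(x = 1:3)

# "y" does not exist, so df$y is NULL rather than an error
total  <- sum(df$y)      # evaluates to 0, with no warning
scaled <- df$x * df$y    # zero-length result, with no warning

total
length(scaled)
```

Neither line signals a problem, so the mistake only surfaces later, if at all.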

We'll try to get an update to tibble out soon after useR.  Thoughts
on what we should do are greatly appreciated.

Hadley

-- 
http://hadley.nz