[R-pkg-devel] Absent variables and tibble

William Dunlap wdunlap at tibco.com
Tue Jun 28 16:03:36 CEST 2016


Currently exists("someName", where=someDataFrame) reports whether
"someName" is a column of the data.frame 'someDataFrame', and the
'where=' may be omitted.  If we have an environment, we use
exists("someName", envir=someEnvironment).  It might be nice to
continue using exists() instead of introducing a new function has(),
although, since we want the same syntax to work for environments,
data.frames, tbl_dfs, data.tables, etc., we may need the new function.
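
For example, a minimal sketch of both forms.  The object names are made
up for illustration, and has() here is a hypothetical helper, not an
existing base R function; it only shows how one syntax might cover
environments and list-like objects:

```r
# Illustrative objects (names are assumptions for this example).
someDataFrame <- data.frame(xyz = 1)
someEnvironment <- new.env()
assign("xyz", 1, envir = someEnvironment)

# exists() on a data.frame: the data.frame is coerced to an environment,
# so the lookup is by exact name (no partial matching as with `$`).
exists("xyz", where = someDataFrame)
exists("xyz", someDataFrame)          # same call with 'where=' omitted
exists("x", where = someDataFrame)    # "x" is not a column name

# exists() on an environment; inherits = FALSE keeps the search local.
exists("xyz", envir = someEnvironment, inherits = FALSE)

# Hypothetical has(): one syntax for environments and named objects.
has <- function(x, name) {
  if (is.environment(x)) {
    exists(name, envir = x, inherits = FALSE)
  } else {
    name %in% names(x)
  }
}

has(someDataFrame, "xyz")
has(someEnvironment, "xyz")
```

Whether such a helper would also behave sensibly for tbl_dfs and
data.tables depends on their names() methods, which this sketch simply
assumes.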


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Jun 28, 2016 at 4:08 AM, Duncan Murdoch <murdoch.duncan at gmail.com>
wrote:

> On 27/06/2016 10:15 PM, Lenth, Russell V wrote:
>
>> Hadley's note on partial matching has me scared the most concerning the
>> is.null() coding. So the need for a hasName() (or whatever) function seems
>> all the more compelling, and that it be in base R. Perhaps it should be
>> generic, with a default method that searches in the names attribute,
>> potentially extensible to other classes.
>>
>
> I am thinking of putting it in, but if I do the definition will be
> equivalent to the one-liner down below.  That's already slower than the
> is.null() test; making it generic would slow it down too much.
>
> Duncan Murdoch
>
>
>> Thanks so much, several of you, for your positive and helpful responses.
>>
>> Russ
>>
>> -----Original Message-----
>> From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
>> Sent: Monday, June 27, 2016 12:50 PM
>> To: Hadley Wickham <h.wickham at gmail.com>; Lenth, Russell V <
>> russell-lenth at uiowa.edu>
>> Cc: r-package-devel at r-project.org
>> Subject: Re: [R-pkg-devel] Absent variables and tibble
>>
>> On 27/06/2016 1:09 PM, Hadley Wickham wrote:
>>
>>> The other thing you need to be aware of if you're using the other
>>> approach is partial matching:
>>>
>>> df <- data.frame(xyz = 1)
>>> is.null(df$x)
>>> #> [1] FALSE
>>>
>>> Duncan - I think that argues for including a has_name() (hasName() ?)
>>> function in base R. Is that something you'd consider?
>>>
>>
>> Yes, I'd consider it.  I think hasName() would be more consistent with
>> other has*() functions in the R sources.
>>
>> I guess the implementation should be defined to be equivalent to
>>
>> hasName <- function(x, name)
>>    name %in% names(x)
>>
>> though it would make sense to make a faster internal implementation;
>> !is.null(df$x) is quite a bit faster than "x" %in% names(df).
>>
>> Duncan Murdoch
>>
>>
>>
>>> Hadley
>>>
>>> On Mon, Jun 27, 2016 at 10:05 AM, Lenth, Russell V
>>> <russell-lenth at uiowa.edu> wrote:
>>>
>>>> Thanks, Hadley. I do understand why you'd want more careful checking.
>>>>
>>>> If you're going to provide a variable-existing function, may I suggest
>>>> a short name like 'has'? I.e., has(x, var) returns TRUE if x has var in it.
>>>>
>>>> Thanks
>>>>
>>>> Russ
>>>>
>>>> On Jun 27, 2016, at 9:47 AM, Hadley Wickham <h.wickham at gmail.com>
>>>>> wrote:
>>>>>
>>>>> On Mon, Jun 27, 2016 at 9:03 AM, Duncan Murdoch
>>>>> <murdoch.duncan at gmail.com> wrote:
>>>>>
>>>>>> On 27/06/2016 9:22 AM, Lenth, Russell V wrote:
>>>>>>
>>>>>>>
>>>>>>> My package 'lsmeans' is now suddenly broken because of a new
>>>>>>> provision in the 'tibble' package (loaded by 'dplyr' 0.5.0), whereby
>>>>>>> the "[[" and "$"
>>>>>>> methods for 'tbl_df' objects - as documented - throw an error if
>>>>>>> a variable is not found.
>>>>>>>
>>>>>>> The problem is that my code uses tests like this:
>>>>>>>
>>>>>>>        if (is.null (x$var)) {...}
>>>>>>>
>>>>>>> to see whether 'x' has a variable 'var'. Obviously, I can work
>>>>>>> around this using
>>>>>>>
>>>>>>>        if (!("var" %in% names(x))) {...}
>>>>>>>
>>>>>>> but (a) I like the first version better, in terms of the code
>>>>>>> being understandable; and (b) isn't there a long history whereby
>>>>>>> we can expect a NULL result when accessing an absent member of a
>>>>>>> list (and hence a data.frame)? (c) the code base for 'lsmeans'
>>>>>>> has about 50 instances of such tests.
>>>>>>>
>>>>>>> Anyway, I wonder if a lot of other package developers test for
>>>>>>> absent variables in that first way; if so, they too are in for a
>>>>>>> rude awakening if their users provide a tbl_df instead of a
>>>>>>> data.frame. And what is considered the best practice for testing
>>>>>>> absence of a list member? Apparently, not either of the above;
>>>>>>> and because of (c), I want to do these many tedious corrections only
>>>>>>> once.
>>>>>>>
>>>>>>> Thanks for any light you can shed.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> This is why CRAN asks that people test reverse dependencies.
>>>>>>
>>>>>
>>>>> Which we did do - the problem is that this is actually caused by a
>>>>> recursive reverse dependency (lsmeans -> dplyr -> tibble), and we
>>>>> didn't correctly anticipate how much pain this would cause.
>>>>>
>>>>>> I think the most defensive thing you can do is to write a small
>>>>>> function
>>>>>>
>>>>>> name_missing <- function(x, name)
>>>>>>    !(name %in% names(x))
>>>>>>
>>>>>> and use name_missing(x, "var") in your tests.  (Pick your own name
>>>>>> to make your code understandable if you don't like my choice.)
>>>>>>
>>>>>> You could suggest to the tibble maintainers that they add a
>>>>>> function like this.
>>>>>>
>>>>>
>>>>> We're definitely going to add this.
>>>>>
>>>>> And I think we'll make df[["var"]] return NULL too, so at least
>>>>> there's one easy way to opt out.
>>>>>
>>>>> The motivation for this change was that returning NULL + recycling
>>>>> rules means it's very easy for errors to silently propagate. But I
>>>>> think this approach might be somewhat too aggressive - I hadn't
>>>>> considered that people use `is.null()` to check for missing columns.
>>>>>
>>>>> We'll try and get an update to tibble out soon after useR.
>>>>> Thoughts on what we should do are greatly appreciated.
>>>>>
>>>>> Hadley
>>>>>
>>>>> --
>>>>> http://hadley.nz
>>>>>
>>>>
>>>
>>>
>>>
>>
