[Rd] Bounty on Error Checking

Matthew Dowle mdowle at mdowle.plus.com
Fri Jan 4 16:38:42 CET 2013


On 04.01.2013 15:22, Duncan Murdoch wrote:
> On 04/01/2013 10:15 AM, Matthew Dowle wrote:
>> On 04.01.2013 14:56, Duncan Murdoch wrote:
>> > On 04/01/2013 9:51 AM, Matthew Dowle wrote:
>> >> On 04.01.2013 14:03, Duncan Murdoch wrote:
>> >> > On 13-01-04 8:32 AM, Matthew Dowle wrote:
>> >> >>
>> >> >> On Thu, Jan 3, 2013, Bert Gunter wrote:
>> >> >>> Well...
>> >> >>>
>> >> >>> On Thu, Jan 3, 2013 at 10:00 AM, ivo welch <ivo.welch <at>
>> >> >>> anderson.ucla.edu> wrote:
>> >> >>>>
>> >> >>>> Dear R developers---I just spent half a day debugging an R
>> >> >>>> program, which had two bugs---I selected the wrongly named
>> >> >>>> variable, which turns out to have been a scalar, which then
>> >> >>>> happily multiplied as if it was a matrix; and another wrongly
>> >> >>>> named variable from a data frame, that triggered no error when
>> >> >>>> used as a[["name"]] or a$name. There should be an option to
>> >> >>>> turn on that throws an error inside R when one does this.  I
>> >> >>>> cannot imagine that there is much code that wants to reference
>> >> >>>> non-existing columns in data frames.
>> >> >>>
>> >> >>> But I can -- and do it all the time: To add a new variable, "d"
>> >> >>> to a data frame, df, containing only "a" and "b" (with 10 rows,
>> >> >>> say):
>> >> >>>
>> >> >>> df[["d"]] <- 1:10
>> >> >>
>> >> >> Yes but that's `[[<-`. Ivo was talking about `[[` and `$`; i.e.,
>> >> >> select only not assign, if I understood correctly.
>> >> >>
>> >> >>>
>> >> >>> Trying to outguess documentation to create error triggers is a
>> >> >>> very bad idea.
>> >> >>
>> >> >> Why exactly is it a very bad idea? (I don't necessarily disagree,
>> >> >> just asking for more colour.)
>> >> >>
>> >> >>> R already has plenty of debugging tools -- and there is even a
>> >> >>> "debug" package. Perhaps you need a better programming
>> >> >>> editor/IDE. There are several listed on CRAN, RStudio, etc.
>> >> >>
>> >> >> True, but that relies on you knowing there's a bug to hunt for.
>> >> >> What if you don't know you're getting incorrect results,
>> >> >> silently? In a similar way that options(warn=2) turns known
>> >> >> warnings into errors, to enable you to be more strict if you
>> >> >> wish,
>> >> >
>> >> > I would say the point of options(warn=2) is rather to let you find
>> >> > the location of the warning more easily, because it will abort the
>> >> > evaluation.
>> >>
>> >> True but as well as that, I sometimes like to run production systems
>> >> with options(warn=2). I'd prefer some tasks to halt at the slightest
>> >> hint of trouble than write a warning silently to a log file that may
>> >> not be looked at. I think of that as being more strict, more robust,
>> >> since options(warn=2) is set even when there is no warning, to catch
>> >> one if it arises in future. Not just to find it more easily once you
>> >> know there is a warning.
>> >>
>> >> > I would not recommend using code that issues warnings.
>> >>
>> >> Not sure what you mean here.
>> >
>> > I just meant that I consider warnings to be a problem (as you do), so
>> > they should all be fixed.
>>
>> I see now, good.
>>
>> >
>> >>
>> >> >
>> >> >> an option to turn on warnings from `[[` and `$` if the column is
>> >> >> missing (select only, not assign) doesn't seem like a bad option
>> >> >> to have. Maybe it would reveal some previously silent bugs.
>> >> >
>> >> > I agree that this would sometimes be useful, but a very common
>> >> > convention is to do something like
>> >> >
>> >> > if (is.null(obj$element)) {  do something }
>> >> >
>> >> > These would all have to be re-written to something like
>> >> >
>> >> > if (missing.field(obj, "element")) { do something }
>> >> >
>> >> > There are several hundred examples of the first usage in base R; I
>> >> > imagine thousands more in contributed packages.
>> >>
>> >> Yes but Ivo doesn't seem to be writing that if() in his code. We're
>> >> only talking about an option that users can turn on for their own
>> >> code, iiuc. Not anything that would affect or break thousands of
>> >> packages. That's why I referred to the fact that all packages now
>> >> have namespaces, in the earlier post.
>> >>
>> >> > I don't think the benefit of the change is worth all the work that
>> >> > would be necessary to implement it.
>> >>
>> >> It doesn't seem to be a lot of work. I already posted a working
>> >> straw man, for example, as a first step.
>> >
>> > I understood the proposal to be that evaluating "obj$element" would
>> > issue a warning if element didn't exist.  If that were the case, then
>> > the common test
>> >
>> > is.null(obj$element)
>> >
>> > would issue a warning in the cases where it now returns TRUE.
>>
>> Yes, but only for obj$element appearing in Ivo's own code. Not if a
>> package does that (including base). That's why I thought masking "[["
>> and "$" in .GlobalEnv might achieve that without affecting packages or
>> base, although I don't know how such an option could be made available
>> by R. Maybe options(strictselect=TRUE) would create those masks in
>> .GlobalEnv, and options(strictselect=FALSE) would remove them. A
>> package maintainer might choose to set that in their package to make
>> it stricter (which would create those masks in the package's namespace
>> too).
>>
>> Or users could just create those masks themselves, since it's only a
>> few lines. Without affecting packages or base.
>
> options() are global

I realise that. I was thinking that inside the options() function it
could see if strictselect was being changed and then create the masks
in .GlobalEnv. But I can see that is ugly, was just thinking out loud.
Wasn't suggesting that "[[" would look at the value of strictselect.
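
To make the masking idea concrete, here is a minimal sketch of what
such masks might look like when defined by hand in .GlobalEnv. It is
only an illustration under stated assumptions, not the straw man
mentioned earlier in the thread: the helper name is_missing_column is
made up, only data frames are checked, and an exact name match is
required (so a partial match that base `$` would accept gets a warning
here).

```r
# Hypothetical helper (not part of R): TRUE when 'name' is not an exact
# column name of the data frame 'x'. Other object types are never flagged.
is_missing_column <- function(x, name) {
  is.data.frame(x) && !(name %in% names(x))
}

# Mask for `$`, select only. Defined at top level, so it lives in
# .GlobalEnv; code inside package namespaces still finds base::`$`.
`$` <- function(x, name) {
  name <- as.character(substitute(name))   # `$` receives its name unevaluated
  if (is_missing_column(x, name))
    warning("column '", name, "' not found in data frame", call. = FALSE)
  do.call(base::`$`, list(x, name))        # hand over to the base selector
}

# Mask for `[[`, select only. Only the one-index form x[["name"]] is
# checked; everything else goes straight to base.
`[[` <- function(x, i, ...) {
  if (nargs() == 2L && is.character(i) && length(i) == 1L &&
      is_missing_column(x, i))
    warning("column '", i, "' not found in data frame", call. = FALSE)
  base::`[[`(x, i, ...)
}

df <- data.frame(a = 1:10, b = 11:20)
df[["d"]] <- 1:10   # assignment goes through `[[<-`, which is not masked
df$a                # existing column, no warning
df$nosuchcolumn     # warns, then returns NULL as base would
```

Combined with options(warn=2) the warning becomes an error, and
rm(list = c("$", "[[")) removes the masks again.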

> but a package could change the meaning of $ or
> [[.  It could even export those new definitions so that people who
> wanted the strict usage could use it.  It would be hard to get the
> same performance as the base definitions, but for debugging purposes
> that might not matter.

So in principle this would be a (small) good idea then?  Is it an
option that R could provide? i.e. something for which a patch file
for R would be considered by R core?

Matthew
