[Rd] [R] NaN, Inf to NA

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri May 27 17:53:08 CEST 2011


On Fri, 27 May 2011, Duncan Murdoch wrote:

> On 27/05/2011 11:11 AM, Martin Maechler wrote:
>> >>>>>  Duncan Murdoch<murdoch.duncan at gmail.com>
>> >>>>>      on Fri, 27 May 2011 08:23:14 -0400 writes:
>>
>>      >  On 11-05-27 4:27 AM, Albert-Jan Roskam wrote:
>>      >>  Aha! Thank you very much for that clarification! It would
>>      >>  be much more user friendly if R generated a
>>      >>  NotImplementedError or something similar. The 'garbage
>>      >>  results' are pretty misleading, esp. to a novice.
>>
>>      >  I think that's a good idea.  The default methods are
>>      >  documented to work on atomic vectors; dataframes are not
>>      >  atomic vectors, so it would be reasonable to generate an
>>      >  error.  (See ?is.atomic for a definition of atomic
>>      >  vectors.)
>>
>>      >  I'll see if this causes a lot of trouble...
>>
>>      >  Duncan Murdoch
>> 
>> Duncan,
>> do you remember the issue of mean(), var(), median(),... etc
>> that was the topic a few weeks ago ?
>> 
>> I strongly advocated that  mean.data.frame() should become
>> *deprecated*, and I would propose the same for the functions
>> mentioned here.
>
> I think you may have misunderstood my proposal.  Currently is.nan, is.finite 
> and is.infinite have no data.frame methods, so the default method is used. 
> The problem is that the default method is too permissive:  it operates on the 
> data.frame by treating it as a list; then it returns FALSE for each list 
> element.  (If there is only one row, it applies the test to the singleton in 
> the column.)   This is pretty strange default behaviour.
>
> What I'm proposing is that the default method should trigger an error if you 
> try to send it anything that's not atomic.  This gives sensible behaviour in 
> most cases; the only one where it doesn't work is a list of singletons, which 
> used to be handled sensibly, but will now fail.
>
> (There's still a question about what the answer should be for these functions 
> when applied to character or raw vectors, which are both atomic.  I'm leaning 
> towards returning FALSE for every element, which matches the current 
> behaviour, but perhaps those should also generate an error.)

I noticed you did not mention integer vectors.  Those are no 
different from character or raw: there are no NaN (nor infinite) 
integer elements.  I don't see why it should be an error to ask in those
cases.
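
As an illustration of the point about integer vectors (a minimal sketch added for clarity, not part of the original exchange; the variable names are made up for the example), the default methods can be asked about integer and character vectors and simply answer FALSE for every element, whereas a double vector can genuinely contain NaN and Inf:

```r
# Integer and character vectors cannot hold NaN or infinite values,
# so asking the question is harmless: every element yields FALSE.
int_vec <- c(1L, NA_integer_, 3L)
chr_vec <- c("a", "NaN", "Inf")   # strings that merely look like NaN/Inf

int_nan <- is.nan(int_vec)
int_inf <- is.infinite(int_vec)
chr_nan <- is.nan(chr_vec)
chr_inf <- is.infinite(chr_vec)

# A double vector is the case where the tests can return TRUE.
dbl_vec <- c(NA, NaN, Inf, 1)
dbl_nan <- is.nan(dbl_vec)        # FALSE  TRUE FALSE FALSE
dbl_inf <- is.infinite(dbl_vec)   # FALSE FALSE  TRUE FALSE
```

Note that NA_integer_ is neither NaN nor infinite, so it also yields FALSE in both tests.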

>
> I think this partially addresses Bill's objection, but not completely. 
> Someone could still put a class on an atomic vector, and that might not be 
> handled properly by the default method.
>
>> People should use *apply (or *ply) on data frames, and not expect
>> that all kinds of functions have data.frame methods
>> which are simply equivalent to basically  sapply(<df>,<function>)
>> 
>> {and yes -- all this belongs to R-devel rather than R-help}
>
> Where I've moved it now.
>
> Duncan Murdoch
>> Martin
>>
>>      >>  I wanted to recode every NaN and Inf value of an entire
>>      >>  data.frame to NA. The data.frame also includes character
>>      >>  variables. So the following might work (?)  (Can't test
>>      >>  it here)
>>      >>
>>      >>  ditch<- function(x) ifelse(is.infinite(x) | is.nan(x), NA, x)
>>      >>  df<- apply(df, 2, ditch)
>>      >>
>>      >>
>>      >>
>>      >>
>>      >>
>>      >>  ________________________________
>>      >>  From: William Dunlap<wdunlap at tibco.com>
>>      >>
>>      >>  Cc: R Mailing List<r-help at r-project.org>
>>      >>  Sent: Fri, May 27, 2011 12:57:01 AM
>>      >>  Subject: RE: [R] NaN, Inf to NA
>>      >>
>>      >>  I think the source of the OP's problem is that while
>>      >>  things like df>30 and is.na(df) return a logical matrix
>>      >>  with the dimensions of the data.frame df, both
>>      >>  is.infinite(df) and is.nan(df) return a logical vector as
>>      >>  long as the number of columns of df.  (`>` and is.na have
>>      >>  data.frame methods but is.infinite and is.nan do not: the
>>      >>  latter give garbage results for data.frames.)
>>      >>
>>      >>  Bill Dunlap
>>      >>  Spotfire, TIBCO Software
>>      >>  wdunlap tibco.com
>>      >>
>>      >>>  -----Original Message-----
>>      >>>  From: r-help-bounces at r-project.org
>>      >>>  [mailto:r-help-bounces at r-project.org] On Behalf Of Marc Schwartz
>>      >>>  Sent: Thursday, May 26, 2011 2:15 PM
>>      >>>  To: Albert-Jan Roskam
>>      >>>  Cc: R Mailing List
>>      >>>  Subject: Re: [R] NaN, Inf to NA
>>      >>>
>>      >>>  On May 26, 2011, at 3:18 PM, Albert-Jan Roskam wrote:
>>      >>>
>>      >>>>  Hi,
>>      >>>>
>>      >>>>  I want to recode all Inf and NaN values to NA, but I'm
>>      >>>>  surprised to see the result of the following code. Could
>>      >>>>  anybody enlighten me about this?
>>      >>>>
>>      >>>>>  df<- data.frame(a=c(NA, NaN, Inf, 1:3))
>>      >>>>>  df[is.infinite(df) | is.nan(df)]<- NA
>>      >>>>>  df
>>      >>>>      a
>>      >>>>  1  NA
>>      >>>>  2 NaN
>>      >>>>  3 Inf
>>      >>>>  4   1
>>      >>>>  5   2
>>      >>>>  6   3
>>      >>>>>
>>      >>>>
>>      >>>>
>>      >>>>  Thanks!
>>      >>>>
>>      >>>>  Cheers!!  Albert-Jan
>>      >>>
>>      >>>
>>      >>>  The canonical way is to use is.na() to assign the NA
>>      >>>  value based upon a condition. See ?is.na for more
>>      >>>  information.
>>      >>>
>>      >>>  is.na(df$a)<- !is.finite(df$a)
>>      >>>
>>      >>>>  df
>>      >>>     a
>>      >>>  1 NA
>>      >>>  2 NA
>>      >>>  3 NA
>>      >>>  4  1
>>      >>>  5  2
>>      >>>  6  3
>>      >>>
>>      >>>
>>      >>>  HTH,
>>      >>>
>>      >>>  Marc Schwartz
>>      >>>
>>      >>>  ______________________________________________
>>      >>>  R-help at r-project.org mailing list
>>      >>>  https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
>>      >>>  read the posting guide
>>      >>>  http://www.R-project.org/posting-guide.html and provide
>>      >>>  commented, minimal, self-contained, reproducible code.
>>      >>>
>>      >>
>>      >>
>>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
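
For readers following the quoted thread, the column-wise approach it converges on can be sketched as below. This is an illustrative sketch only, not code posted by anyone in the thread: the helper name recode_nonfinite and the example column b are assumptions made for the example. It applies the is.na(x) <- !is.finite(x) idiom quoted above to each numeric column and leaves character columns untouched, so nothing is coerced through a matrix as apply() would do.

```r
# Hypothetical helper (name chosen for this example): replace NaN, Inf and
# -Inf with NA in every numeric column of a data frame.
recode_nonfinite <- function(df) {
  # Only numeric columns can hold NaN or infinite values.
  numeric_cols <- vapply(df, is.numeric, logical(1))
  df[numeric_cols] <- lapply(df[numeric_cols], function(col) {
    # The idiom from the thread: mark every non-finite element as NA.
    is.na(col) <- !is.finite(col)
    col
  })
  df
}

# Example data: the column from the original question plus an assumed
# character column to show that non-numeric columns are left alone.
df <- data.frame(a = c(NA, NaN, Inf, 1:3),
                 b = letters[1:6],
                 stringsAsFactors = FALSE)
out <- recode_nonfinite(df)
```

Working column by column avoids calling is.nan() or is.infinite() on the data frame itself, which is the case the thread identifies as giving misleading results.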

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


