[Rd] [R] NaN, Inf to NA
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri May 27 17:53:08 CEST 2011
On Fri, 27 May 2011, Duncan Murdoch wrote:
> On 27/05/2011 11:11 AM, Martin Maechler wrote:
>> >>>>> Duncan Murdoch<murdoch.duncan at gmail.com>
>> >>>>> on Fri, 27 May 2011 08:23:14 -0400 writes:
>>
>> > On 11-05-27 4:27 AM, Albert-Jan Roskam wrote:
>> >> Aha! Thank you very much for that clarification! It would
>> >> be much more user friendly if R generated a
>> >> NotImplementedError or something similar. The 'garbage
>> >> results' are pretty misleading, esp. to a novice.
>>
>> > I think that's a good idea. The default methods are
>> > documented to work on atomic vectors; dataframes are not
>> > atomic vectors, so it would be reasonable to generate an
>> > error. (See ?is.atomic for a definition of atomic
>> > vectors.)
>>
>> > I'll see if this causes a lot of trouble...
>>
>> > Duncan Murdoch
>>
>> Duncan,
>> do you remember the issue of mean(), var(), median(),... etc
>> that was the topic a few weeks ago ?
>>
>> I strongly advocated that mean.data.frame() should become
>> *deprecated*, and I would propose the same for the functions
>> mentioned here.
>
> I think you may have misunderstood my proposal. Currently is.nan, is.finite
> and is.infinite have no data.frame methods, so the default method is used.
> The problem is that the default method is too permissive: it operates on the
> data.frame by treating it as a list; then it returns FALSE for each list
> element. (If there is only one row, it applies the test to the singleton in
> the column.) This is pretty strange default behaviour.
>
> What I'm proposing is that the default method should trigger an error if you
> try to send it anything that's not atomic. This gives sensible behaviour in
> most cases; the only one where it doesn't work is a list of singletons, which
> used to be handled sensibly, but will now fail.
>
> (There's still a question about what the answer should be for these functions
> when applied to character or raw vectors, which are both atomic. I'm leaning
> towards returning FALSE for every element, which matches the current
> behaviour, but perhaps those should also generate an error.)
I noticed you did not mention integer vectors. Those are no
different from character or raw: there are no NaN (nor infinite)
integer elements. I don't see why it should be an error to ask in
those cases.
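
A small sketch of that point (an illustration, not code from this
thread; the variable names are made up). It assumes only base R and
contrasts an integer vector, which can hold NA but no NaN or infinite
values, with a double vector, which can hold all of them:

```r
# Integer vectors can hold NA but never NaN or +/-Inf.
x <- c(1L, NA_integer_, 3L)

nan_int <- is.nan(x)        # FALSE for every element, including the NA
inf_int <- is.infinite(x)   # FALSE for every element
fin_int <- is.finite(x)     # FALSE only where the element is NA

# Doubles, by contrast, can hold all of these special values.
y <- c(1, NA, NaN, Inf, -Inf)
nan_dbl <- is.nan(y)        # TRUE only for the NaN
inf_dbl <- is.infinite(y)   # TRUE only for Inf and -Inf
```

So asking the question of an integer vector is harmless: the answer is
simply FALSE throughout.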
>
> I think this partially addresses Bill's objection, but not completely.
> Someone could still put a class on an atomic vector, and that might not be
> handled properly by the default method.
>
>> People should *apply (or *ply) on data frames, and not expect
>> that all kind of functions have data.frame methods
>> which are simply equivalent to basically sapply(<df>,<function>)
>>
>> {and yes -- all this belongs to R-devel rather than R-help}
>
> Where I've moved it now.
>
> Duncan Murdoch
>> Martin
>>
>> >> I wanted to recode every NaN and Inf value of an entire
>> >> data.frame to NA. The data.frame also includes character
>> >> variables. So the following might work (?) (Can't test
>> >> it here)
>> >>
>> >> ditch <- function(x) ifelse(is.infinite(x) | is.nan(x), NA, x)
>> >> df <- apply(df, 2, ditch)
>> >>
>> >>
>> >> ________________________________
>> >> From: William Dunlap <wdunlap at tibco.com>
>> >> Cc: R Mailing List <r-help at r-project.org>
>> >> Sent: Fri, May 27, 2011 12:57:01 AM
>> >> Subject: RE: [R] NaN, Inf to NA
>> >>
>> >> I think the source of the OP's problem is that while
>> >> things like df>30 and is.na(df) return a logical matrix
>> >> with the dimensions of the data.frame df, both
>> >> is.infinite(df) and is.nan(df) return a logical vector as
>> >> long as the number of columns of df. (`>` and is.na have
>> >> data.frame methods but is.infinite and is.nan do not: the
>> >> latter give garbage results for data.frames.)
>> >>
>> >> Bill Dunlap
>> >> Spotfire, TIBCO Software
>> >> wdunlap tibco.com
>> >>
>> >>> -----Original Message-----
>> >>> From: r-help-bounces at r-project.org
>> >>> [mailto:r-help-bounces at r-project.org] On Behalf Of Marc Schwartz
>> >>> Sent: Thursday, May 26, 2011 2:15 PM
>> >>> To: Albert-Jan Roskam
>> >>> Cc: R Mailing List
>> >>> Subject: Re: [R] NaN, Inf to NA
>> >>>
>> >>> On May 26, 2011, at 3:18 PM, Albert-Jan Roskam wrote:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> I want to recode all Inf and NaN values to NA, but I'm
>> >>>> surprised to see the result of the following code. Could
>> >>>> anybody enlighten me about this?
>> >>>>
>> >>>>> df <- data.frame(a=c(NA, NaN, Inf, 1:3))
>> >>>>> df[is.infinite(df) | is.nan(df)] <- NA
>> >>>>> df
>> >>>>     a
>> >>>> 1  NA
>> >>>> 2 NaN
>> >>>> 3 Inf
>> >>>> 4   1
>> >>>> 5   2
>> >>>> 6   3
>> >>>>
>> >>>> Thanks!
>> >>>>
>> >>>> Cheers!! Albert-Jan
>> >>>
>> >>>
>> >>> The canonical way is to use is.na() to assign the NA
>> >>> value based upon a condition. See ?is.na for more
>> >>> information.
>> >>>
>> >>> is.na(df$a)<- !is.finite(df$a)
>> >>>
>> >>>> df
>> >>>    a
>> >>> 1 NA
>> >>> 2 NA
>> >>> 3 NA
>> >>> 4  1
>> >>> 5  2
>> >>> 6  3
>> >>>
>> >>>
>> >>> HTH,
>> >>>
>> >>> Marc Schwartz
>> >>>
>> >>> ______________________________________________
>> >>> R-help at r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
>> >>> read the posting guide
>> >>> http://www.R-project.org/posting-guide.html and provide
>> >>> commented, minimal, self-contained, reproducible code.
>> >>>
>> >>
>>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
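
For the original task further down the thread (recoding every NaN and
Inf in a data frame that also has character columns), something along
the following lines may do. This is only a sketch, not code anyone
posted: the helper name nonfinite_to_na and the example column b are
made up for illustration. It applies Marc Schwartz's is.na<- idiom to
each numeric column and leaves all other columns alone:

```r
# Example data: one numeric column with special values, one character column.
df <- data.frame(a = c(NA, NaN, Inf, 1:3),
                 b = c("x", "y", "z", "u", "v", "w"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: set non-finite entries of numeric columns to NA,
# leaving non-numeric columns untouched.
nonfinite_to_na <- function(d) {
  num <- vapply(d, is.numeric, logical(1))   # which columns are numeric
  d[num] <- lapply(d[num], function(col) {
    is.na(col) <- !is.finite(col)            # NaN, Inf, -Inf become NA
    col
  })
  d
}

df2 <- nonfinite_to_na(df)
```

Working column by column avoids apply(), which would first coerce a
data frame with a character column to a character matrix.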
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595