Duncan Murdoch
murdoch.duncan at gmail.com
Fri May 27 18:37:22 CEST 2011
On 27/05/2011 11:53 AM, Prof Brian Ripley wrote:
> On Fri, 27 May 2011, Duncan Murdoch wrote:
>
> > On 27/05/2011 11:11 AM, Martin Maechler wrote:
> >> >>>>> Duncan Murdoch<murdoch.duncan at gmail.com>
> >> >>>>> on Fri, 27 May 2011 08:23:14 -0400 writes:
> >>
> >> > On 11-05-27 4:27 AM, Albert-Jan Roskam wrote:
> >> >> Aha! Thank you very much for that clarification! It would
> >> >> be much more user-friendly if R generated a
> >> >> NotImplementedError or something similar. The 'garbage
> >> >> results' are pretty misleading, esp. to a novice.
> >>
> >> > I think that's a good idea. The default methods are
> >> > documented to work on atomic vectors; dataframes are not
> >> > atomic vectors, so it would be reasonable to generate an
> >> > error. (See ?is.atomic for a definition of atomic
> >> > vectors.)
> >>
> >> > I'll see if this causes a lot of trouble...
> >>
> >> > Duncan Murdoch
> >>
> >> Duncan,
> >> do you remember the issue of mean(), var(), median(), etc.,
> >> which was the topic a few weeks ago?
> >>
> >> I strongly advocated that mean.data.frame() should become
> >> *deprecated*, and I would propose the same for the functions
> >> mentioned here.
> >
> > I think you may have misunderstood my proposal. Currently is.nan, is.finite
> > and is.infinite have no data.frame methods, so the default method is used.
> > The problem is that the default method is too permissive: it operates on the
> > data.frame by treating it as a list; then it returns FALSE for each list
> > element. (If there is only one row, it applies the test to the singleton in
> > the column.) This is pretty strange default behaviour.
> >
> > What I'm proposing is that the default method should trigger an error if you
> > try to send it anything that's not atomic. This gives sensible behaviour in
> > most cases; the only one where it doesn't work is a list of singletons, which
> > used to be handled sensibly, but will now fail.
> >
> > (There's still a question about what the answer should be for these functions
> > when applied to character or raw vectors, which are both atomic. I'm leaning
> > towards returning FALSE for every element, which matches the current
> > behaviour, but perhaps those should also generate an error.)
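To make the proposal concrete, here is a minimal user-level sketch. `is_infinite_strict()` is a hypothetical wrapper (not part of R) that refuses non-atomic input such as a data.frame instead of silently treating it as a list. It only illustrates the idea; it is not how the internal default method itself would be changed.

```r
# Hypothetical wrapper illustrating the proposal: signal an error for
# non-atomic input (e.g. a data.frame, which is a list of columns)
# rather than returning one meaningless result per column.
is_infinite_strict <- function(x) {
  # is.atomic(NULL) differs across R versions, so let NULL through explicitly
  if (!is.null(x) && !is.atomic(x))
    stop("only atomic vectors are supported, not an object of class '",
         class(x)[1L], "'")
  is.infinite(x)
}

df <- data.frame(a = c(NA, NaN, Inf, 1:3))
is.atomic(df)             # FALSE: the container is a list
is.atomic(df$a)           # TRUE: each column is atomic
is_infinite_strict(df$a)  # FALSE FALSE TRUE FALSE FALSE FALSE

# Passing the whole data.frame now fails loudly instead of returning garbage
msg <- tryCatch(is_infinite_strict(df), error = conditionMessage)
msg
```

The `is.atomic()` check up front is what makes the failure explicit: the columns are atomic vectors, but the data.frame holding them is not.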
>
> I noticed you did not mention integer vectors. Those are no
> different from character or raw: there are no NaN (nor infinite)
> integer elements. I don't see why it should be an error to ask in
> those cases.
Right, in those cases I think it's clear that is.finite should return
TRUE for every element except NA_integer_, and is.infinite and is.nan
should return FALSE. I would treat logical the same way since we often
promote logical to integer in a calculation.
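A quick check of the integer and logical cases described above, as they behave in current R releases (the comments give the expected results):

```r
x <- c(1L, NA_integer_, 3L)
is.finite(x)    # TRUE FALSE TRUE: only NA_integer_ is not finite
is.infinite(x)  # FALSE FALSE FALSE: integers cannot be infinite
is.nan(x)       # FALSE FALSE FALSE: integers cannot be NaN

# Logical vectors are treated the same way as integers
y <- c(TRUE, NA, FALSE)
is.finite(y)    # TRUE FALSE TRUE
is.infinite(y)  # FALSE FALSE FALSE
```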
Duncan Murdoch
> >
> > I think this partially addresses Bill's objection, but not completely.
> > Someone could still put a class on an atomic vector, and that might not be
> > handled properly by the default method.
> >
> >> People should use *apply (or *ply) on data frames, and not expect
> >> all kinds of functions to have data.frame methods that are
> >> basically equivalent to sapply(<df>, <function>).
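A sketch of the column-wise approach Martin suggests: apply the test to each column with `vapply()` rather than expecting a data.frame method. The example data frame, including the character column `b`, is made up for illustration.

```r
df <- data.frame(a = c(NA, NaN, Inf, 1:3),
                 b = c("x", "y", "z", "u", "v", "w"),
                 stringsAsFactors = FALSE)

# Count the Inf/NaN entries in each column. vapply() checks that every
# call returns a single integer and keeps the column names.
n_bad <- vapply(df, function(col) sum(is.infinite(col) | is.nan(col)),
                integer(1))
n_bad   # a: 2, b: 0
```

Using `vapply()` instead of `sapply()` makes the expected return type explicit, so an unexpected result shape fails immediately.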
> >>
> >> {and yes -- all this belongs to R-devel rather than R-help}
> >
> > Where I've moved it now.
> >
> > Duncan Murdoch
> >> Martin
> >>
> >> >> I wanted to recode every NaN and Inf value of an entire
> >> >> data.frame to NA. The data.frame also includes character
> >> >> variables. So the following might work (I can't test
> >> >> it here):
> >> >>
> >> >> ditch<- function(x) ifelse(is.infinite(x) | is.nan(x), NA, x)
> >> >> df<- apply(df, 2, ditch)
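One caveat with the code above: `apply()` first converts the data frame to a matrix, and when a character column is present that matrix is character. `is.infinite()` then only ever sees strings such as "Inf", so nothing is recoded, and the result is a character matrix rather than a data.frame. A column-wise sketch that avoids the coercion, recoding only double columns; the sample data are illustrative:

```r
ditch <- function(x) {
  # Only double vectors are checked for Inf or NaN here (complex columns
  # are ignored); all other columns are returned unchanged.
  if (is.double(x)) x[is.infinite(x) | is.nan(x)] <- NA
  x
}

df <- data.frame(a = c(NA, NaN, Inf, 1:3),
                 b = letters[1:6],
                 stringsAsFactors = FALSE)

# df[] <- lapply(...) replaces the columns but keeps the data.frame
# structure (names, row names, class).
df[] <- lapply(df, ditch)
df$a   # NA NA NA 1 2 3
```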
> >> >>
> >> >> ________________________________
> >> >> From: William Dunlap <wdunlap at tibco.com>
> >> >> Cc: R Mailing List <r-help at r-project.org>
> >> >> Sent: Fri, May 27, 2011 12:57:01 AM
> >> >> Subject: RE: [R] NaN, Inf to NA
> >> >>
> >> >> I think the source of the OP's problem is that while
> >> >> things like df>30 and is.na(df) return a logical matrix
> >> >> with the dimensions of the data.frame df, both
> >> >> is.infinite(df) and is.nan(df) return a logical vector as
> >> >> long as the number of columns of df. (`>` and is.na have
> >> >> data.frame methods but is.infinite and is.nan do not: the
> >> >> latter give garbage results for data.frames.)
> >> >>
> >> >> Bill Dunlap
> >> >> Spotfire, TIBCO Software
> >> >> wdunlap tibco.com
> >> >>
> >> >>> -----Original Message-----
> >> >>> From: r-help-bounces at r-project.org
> >> >>> [mailto:r-help-bounces at r-project.org] On Behalf Of Marc Schwartz
> >> >>> Sent: Thursday, May 26, 2011 2:15 PM
> >> >>> To: Albert-Jan Roskam
> >> >>> Cc: R Mailing List
> >> >>> Subject: Re: [R] NaN, Inf to NA
> >> >>>
> >> >>> On May 26, 2011, at 3:18 PM, Albert-Jan Roskam wrote:
> >> >>>
> >> >>>> Hi,
> >> >>>>
> >> >>>> I want to recode all Inf and NaN values to NA, but I'm
> >> >>>> surprised to see the result of the following code. Could
> >> >>>> anybody enlighten me about this?
> >> >>>>
> >> >>>>> df<- data.frame(a=c(NA, NaN, Inf, 1:3))
> >> >>>>> df[is.infinite(df) | is.nan(df)]<- NA
> >> >>>>> df
> >> >>>>     a
> >> >>>> 1  NA
> >> >>>> 2 NaN
> >> >>>> 3 Inf
> >> >>>> 4   1
> >> >>>> 5   2
> >> >>>> 6   3
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>>> Thanks!
> >> >>>>
> >> >>>> Cheers!! Albert-Jan
> >> >>>
> >> >>>
> >> >>> The canonical way is to use is.na() to assign the NA
> >> >>> value based upon a condition. See ?is.na for more
> >> >>> information.
> >> >>>
> >> >>> is.na(df$a)<- !is.finite(df$a)
> >> >>>
> >> >>>> df
> >> >>>    a
> >> >>> 1 NA
> >> >>> 2 NA
> >> >>> 3 NA
> >> >>> 4  1
> >> >>> 5  2
> >> >>> 6  3
> >> >>>
> >> >>>
> >> >>> HTH,
> >> >>>
> >> >>> Marc Schwartz
> >> >>>
> >> >>
> >>
> >
> >
>
