[Rd] [R] Why does R replace all row values with NAs
Hervé Pagès
hpages at fredhutch.org
Tue Mar 3 23:26:25 CET 2015
On 03/03/2015 02:17 PM, Gabriel Becker wrote:
> Stephanie,
>
> Actually, it's as.logical that isn't preserving matrix dimensions,
> because it coerces to a logical vector:
>
> > x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50)
> > dim(as.logical(x))
It's true, as.logical() doesn't help here but Stephanie is right, %in%
does not preserve the dimensions either:
> dim(x %in% 1:5)
NULL
That's because match() itself doesn't preserve the dimensions:
> dim(match(x, 1:5))
NULL
So maybe my fast is.true() should be:
is.true <- function(x)
{
ans <- as.logical(x) %in% TRUE
if (is.null(dim(x))) {
names(ans) <- names(x)
} else {
dim(ans) <- dim(x)
dimnames(ans) <- dimnames(x)
}
ans
}
or something like that...
H.
> NULL
>
> ~G
>
> On Tue, Mar 3, 2015 at 2:09 PM, Stephanie M. Gogarten
> <sdmorris at u.washington.edu <mailto:sdmorris at u.washington.edu>> wrote:
>
>
>
> On 3/3/15 1:26 PM, Hervé Pagès wrote:
>
>
>
> On 03/03/2015 02:28 AM, Martin Maechler wrote:
>
> Diverted from R-help :
> .... as it gets into musing about new R language "primitives"
>
> William Dunlap <wdunlap at tibco.com
> <mailto:wdunlap at tibco.com>>
> on Fri, 27 Feb 2015 08:04:36 -0800
> writes:
>
>
> > You could define functions like
>
> > is.true <- function(x) !is.na <http://is.na>(x) & x
> > is.false <- function(x) !is.na <http://is.na>(x) & !x
>
> > and use them in your selections. E.g.,
> >> x <-
> data.frame(a=1:10,b=2:11,c=c(__1,NA,3,NA,5,NA,7,NA,NA,10))
> >> x[is.true(x$c >= 6), ]
> > a b c
> > 7 7 8 7
> > 10 10 11 10
>
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com <http://tibco.com>
>
> Yes; the Matrix package has had these
>
> is0 <- function(x) !is.na <http://is.na>(x) & x == 0
> isN0 <- function(x) is.na <http://is.na>(x) | x != 0
> is1 <- function(x) !is.na <http://is.na>(x) & x # also ==
> "isTRUE componentwise"
>
>
> Note that using %in% to block propagation of NAs is about 2x faster:
>
> > x <- sample(c(NA_integer_, 1:10000), 500000, replace=TRUE)
> > microbenchmark(as.logical(x) %in% TRUE, !is.na
> <http://is.na>(x) & x)
> Unit: milliseconds
> expr min lq mean
> median uq
> as.logical(x) %in% TRUE 6.034744 6.264382 6.999083
> 6.29488 6.346028
> !is.na <http://is.na>(x) & x 11.202808 11.402437
> 11.469101 11.44848 11.517576
> max neval
> 40.36472 100 <tel:40.36472%20%20%20100>
> 11.90916 100
>
>
> Unfortunately %in% does not preserve matrix dimensions:
>
> > x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE),
> nrow=50)
> > dim(x)
> [1] 50 10
> > dim(!is.na <http://is.na>(x) & x)
> [1] 50 10
> > dim(as.logical(x) %in% TRUE)
> NULL
>
> Stephanie
>
>
>
>
>
>
> namespace hidden for a while [note the comment of the last
> one!]
> and using them for readibility in its own code.
>
> Maybe we should (again) consider providing some versions of
> these with R ?
>
> The Matrix package also has had fast
>
> allFalse <- all0 <- function(x) .Call(R_all0, x)
> anyFalse <- any0 <- function(x) .Call(R_any0, x)
> ##
> ## anyFalse <- function(x) isTRUE(any(!x)) ## ~= any0
> ## any0 <- function(x) isTRUE(any(x == 0)) ## ~=
> anyFalse
>
> namespace hidden as well, already, which probably could also be
> brought to base R.
>
> One big reason to *not* go there (to internal C code) at all
> with R is
> that
> S3 and S4 dispatch for '==' ('!=', etc, the 'Compare' group
> generics)
> and 'is.na <http://is.na>() have been known and package
> writers have
> programmed methods for these.
> To ensure that S3 and S4 dispatch works "correctly" also inside
> such new internals is much less easily achieved, and so
> such a C-based internal function is0() would no longer be
> equivalent with !is.na <http://is.na>(x) & x == 0
> as soon as 'x' is an "object" with a '==', 'Compare' and/or
> an is.na <http://is.na>()
> method.
>
>
> Excellent point. Thank you! It really makes a big difference for
> developers who maintain a complex hierarchy of S4 classes and
> methods,
> when functions like is.true, anyFalse, etc..., which can be
> expressed in
> terms of more basic operations like ==, !=, !, is.na
> <http://is.na>, etc..., just work
> out-of-the-box on objects for which these basic operations are
> defined.
>
> There is conceptually a small set of "building blocks", at least for
> objects with a vector-like or list-like semantic, that can be used
> to formally describe the semantic of many functions in base R. This
> is what the man page for anyNA does by saying:
>
> anyNA implements any(is.na <http://is.na>(x))
>
> even though the actual implementation differs, but that's ok, as
> long
> as anyNA is equivalent to doing any(is.na <http://is.na>(x)) on
> any object for which
> building block is.na <http://is.na>() is implemented.
>
> Unfortunately there is no clearly identified set of building blocks
> in base R. For example, if I want the comparison operations to work
> on my object, I need to implement ==, >, <, !=, <=, and >= (the
> 'Compare' group generics) even though it should be enough to
> implement
> == and >=, because all the others can be described in terms of these
> 2 building blocks. unique/duplicated is another example
> (unique(x) is
> conceptually x[!duplicated(x)]). And so on...
>
> Cheers,
> H.
>
>
> OTOH, simple R versions such as your 'is.true', called 'is1'
> inside Matrix maybe optimizable a bit by the byte compiler (and
> jit and other such tricks) and still keep the full
> semantic including correct method dispatch.
>
> Martin Maechler, ETH Zurich
>
>
> > On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski <
> > dimitri.liakhovitski at gmail.com
> <mailto:dimitri.liakhovitski at gmail.com>__> wrote:
>
> >> Thank you very much, Duncan.
> >> All this being said:
> >>
> >> What would you say is the most elegant and most
> safe way to
> solve such
> >> a seemingly simple task?
> >>
> >> Thank you!
> >>
> >> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch
> >> <murdoch.duncan at gmail.com
> <mailto:murdoch.duncan at gmail.com>> wrote:
> >> > On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote:
> >> >> So, Duncan, do I understand you correctly:
> >> >>
> >> >> When I use x$x<6, R doesn't know if it's TRUE or
> FALSE, so
> it returns
> >> >> a logical value of NA.
> >> >
> >> > Yes, when x$x is NA. (Though I think you meant x$c.)
> >> >
> >> >> When this logical value is applied to a row, the
> R says:
> hell, I don't
> >> >> know if I should keep it or not, so, just in
> case, I am
> going to keep
> >> >> it, but I'll replace all the values in this row
> with NAs?
> >> >
> >> > Yes. Indexing with a logical NA is probably a
> mistake, and
> this is one
> >> > way to signal it without actually triggering a
> warning or
> error.
> >> >
> >> > BTW, I should have mentioned that the example
> where you
> indexed using
> >> > -which(x$c>=6) is a bad idea: if none of the
> entries were 6
> or more,
> >> > this would be indexing with an empty vector, and
> you'd get
> nothing, not
> >> > everything.
> >> >
> >> > Duncan Murdoch
> >> >
> >> >
> >> >>
> >> >> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch
> >> >> <murdoch.duncan at gmail.com
> <mailto:murdoch.duncan at gmail.com>> wrote:
> >> >>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote:
> >> >>>> I know how to get the output I need, but I
> would benefit
> from an
> >> >>>> explanation why R behaves the way it does.
> >> >>>>
> >> >>>> # I have a data frame x:
> >> >>>> x =
> data.frame(a=1:10,b=2:11,c=c(__1,NA,3,NA,5,NA,7,NA,NA,10))
> >> >>>> x
> >> >>>> # I want to toss rows in x that contain values
> >=6. But I
> don't want
> >> >>>> to toss my NAs there.
> >> >>>>
> >> >>>> subset(x,c<6) # Works correctly, but removes
> NAs in c,
> understand why
> >> >>>> x[which(x$c<6),] # Works correctly, but
> removes NAs in c,
> understand
> >> why
> >> >>>> x[-which(x$c>=6),] # output I need
> >> >>>>
> >> >>>> # Here is my question: why does the following line
> replace the values
> >> >>>> of all rows that contain an NA # in x$c with NAs?
> >> >>>>
> >> >>>> x[x$c<6,] # Leaves rows with c=NA, but makes
> the whole
> row an NA.
> >> Why???
> >> >>>> x[(x$c<6) | is.na <http://is.na>(x$c),] #
> output I need - I have to be
> >> super-explicit
> >> >>>>
> >> >>>> Thank you very much!
> >> >>>
> >> >>> Most of your examples (except the ones using
> which()) are
> doing logical
> >> >>> indexing. In logical indexing, TRUE keeps a
> line, FALSE
> drops the
> >> line,
> >> >>> and NA returns NA. Since "x$c < 6" is NA if
> x$c is NA,
> you get the
> >> >>> third kind of indexing.
> >> >>>
> >> >>> Your last example works because in the cases
> where x$c is
> NA, it
> >> >>> evaluates NA | TRUE, and that evaluates to
> TRUE. In the
> cases where
> >> x$c
> >> >>> is not NA, you get x$c < 6 | FALSE, and that's
> the same as
> x$c < 6,
> >> >>> which will be either TRUE or FALSE.
> >> >>>
> >> >>> Duncan Murdoch
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >
> >>
> >>
> >>
> >> --
> >> Dimitri Liakhovitski
> >>
> >> ________________________________________________
> >> R-help at r-project.org <mailto:R-help at r-project.org>
> mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/__listinfo/r-help
> <https://stat.ethz.ch/mailman/listinfo/r-help>
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/__posting-guide.html
> <http://www.R-project.org/posting-guide.html>
> >> and provide commented, minimal, self-contained,
> reproducible
> code.
> >>
>
> > [[alternative HTML version deleted]]
>
> > ________________________________________________
> > R-help at r-project.org <mailto:R-help at r-project.org>
> mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/__listinfo/r-help
> <https://stat.ethz.ch/mailman/listinfo/r-help>
> > PLEASE do read the posting guide
> http://www.R-project.org/__posting-guide.html
> <http://www.R-project.org/posting-guide.html>
> > and provide commented, minimal, self-contained,
> reproducible code.
>
> ________________________________________________
> R-devel at r-project.org <mailto:R-devel at r-project.org> mailing
> list
> https://stat.ethz.ch/mailman/__listinfo/r-devel
> <https://stat.ethz.ch/mailman/listinfo/r-devel>
>
>
>
> ________________________________________________
> R-devel at r-project.org <mailto:R-devel at r-project.org> mailing list
> https://stat.ethz.ch/mailman/__listinfo/r-devel
> <https://stat.ethz.ch/mailman/listinfo/r-devel>
>
>
>
>
> --
> Gabriel Becker, PhD
> Computational Biologist
> Bioinformatics and Computational Biology
> Genentech, Inc.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the R-devel
mailing list