[R] [FORGED] Q re: logical indexing with is.na
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Sun Mar 10 12:46:00 CET 2019
On 10/03/2019 1:15 a.m., David Goldsmith wrote:
> Thanks, all. I had read about recycling, but I guess I didn't fully
> appreciate all the "weirdness" it might produce. :/
>
> With this explained, I'm going to ask a follow-up, which is only
> contextually related: the impetus for this discovery was checking "corner
> cases" to determine if all(x[!is.na(x)]==y[!is.na(y)]) would suffice to
> determine equality of two vectors containing NA's. Between the above
> result; my related discovery that this indexing preserves relative
> positional info but not absolute positional info; and the performance
> penalty when comparing long vectors that may be unequal "early on"; I've
> concluded that--if it (can be made to) "short circuit"--it would probably
> be better to use an implicit loop. So that's my Q: will (or can) an
> implicit loop (be made to) "exit early" if a specified condition is met
> before all indices have been checked?
You could use the identical() function. When I have vectors of length 1
million, all(x == y) takes about 3 milliseconds when the difference is
in the last value, 2 milliseconds when it comes first. identical(x, y)
takes about 5 milliseconds when the difference comes last, but 0.006
milliseconds when it comes first. Of course, all(x == y) and
identical(x, y) do slightly different tests: read the docs!
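
To make "slightly different tests" concrete, here is a minimal sketch with made-up vectors. The helper equal_with_na() is hypothetical (it is not a base R function) and assumes you want NAs in matching positions to count as equal; it is an explicit loop that stops at the first mismatch, and being an R-level loop it may well be slower than identical() on long vectors.

```r
x <- c(1, NA, 3)
y <- c(1, NA, 3)
z <- c(NA, 1, 3)   # same non-NA values as x, but the NA is in a different slot

# == propagates NA elementwise, so all() cannot conclude TRUE
all(x == y)                          # NA
# identical() treats NAs in matching positions as equal
identical(x, y)                      # TRUE

# Dropping NAs first loses absolute position information
all(x[!is.na(x)] == z[!is.na(z)])    # TRUE
identical(x, z)                      # FALSE

# identical() also compares storage type, which == does not
identical(1:3, c(1, 2, 3))           # FALSE (integer vs double)
all(1:3 == c(1, 2, 3))               # TRUE

# Hypothetical helper: explicit loop that returns at the first mismatch.
# Note that is.na() is also TRUE for NaN, so NA and NaN are not distinguished.
equal_with_na <- function(a, b) {
  if (length(a) != length(b)) return(FALSE)
  for (i in seq_along(a)) {
    if (is.na(a[i]) != is.na(b[i])) return(FALSE)
    if (!is.na(a[i]) && a[i] != b[i]) return(FALSE)
  }
  TRUE
}

equal_with_na(x, y)                  # TRUE
equal_with_na(x, z)                  # FALSE
```
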
Duncan Murdoch
>
> Thanks again!
>
> DLG
>
> On Sat, Mar 9, 2019 at 9:07 PM Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
> wrote:
>
>> Regarding the mention of logical indexing, under ?Extract I see:
>>
>> For [-indexing only: i, j, ... can be logical vectors, indicating
>> elements/slices to select. Such vectors are recycled if necessary to match
>> the corresponding extent. i, j, ... can also be negative integers,
>> indicating elements/slices to leave out of the selection.
>>
>> On March 9, 2019 6:57:05 PM PST, Rolf Turner <r.turner using auckland.ac.nz>
>> wrote:
>>> On 3/10/19 2:36 PM, David Goldsmith wrote:
>>>> Hi! Newbie (self-)learning R using P. Dalgaard's "Intro Stats w/ R"; not
>>>> new to statistics (have had grad-level courses and work experience in
>>>> statistics) or vectorized programming syntax (have extensive experience
>>>> with MatLab, Python/NumPy, and IDL, and even a smidgen--a long time ago--of
>>>> experience w/ S-plus).
>>>>
>>>> In exploring the use of is.na in the context of logical indexing, I've come
>>>> across the following puzzling-to-me result:
>>>>
>>>>> y; !is.na(y[1:3]); y[!is.na(y[1:3])]
>>>> [1] 0.3534253 -1.6731597 NA -0.2079209
>>>> [1] TRUE TRUE FALSE
>>>> [1] 0.3534253 -1.6731597 -0.2079209
>>>>
>>>> As you can see, y is a four element vector, the third element of which is
>>>> NA; the next line gives what I would expect--T T F--because the first two
>>>> elements are not NA but the third element is. The third line is what
>>>> confuses me: why is the result not the two element vector consisting of
>>>> simply the first two elements of the vector (or, if vectorized indexing in
>>>> R is implemented to return a vector the same length as the logical index
>>>> vector, which appears to be the case, at least the first two elements and
>>>> then either NA or NaN in the third slot, where the logical indexing vector
>>>> is FALSE): why does the implementation "go looking" for an element whose
>>>> index in the "original" vector, 4, is larger than BOTH the largest index
>>>> specified in the inner-most subsetting index AND the size of the resulting
>>>> indexing vector? (Note: at first I didn't even understand why the result
>>>> wasn't simply
>>>>
>>>> 0.3534253 -1.6731597 NA
>>>>
>>>> but then I realized that the third logical index being FALSE, there was no
>>>> reason for *any* element to be there; but if there is, due to some
>>>> overriding rule regarding the length of the result relative to the length
>>>> of the indexer, shouldn't it revert back to *something* that indicates the
>>>> "FALSE"ness of that indexing element?)
>>>>
>>>> Thanks!
>>>
>>> It happens because R is eco-conscious and re-cycles. :-)
>>>
>>> Try:
>>>
>>> ok <- c(TRUE,TRUE,FALSE)
>>> (1:4)[ok]
>>>
>>> In general in R if there is an operation involving two vectors then
>>> the shorter one gets recycled to provide sufficiently many entries to
>>> match those of the longer vector.
>>>
>>> Thus in the foregoing example the first entry of "ok" gets used again,
>>> to make a length 4 vector to match up with 1:4. The result is the same
>>>
>>> as (1:4)[c(TRUE,TRUE,FALSE,TRUE)].
>>>
>>> If you did (1:7)[ok] you'd get the same result as that from
>>> (1:7)[c(TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE)] i.e. "ok" gets
>>> recycled 2 and 1/3 times.
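
The recycling of a logical index can be written out by hand with rep_len(), which makes it easy to check. This is only a sketch; the values in y are made up (rounded from the original post).

```r
ok <- c(TRUE, TRUE, FALSE)

# Write the recycling out explicitly and compare with what [ does
idx7 <- rep_len(ok, 7)
idx7                                 # TRUE TRUE FALSE TRUE TRUE FALSE TRUE
(1:7)[ok]                            # 1 2 4 5 7
identical((1:7)[ok], (1:7)[idx7])    # TRUE

# The original puzzle: a length-3 logical index applied to a length-4 vector
y <- c(0.35, -1.67, NA, -0.21)
y[!is.na(y[1:3])]                    # 0.35 -1.67 -0.21
```
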
>>>
>>> Try 10*(1:3) + 1:4, 10*(1:3) + 1:5, 10*(1:3) + 1:6 .
>>>
>>> Note that in the first two instances you get warnings, but in the third
>>> you don't, since 6 is an integer multiple of 3.
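
A small sketch of the arithmetic case. The handler below is just one way to show that the warning is raised while the recycled result is still computed.

```r
a <- 10 * (1:3)      # 10 20 30

# 6 is a multiple of 3: recycled silently
a + 1:6              # 11 22 33 14 25 36

# 4 is not a multiple of 3: still recycled, but R signals a warning
res <- withCallingHandlers(
  a + 1:4,
  warning = function(w) {
    message("warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")   # let the computation continue
  }
)
res                  # 11 22 33 14
```
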
>>>
>>> Why aren't there warnings when logical indexing is used? I guess
>>> because it would be annoying. Maybe.
>>>
>>> Note that integer indices get recycled too, but the recycling is limited
>>> so as not to produce redundancies. So
>>>
>>> (1:4)[1:3] just (sensibly) gives
>>>
>>> [1] 1 2 3
>>>
>>> and *not*
>>>
>>> [1] 1 2 3 1
>>>
>>> Perhaps a bit subtle, but it gives what you'd actually *want* rather
>>> than being pedantic about rules with a result that you wouldn't want.
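
A quick check suggests a simpler reading of the integer case: an integer index just picks out the listed positions, so the result has as many elements as the index and nothing needs to be recycled. A sketch with made-up values:

```r
v <- 1:4

v[1:3]                   # 1 2 3
v[c(1, 1, 4)]            # 1 1 4  (positions may repeat)
v[c(2, 6)]               # 2 NA   (a position past the end gives NA)
length(v[rep(1, 10)])    # 10     (result length follows the index)
```
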
>>>
>>> cheers,
>>>
>>> Rolf Turner
>>>
>>> P.S. If you do
>>>
>>> y[1:3][!is.na(y[1:3])]
>>>
>>> i.e. if you're careful to match the length of the vector and that of
>>> the indices, you get what you initially expected.
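
A sketch of the matched-length version, plus an alternative using which() that turns the logical mask into positions so that no recycling can happen. The values in y are made up (rounded from the original post).

```r
y <- c(0.35, -1.67, NA, -0.21)

# Subset first, then index with a mask of the same length
head3 <- y[1:3]
head3[!is.na(head3)]             # 0.35 -1.67

# Alternative: convert the mask to integer positions
which(!is.na(y[1:3]))            # 1 2
y[which(!is.na(y[1:3]))]         # 0.35 -1.67
```
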
>>>
>>> R. T.
>>>
>>> P^2.S. To the younger and wiser heads on this list: the help on "["
>>> does not mention that the index vectors can be logical. I couldn't find
>>> anything about logical indexing in the R help files. Is something
>>> missing here, or am I just not looking in the right place?
>>>
>>> R. T.
>>
>> --
>> Sent from my phone. Please excuse my brevity.
>>