[R] Subscripting problem with is.na()

Bert Gunter bgunter.4567 at gmail.com
Thu Jun 23 17:06:13 CEST 2016


Sorry, Ivan, your statement is incorrect:

"When you use a single bracket on a list with only one argument in
between, then R extracts "elements", i.e. columns in the case of a
data.frame. This explains your errors. "

e.g.

> ex <- data.frame(a = 1:3, b = letters[1:3])
> a <- 1:3

> identical(ex[1], a)
[1] FALSE

> class(ex[1])
[1] "data.frame"
> class(a)
[1] "integer"

Compare:

> identical(ex[[1]], a)
[1] TRUE
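
The same comparison can be made on a plain list, which is all a data frame
is underneath. A minimal sketch (the list `lst` and the variable names are
made up for illustration; they are not from the thread):

```r
# Hypothetical list, for illustration only
lst <- list(a = 1:3, b = letters[1:3])

one_bracket <- lst[1]     # a list of length 1 that holds the element "a"
two_bracket <- lst[[1]]   # the element itself: an integer vector
by_dollar   <- lst$a      # same result as lst[["a"]]

class(one_bracket)                     # "list"
class(two_bracket)                     # "integer"
identical(two_bracket, by_dollar)      # TRUE
identical(one_bracket, list(a = 1:3))  # TRUE
```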

Why? Single-bracket extraction on a list returns a list (for a data
frame, a data frame); double-bracket extraction returns the element of
the list itself (a "column" in the case of a data frame, which is a
specific kind of list). The relevant sections of ?Extract are:

"Indexing by [ is similar to atomic vectors and selects a **list** of
the specified element(s).

Both [[ and $ select a **single element of the list**. "
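
Applied to the recoding question further down the thread, this might look
roughly as follows. It is only a sketch: the data are Georg's, but the copy
`ds_all` is a name introduced here for illustration, and this is one possible
approach alongside the row/column form Ivan suggested.

```r
# Data from the original question
var1 <- c(1:3, NA, 5:7, NA, 9:10)
var2 <- c(1:3, NA, 5:7, NA, 9:10)
ds_test <- data.frame(var1, var2)

# [[ returns the column as a vector, so it can be indexed like any vector
ds_test[["var1"]][is.na(ds_test[["var1"]])] <- 0

# is.na() on a whole data frame gives a logical matrix; a logical matrix
# with the same dimensions is accepted as a single index for assignment
ds_all <- data.frame(var1, var2)   # hypothetical copy, for illustration
ds_all[is.na(ds_all)] <- 0

ds_test$var1              # 1 2 3 0 5 6 7 0 9 10
sum(is.na(ds_test$var2))  # 2: var2 was left alone
sum(is.na(ds_all))        # 0: every NA recoded
```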


Hope this clarifies this often-confused issue.


Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Thu, Jun 23, 2016 at 7:34 AM, Ivan Calandra
<ivan.calandra at univ-reims.fr> wrote:
> My statement "Using a single bracket '[' on a data.frame does the same as
> for matrices: you need to specify rows and columns" was not correct.
>
>
> When you use a single bracket on a list with only one argument in between,
> then R extracts "elements", i.e. columns in the case of a data.frame. This
> explains your errors.
>
> But it is possible to use a single bracket on a data.frame with 2 arguments
> (rows, columns) separated by a comma, as with matrices. This is the solution
> you received.
>
> Ivan
>
>
> --
> Ivan Calandra, PhD
> Scientific Mediator
> University of Reims Champagne-Ardenne
> GEGENAA - EA 3795
> CREA - 2 esplanade Roland Garros
> 51100 Reims, France
> +33(0)3 26 77 36 89
> ivan.calandra at univ-reims.fr
> --
> https://www.researchgate.net/profile/Ivan_Calandra
> https://publons.com/author/705639/
>
> On 23/06/2016 at 16:27, Ivan Calandra wrote:
>>
>> Dear Georg,
>>
>> You need to learn a bit more about the subsetting methods, depending on
>> the object structure you're trying to subset.
>>
>> More specifically, when you run this: ds_test[is.na(ds_test$var1)]
>> you get this error: "Error in `[.data.frame`(ds_test, is.na(ds_test$var1))
>> : undefined columns selected"
>>
>> This means that R does not understand which column you're trying to
>> select. But you're actually trying to select rows.
>>
>> Using a single bracket '[' on a data.frame does the same as for matrices:
>> you need to specify rows and columns, like this:
>> ds_test[is.na(ds_test$var1), ] ## notice the last comma
>> ## works on all columns because you didn't specify any after the comma
>> ds_test[is.na(ds_test$var1), ] <- 0
>>
>> If you want it only for "var1", then you need to specify the column:
>> ds_test[is.na(ds_test$var1), "var1"] <- 0
>>
>> It's the same problem with your 2nd and 4th tries (4th one has other
>> problems). Your 3rd try does not change ds_test at all.
>>
>> HTH,
>> Ivan
>>
>> On 23/06/2016 at 15:57, G.Maubach at weinwolf.de wrote:
>>>
>>> Hi All,
>>>
>>> I would like to recode my NAs to 0. Using a single vector everything is
>>> fine.
>>>
>>> But if I use a data.frame things go wrong:
>>>
>>> -- cut --
>>>
>>> var1 <- c(1:3, NA, 5:7, NA, 9:10)
>>> var2 <- c(1:3, NA, 5:7, NA, 9:10)
>>> ds_test <-
>>>    data.frame(var1, var2)
>>>
>>> test <- var1
>>> test[is.na(test)] <- 0
>>> test  # NA recoded OK
>>>
>>> # First try
>>> ds_test[is.na(ds_test$var1)] <- 0  # duplicate subscripts WRONG
>>>
>>> # Second try
>>> ds_test[is.na("var1")] <- 0
>>> ds_test$var1  # not recoded WRONG
>>>
>>> # Third try: to me the most intuitive approach
>>> # attempt to select less than one element in integerOneIndex WRONG
>>> is.na(ds_test["var1"]) <- 0
>>>
>>> # Fourth try
>>> ds_test[is.na(var1)] <- 0  # duplicate subscripts for columns WRONG
>>>
>>> -- cut --
>>>   How can I do it correctly?
>>>
>>> Where could I have found something about it?
>>>
>>> Kind regards
>>>
>>> Georg
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>


