[R] Subscripting problem with is.na()
Ivan Calandra
ivan.calandra at univ-reims.fr
Thu Jun 23 17:13:39 CEST 2016
Thank you Bert for this clarification. It is indeed an important point.
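
As a minimal sketch of this point, using the data from Georg's original post (the names recoded_one and recoded_all are introduced here purely for illustration and do not appear elsewhere in the thread), the difference between "[" and "[[" and two ways of recoding the NAs could look like this:

```r
# Data from the original post
var1 <- c(1:3, NA, 5:7, NA, 9:10)
var2 <- c(1:3, NA, 5:7, NA, 9:10)
ds_test <- data.frame(var1, var2)

# Single bracket with one index keeps the list structure:
# the result is a one-column data.frame, not the column itself
class(ds_test["var1"])    # "data.frame"

# Double bracket (like $) returns the element, i.e. the column vector
class(ds_test[["var1"]])  # "integer"

# Recode NAs in one column: extract the column, then index it as a vector
recoded_one <- ds_test    # hypothetical name, a copy so ds_test stays intact
recoded_one$var1[is.na(recoded_one$var1)] <- 0

# Recode NAs in every column: is.na() on a data.frame returns a logical
# matrix, which "[<-" accepts as an index
recoded_all <- ds_test    # hypothetical name
recoded_all[is.na(recoded_all)] <- 0
```

The first form touches only var1 and leaves the NAs in var2 alone; the second replaces every NA in the data frame. Which one is appropriate depends on whether all columns should be recoded.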
Ivan
--
Ivan Calandra, PhD
Scientific Mediator
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calandra at univ-reims.fr
--
https://www.researchgate.net/profile/Ivan_Calandra
https://publons.com/author/705639/
On 23/06/2016 at 17:06, Bert Gunter wrote:
> Sorry, Ivan, your statement is incorrect:
>
> "When you use a single bracket on a list with only one argument in
> between, then R extracts "elements", i.e. columns in the case of a
> data.frame. This explains your errors."
>
> e.g.
>
>> ex <- data.frame(a = 1:3, b = letters[1:3])
>> a <- 1:3
>> identical(ex[1], a)
> [1] FALSE
>
>> class(ex[1])
> [1] "data.frame"
>> class(a)
> [1] "integer"
>
> Compare:
>
>> identical(ex[[1]], a)
> [1] TRUE
>
> Why? Single bracket extraction on a list results in a list; double
> bracket extraction results in the element of the list (a "column" in
> the case of a data frame, which is a specific kind of list). The
> relevant sections of ?Extract are:
>
> "Indexing by [ is similar to atomic vectors and selects a **list** of
> the specified element(s).
>
> Both [[ and $ select a **single element of the list**. "
>
>
> Hope this clarifies this often-confused issue.
>
>
> Cheers,
> Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
>
> On Thu, Jun 23, 2016 at 7:34 AM, Ivan Calandra
> <ivan.calandra at univ-reims.fr> wrote:
>> My statement "Using a single bracket '[' on a data.frame does the same as
>> for matrices: you need to specify rows and columns" was not correct.
>>
>>
>> When you use a single bracket on a list with only one argument in between,
>> then R extracts "elements", i.e. columns in the case of a data.frame. This
>> explains your errors.
>>
>> But it is possible to use a single bracket on a data.frame with 2 arguments
>> (rows, columns) separated by a comma, as with matrices. This is the solution
>> you received.
>>
>> Ivan
>>
>>
>>
>> On 23/06/2016 at 16:27, Ivan Calandra wrote:
>>> Dear Georg,
>>>
>>> You need to learn a bit more about the subsetting methods, depending on
>>> the object structure you're trying to subset.
>>>
>>> More specifically, when you run this: ds_test[is.na(ds_test$var1)]
>>> you get this error: "Error in `[.data.frame`(ds_test, is.na(ds_test$var1))
>>> : undefined columns selected"
>>>
>>> This means that R does not understand which column you're trying to
>>> select. But you're actually trying to select rows.
>>>
>>> Using a single bracket '[' on a data.frame does the same as for matrices:
>>> you need to specify rows and columns, like this:
>>> ds_test[is.na(ds_test$var1), ] ## notice the last comma
>>> ## works on all columns because you didn't specify any after the comma
>>> ds_test[is.na(ds_test$var1), ] <- 0
>>>
>>> If you want it only for "var1", then you need to specify the column:
>>> ds_test[is.na(ds_test$var1), "var1"] <- 0
>>>
>>> It's the same problem with your 2nd and 4th tries (4th one has other
>>> problems). Your 3rd try does not change ds_test at all.
>>>
>>> HTH,
>>> Ivan
>>>
>>>
>>> On 23/06/2016 at 15:57, G.Maubach at weinwolf.de wrote:
>>>> Hi All,
>>>>
>>>> I would like to recode my NAs to 0. Using a single vector everything is
>>>> fine.
>>>>
>>>> But if I use a data.frame things go wrong:
>>>>
>>>> -- cut --
>>>>
>>>> var1 <- c(1:3, NA, 5:7, NA, 9:10)
>>>> var2 <- c(1:3, NA, 5:7, NA, 9:10)
>>>> ds_test <-
>>>> data.frame(var1, var2)
>>>>
>>>> test <- var1
>>>> test[is.na(test)] <- 0
>>>> test # NA recoded OK
>>>>
>>>> # First try
>>>> ds_test[is.na(ds_test$var1)] <- 0 # duplicate subscripts WRONG
>>>>
>>>> # Second try
>>>> ds_test[is.na("var1")] <- 0
>>>> ds_test$var1 # not recoded WRONG
>>>>
>>>> # Third try: to me the most intuitive approach
>>>> # attempt to select less than one element in integerOneIndex WRONG
>>>> is.na(ds_test["var1"]) <- 0
>>>>
>>>> # Fourth try
>>>> ds_test[is.na(var1)] <- 0 # duplicate subscripts for columns WRONG
>>>>
>>>> -- cut --
>>>> How can I do it correctly?
>>>>
>>>> Where could I have found something about it?
>>>>
>>>> Kind regards
>>>>
>>>> Georg
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>