[R] as.numeric() generates NAs inside an apply call, but fine outside of it
Petr PIKAL
petr.pikal at precheza.cz
Mon Jan 9 15:41:52 CET 2012
Hi
> Hello-
>
> I have rather a messy SPSS file which I have imported to R, I've dput'd
> some of the columns at the end of this message. I wish to get rid of all
> the labels and have numeric values using as.numeric. The funny thing is
> it works like this:
>
> as.numeric(mydata[,2]) # generates correct numbers
>
> however, if I pass the whole dataframe at once like this:
>
> apply(mydata, 1:2, function(x) as.numeric(x))
>
> This same column, column 2, generates NAs with a "in FUN(newX[, i], ...)
> : NAs introduced by coercion" message.
>
> Meanwhile column 3 works fine like this:
>
> as.numeric(mydata[,3]) # generates correct numbers
>
> And generates numeric results out of the apply function.
>
> I think I basically know why, the str() command tells me that the
> variables which work okay are "labelled" whereas the ones that don't are
> "Factor". However, I can't figure out what's special about the apply
> call that generates the NAs when as.numeric(mydata[,2]) doesn't and I'm
> not sure what to do about it in future.
The Details section of the apply help page explains this: the input object
is coerced to a matrix, and a matrix can hold values of only one type.
Because your data frame contains a factor, the whole thing is converted to
character, and the factor labels in column 2 cannot be coerced to numeric.
The help page says:
If X is not an array but an object of a class with a non-null dim value
(such as a data frame), apply attempts to coerce it to an array via
as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.
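To see that coercion in isolation, here is a small sketch. The data frame toy is made up for illustration (it is not the poster's data); it only assumes one factor column and one integer column, like columns 2 and 3 above.

```r
# Hypothetical toy data: one factor column and one integer column
toy <- data.frame(
  answer = factor(c("a little", "somewhat", "a little")),
  score  = c(5L, 999L, 3L)
)

# as.matrix() is the coercion apply() performs on a data frame
m <- as.matrix(toy)
mode(m)   # the whole matrix is character because of the factor column

# The factor column now holds its labels as strings, so they cannot be numbers
answer_num <- suppressWarnings(as.numeric(m[, "answer"]))

# The integer column holds digit strings, which convert back without trouble
score_num <- as.numeric(m[, "score"])
```

Here answer_num is all NA while score_num recovers the original numbers, which matches the behaviour described for columns 2 and 3.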
Column two was originally a factor, which is stored as a vector of integer
codes with the labels kept as levels. as.numeric returns those codes,
therefore
as.numeric(mydata[,2])
works
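A short sketch of what as.numeric does with a factor. The factor f is built by hand here, using the level names from the dput output in the question; only the first three observations are reproduced.

```r
# Levels in the same order as .Label in the dput output
lev <- c("Not at all", "a little", "somewhat",
         "quite a lot", "very much", "missing data")

# First three observations of item2.jan11, written as labels
f <- factor(c("quite a lot", "somewhat", "missing data"), levels = lev)

codes  <- as.numeric(f)    # positions of each value in levels(f)
labels <- as.character(f)  # the labels themselves
```

The codes are 4, 3 and 6, the same integers that open the item2.jan11 vector in the dput output, while labels holds the strings that apply ends up working with.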
Inside apply the factor was changed to character (its labels), and the
other columns were converted to character too. Character values can be
changed to numeric only if they represent numbers, which is why column 3
still works, see
> as.numeric(letters)
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[26] NA
Warning message:
NAs introduced by coercion
> as.numeric(as.character(1:10))
[1] 1 2 3 4 5 6 7 8 9 10
If you really want to change the factor values in the data frame to their
underlying numeric codes, use
sapply(mydata, as.numeric)
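One thing to be aware of: sapply returns a numeric matrix here, not a data frame. If you want to keep the data frame, a common idiom is to assign the result of lapply back into it. A sketch on made-up data (toy and toy_num are hypothetical names, not from the original post):

```r
# Hypothetical toy data: one factor column and one integer column
toy <- data.frame(
  answer = factor(c("a little", "somewhat", "a little")),
  score  = c(5L, 999L, 3L)
)

# sapply() converts column by column, then simplifies to a numeric matrix
as_matrix <- sapply(toy, as.numeric)

# Assigning into toy_num[] replaces the columns but keeps the data frame
toy_num <- toy
toy_num[] <- lapply(toy, as.numeric)
```

In both cases the conversion happens one column at a time, so the factor is never turned into character first and its integer codes are returned.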
Regards
Petr
>
> I realise I can just loop over the columns, but I would rather get to
> the bottom of this if I can so I know for future.
>
> Thanks in advance for any advice
>
> Chris Beeley
> Institute of Mental Health, UK
>
> dput() gives-
>
> structure(list(id = structure(1:79, label = structure("Participant",
> .Names = "id"), class = "labelled"),
> item2.jan11 = structure(c(4L, 3L, 6L, 4L, 6L, 6L, 2L, 6L,
> 2L, 2L, 3L, 3L, 1L, 6L, 2L, 6L, 4L, 2L, 6L, 2L, 6L, 6L, 6L,
> 4L, 4L, 6L, 2L, 6L, 2L, 6L, 2L, 3L, 6L, 6L, 3L, 6L, 5L, 6L,
> 3L, 6L, 1L, 3L, 3L, 3L, 6L, 4L, 1L, 3L, 6L, 2L, 6L, 2L, 6L,
> 6L, 6L, 4L, 3L, 6L, 6L, 6L, 6L, 6L, 3L, 6L, 2L, 6L, 6L, 2L,
> 4L, 6L, 2L, 5L, 6L, 6L, 6L, 6L, 1L, 6L, 4L), .Label = c("Not at all",
> "a little", "somewhat", "quite a lot", "very much", "missing data"
> ), class = c("labelled", "factor"), label = structure("The patients care for each other",
> .Names = "item2_jan11")),
> item12.jan11 = structure(c(5L, 5L, 999L, 5L, 999L, 999L,
> 2L, 999L, 5L, 2L, 5L, 3L, 3L, 999L, 2L, 999L, 5L, 5L, 999L,
> 5L, 999L, 999L, 999L, 5L, 5L, 999L, 3L, 999L, 5L, 999L, 3L,
> 4L, 999L, 999L, 4L, 999L, 5L, 999L, 5L, 999L, 3L, 5L, 4L,
> 4L, 999L, 3L, 2L, 4L, 999L, 5L, 999L, 5L, 999L, 999L, 999L,
> 4L, 5L, 999L, 999L, 999L, 999L, 999L, 4L, 999L, 3L, 999L,
> 999L, 1L, 5L, 999L, 3L, 5L, 999L, 999L, 999L, 999L, 4L, 999L,
> 0L), value.labels = structure(c(999, 5, 4, 3, 2, 1), .Names = c("missing data",
> "very much", "quite a lot", "somewhat", "a little", "Not at all"
> )), label = structure("At times, members of staff are afraid of some of the patients",
> .Names = "item12_jan11"), class = "labelled")), .Names = c("id",
> "item2.jan11", "item12.jan11"), class = "data.frame", row.names = c(NA,
> -79L))