[R] as.numeric() generates NAs inside an apply call, but fine outside of it
Chris Beeley
chris.beeley at gmail.com
Mon Jan 9 15:11:29 CET 2012
Hello-
I have a rather messy SPSS file which I have imported into R; I've included
dput() output for some of the columns at the end of this message. I wish to
get rid of all the labels and have numeric values using as.numeric(). The
funny thing is that it works like this:
as.numeric(mydata[,2]) # generates correct numbers
However, if I pass the whole data frame at once like this:
apply(mydata, 1:2, function(x) as.numeric(x))
this same column, column 2, generates NAs with an
"In FUN(newX[, i], ...) : NAs introduced by coercion" warning.
Meanwhile column 3 works fine like this:
as.numeric(mydata[,3]) # generates correct numbers
It also gives numeric results from the apply() call.
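The behaviour can be reproduced on a much smaller scale. This is a sketch, not the original data: `toy` is a hypothetical stand-in with one factor column (playing the role of item2.jan11) and one plain integer column (playing the role of item12.jan11), and the "labelled" class from the SPSS import is left out.

```r
# Hypothetical stand-in for the imported SPSS data (not the real file):
# a factor column like item2.jan11 and an integer column like item12.jan11
toy <- data.frame(
  item2  = factor(c("a little", "Not at all", "somewhat"),
                  levels = c("Not at all", "a little", "somewhat")),
  item12 = c(5L, 999L, 3L)
)

# Column by column: the factor gives its integer codes,
# the integer column comes back unchanged
as.numeric(toy[, 1])   # 2 1 3
as.numeric(toy[, 2])   # 5 999 3

# Element by element through apply(): the factor column comes back as NA,
# with "NAs introduced by coercion" warnings
res <- apply(toy, 1:2, function(x) as.numeric(x))
res[, 1]               # NA NA NA
res[, 2]               # 5 999 3
```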
I think I basically know why: the str() command tells me that the
variables which work are "labelled", whereas the ones that don't are
"Factor". However, I can't figure out what is special about the apply()
call that generates the NAs when as.numeric(mydata[,2]) doesn't, and I'm
not sure what to do about it in future.
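One likely explanation: apply() is written for matrices and arrays, so when it is given a data frame it first converts it with as.matrix(). If any column is not numeric (a factor here), the whole matrix becomes character, and factor columns are turned into their labels rather than their codes. as.numeric() is then asked to parse strings such as "Not at all", which gives NA. A sketch of that conversion, using a small hypothetical data frame `toy` in place of the real data:

```r
# Hypothetical stand-in: a factor column and an integer column
toy <- data.frame(
  item2  = factor(c("a little", "Not at all", "somewhat"),
                  levels = c("Not at all", "a little", "somewhat")),
  item12 = c(5L, 999L, 3L)
)

# apply() coerces a data frame with as.matrix() before calling FUN;
# with a factor column present, the result is a character matrix
m <- as.matrix(toy)
typeof(m)        # "character"
m[, "item2"]     # "a little" "Not at all" "somewhat" (labels, not codes)

# Parsing the labels fails, exactly as converting the factor to
# character first would
suppressWarnings(as.numeric(m[, "item2"]))             # NA NA NA
suppressWarnings(as.numeric(as.character(toy$item2)))  # NA NA NA
```

Numeric columns survive the same trip because their character forms (for example "999") still parse as numbers, which would be consistent with column 3 working both inside and outside apply().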
I realise I could just loop over the columns, but I would rather get to
the bottom of this if I can, so that I know for the future.
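A column-wise alternative that avoids an explicit loop is lapply(), which hands each column to as.numeric() as it is, so a factor is still a factor when it is converted and yields its codes. A minimal sketch on a hypothetical `toy` data frame, not the original data:

```r
# Hypothetical stand-in: a factor column and an integer column
toy <- data.frame(
  item2  = factor(c("a little", "Not at all", "somewhat"),
                  levels = c("Not at all", "a little", "somewhat")),
  item12 = c(5L, 999L, 3L)
)

# lapply() passes whole columns, so no character conversion happens;
# assigning into toy[] keeps the data frame structure and column names
toy[] <- lapply(toy, as.numeric)
str(toy)   # both columns are now numeric
```

sapply(toy, as.numeric) would perform the same conversion but simplify the result to a numeric matrix instead of keeping a data frame.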
Thanks in advance for any advice
Chris Beeley
Institute of Mental Health, UK
dput() gives-
structure(list(id = structure(1:79, label = structure("Participant", .Names = "id"), class = "labelled"),
item2.jan11 = structure(c(4L, 3L, 6L, 4L, 6L, 6L, 2L, 6L,
2L, 2L, 3L, 3L, 1L, 6L, 2L, 6L, 4L, 2L, 6L, 2L, 6L, 6L, 6L,
4L, 4L, 6L, 2L, 6L, 2L, 6L, 2L, 3L, 6L, 6L, 3L, 6L, 5L, 6L,
3L, 6L, 1L, 3L, 3L, 3L, 6L, 4L, 1L, 3L, 6L, 2L, 6L, 2L, 6L,
6L, 6L, 4L, 3L, 6L, 6L, 6L, 6L, 6L, 3L, 6L, 2L, 6L, 6L, 2L,
4L, 6L, 2L, 5L, 6L, 6L, 6L, 6L, 1L, 6L, 4L), .Label = c("Not at all",
"a little", "somewhat", "quite a lot", "very much", "missing data"
), class = c("labelled", "factor"), label = structure("The patients care for each other", .Names = "item2_jan11")),
item12.jan11 = structure(c(5L, 5L, 999L, 5L, 999L, 999L,
2L, 999L, 5L, 2L, 5L, 3L, 3L, 999L, 2L, 999L, 5L, 5L, 999L,
5L, 999L, 999L, 999L, 5L, 5L, 999L, 3L, 999L, 5L, 999L, 3L,
4L, 999L, 999L, 4L, 999L, 5L, 999L, 5L, 999L, 3L, 5L, 4L,
4L, 999L, 3L, 2L, 4L, 999L, 5L, 999L, 5L, 999L, 999L, 999L,
4L, 5L, 999L, 999L, 999L, 999L, 999L, 4L, 999L, 3L, 999L,
999L, 1L, 5L, 999L, 3L, 5L, 999L, 999L, 999L, 999L, 4L, 999L,
0L), value.labels = structure(c(999, 5, 4, 3, 2, 1), .Names = c("missing data",
"very much", "quite a lot", "somewhat", "a little", "Not at all"
)), label = structure("At times, members of staff are afraid of some of the patients", .Names = "item12_jan11"), class = "labelled")), .Names = c("id",
"item2.jan11", "item12.jan11"), class = "data.frame", row.names = c(NA,
-79L))