[R] as.numeric() generates NAs inside an apply call, but fine outside of it

Mon Jan 9 15:11:29 CET 2012

Hello-

I have a rather messy SPSS file which I have imported into R; I've dput'd 
some of the columns at the end of this message. I wish to get rid of all 
the labels and have numeric values using as.numeric. The funny thing is 
that it works like this:

as.numeric(mydata[,2]) # generates correct numbers

However, if I pass the whole data frame at once like this:

apply(mydata, 1:2, function(x) as.numeric(x))

This same column, column 2, generates NAs with a "in FUN(newX[, i], ...) 
: NAs introduced by coercion" message.

Meanwhile column 3 works fine like this:

as.numeric(mydata[,3]) # generates correct numbers

and it also gives correct numeric results from the apply call.

I think I basically know why: str() tells me that the variables which 
work okay are "labelled" whereas the ones that don't are "Factor". 
However, I can't figure out what's special about the apply call that 
generates the NAs when as.numeric(mydata[,2]) doesn't, and I'm not sure 
what to do about it in future.
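
A smaller reproduction shows the same contrast. This is only a sketch with 
made-up data: toy and its columns are hypothetical stand-ins for mydata, 
not my real file. It also prints what as.matrix() makes of the data, 
since ?apply says a data frame is coerced via as.matrix before FUN is 
called, which may be where the two routes diverge.

```r
# Hypothetical stand-in for mydata: one integer, one factor, one numeric column
toy <- data.frame(
  id    = 1:3,
  item  = factor(c("a little", "somewhat", "a little")),
  score = c(5, 999, 3)
)

# Called directly on the factor, as.numeric() returns the integer level codes
direct <- as.numeric(toy$item)

# A data frame containing a factor column becomes a character matrix,
# holding the level labels rather than the codes
m <- as.matrix(toy)

# apply() works on that matrix, so as.numeric() receives strings such as
# "a little"; the coercion warning is suppressed here to keep the output short
via_apply <- suppressWarnings(apply(toy, 1:2, function(x) as.numeric(x)))

print(direct)
print(is.character(m))
print(via_apply)
```

In this toy case the item column is all NA in via_apply, while the score 
column still comes through as numbers.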

I realise I can just loop over the columns, but I would rather get to 
the bottom of this if I can so I know for future.
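
The column-by-column version I have in mind would be something like the 
sketch below. Again the data are made up, and toy and converted are 
hypothetical names used only for illustration.

```r
# Hypothetical stand-in for mydata
toy <- data.frame(
  id   = 1:3,
  item = factor(c("a little", "somewhat", "a little"))
)

# lapply() visits each column as its own vector, so a factor reaches
# as.numeric() as a factor and comes back as its integer level codes
converted <- toy
converted[] <- lapply(toy, as.numeric)

str(converted)
```

Note that the codes follow the order of the factor levels, so in my data 
"missing data" in column 2 would come out as 6 rather than 999.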

Thanks in advance for any advice

Chris Beeley
Institute of Mental Health, UK

dput() gives-

structure(list(id = structure(1:79, label = structure("Participant", .Names = "id"), class = "labelled"),
     item2.jan11 = structure(c(4L, 3L, 6L, 4L, 6L, 6L, 2L, 6L,
     2L, 2L, 3L, 3L, 1L, 6L, 2L, 6L, 4L, 2L, 6L, 2L, 6L, 6L, 6L,
     4L, 4L, 6L, 2L, 6L, 2L, 6L, 2L, 3L, 6L, 6L, 3L, 6L, 5L, 6L,
     3L, 6L, 1L, 3L, 3L, 3L, 6L, 4L, 1L, 3L, 6L, 2L, 6L, 2L, 6L,
     6L, 6L, 4L, 3L, 6L, 6L, 6L, 6L, 6L, 3L, 6L, 2L, 6L, 6L, 2L,
     4L, 6L, 2L, 5L, 6L, 6L, 6L, 6L, 1L, 6L, 4L), .Label = c("Not at all",
     "a little", "somewhat", "quite a lot", "very much", "missing data"
     ), class = c("labelled", "factor"), label = structure("The patients care for each other", .Names = "item2_jan11")),
     item12.jan11 = structure(c(5L, 5L, 999L, 5L, 999L, 999L,
     2L, 999L, 5L, 2L, 5L, 3L, 3L, 999L, 2L, 999L, 5L, 5L, 999L,
     5L, 999L, 999L, 999L, 5L, 5L, 999L, 3L, 999L, 5L, 999L, 3L,
     4L, 999L, 999L, 4L, 999L, 5L, 999L, 5L, 999L, 3L, 5L, 4L,
     4L, 999L, 3L, 2L, 4L, 999L, 5L, 999L, 5L, 999L, 999L, 999L,
     4L, 5L, 999L, 999L, 999L, 999L, 999L, 4L, 999L, 3L, 999L,
     999L, 1L, 5L, 999L, 3L, 5L, 999L, 999L, 999L, 999L, 4L, 999L,
     0L), value.labels = structure(c(999, 5, 4, 3, 2, 1), .Names = c("missing data",
     "very much", "quite a lot", "somewhat", "a little", "Not at all"
     )), label = structure("At times, members of staff are afraid of some of the patients", .Names = "item12_jan11"), class = "labelled")), .Names = c("id",
"item2.jan11", "item12.jan11"), class = "data.frame", row.names = c(NA,
-79L))