[R] Variable Class "numeric" instead recognized by dplyr as a 'factor'

peter dalgaard pdalgd at gmail.com
Sun Sep 27 22:29:01 CEST 2015

> On 27 Sep 2015, at 22:12 , Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> Due to missing data, R originally classified each X and Y variable as a ‘factor’, subsequently changed to ‘numeric’ via ‘as.numeric’ command.
> No.
> a) missing data will not cause numeric data to become factor. There's
> something wrong in the data from the beginning (as Thierry said)

Well, if you forget to tell R what the input code for missing values is (na.strings if you use read.table), then that is de facto what happens: the whole column gets interpreted as character and is subsequently converted to a factor. The fix is to _remember_ to tell R which missing value codes are being used.
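
A minimal sketch of that effect, for illustration only: the inline data and the "." missing-value code are made up, and stringsAsFactors = TRUE is passed explicitly so that the character-to-factor conversion described above is reproduced regardless of the R version in use (the default changed to FALSE in R 4.0.0).

```r
# Hypothetical input: "." stands for a missing value in column x
txt <- "x y
1.5 10
. 20
3.2 30"

# Without na.strings, the "." makes x non-numeric, so it is read as
# character and then converted to a factor
bad <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
class(bad$x)

# Declaring the missing-value code lets x be read as numeric with an NA
good <- read.table(text = txt, header = TRUE, na.strings = ".",
                   stringsAsFactors = TRUE)
class(good$x)
good$x
```

The same na.strings argument is available in read.csv and the other read.table wrappers.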

> b) If f is numeric data that is a factor, as.numeric(f) is almost
> certainly **not** the correct way to change it to numeric.

Amen... as.numeric(as.character(f)) if you must, but the proper fix is usually the above.
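
A small sketch of the difference, using a made-up factor f: as.numeric(f) returns the internal level codes, whereas going through as.character() recovers the values the labels represent.

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "2.5", "10", "7"))

levels(f)                     # levels are sorted as character strings
as.numeric(f)                 # internal integer codes, not the values
as.numeric(as.character(f))   # the values the labels represent
as.numeric(levels(f))[f]      # equivalent idiom mentioned in ?factor
```

With this input the codes come out as 1 2 1 3, while the last two calls both give 10 2.5 10 7.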


Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
