[R] Variable Class "numeric" instead recognized by dplyr as a 'factor'

peter dalgaard pdalgd at gmail.com
Sun Sep 27 22:29:01 CEST 2015


> On 27 Sep 2015, at 22:12 , Bert Gunter <bgunter.4567 at gmail.com> wrote:
> 
>> 
>> Due to missing data, R originally classified each X and Y variable as a ‘factor’, subsequently changed to ‘numeric’ via ‘as.numeric’ command.
> 
> No.
> a) missing data will not cause numeric data to become factor. There's
> something wrong in the data from the beginning (as Thierry said)

Well, if you forget to tell R what the input code for missing values is (na.strings if you use read.table), then that is de facto what happens: the whole column gets interpreted as character and is subsequently converted to a factor. The fix is to _remember_ to tell R what missing value codes are being used.
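
A minimal sketch of that effect, on made-up data: the inline text, the column names and the "." missing-value code are assumptions for illustration, not the original poster's file. stringsAsFactors = TRUE is set explicitly in the first call so the conversion to factor shows up regardless of the R version's default.

```r
# Hypothetical input in which "." marks a missing value
txt <- "x y
1.5 2
. 3
4.0 ."

# Missing-value code not declared: "." is not a number, so each column
# is read as character and then converted to a factor
bad <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
sapply(bad, class)

# Missing-value code declared: "." becomes NA and the columns stay numeric
good <- read.table(text = txt, header = TRUE, na.strings = ".")
sapply(good, class)
```

With a real file the same na.strings argument goes into the read.table (or read.csv) call that reads it.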

> 
> b) If f is numeric data that is a factor, as.numeric(f) is almost
> certainly **not** the correct way to change it to numeric.

Amen... as.numeric(as.character(f)) if you must, but the proper fix is usually the above.
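
To see the difference, here is a small sketch on a made-up factor (the values are illustrative only):

```r
# Hypothetical factor holding numbers stored as labels
f <- factor(c("10.5", "3", "10.5", "7"))

# Returns the internal integer codes of the levels, not the values
codes <- as.numeric(f)

# Goes via the labels and recovers the values
vals <- as.numeric(as.character(f))

# Alternative given in ?factor; converts each level only once
vals2 <- as.numeric(levels(f))[f]

print(codes)
print(vals)
```

The levels are sorted as character strings ("10.5", "3", "7"), so the codes bear no relation to the numeric values.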

-pd

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


