[R] Variable Class "numeric" instead recognized by dplyr as a 'factor'

Bert Gunter bgunter.4567 at gmail.com
Sun Sep 27 23:09:12 CEST 2015


Yes, but I think of numeric data with non-numeric values (e.g. "." for
missing) as character, not numeric. To me, "missing" means either an
empty field or a value whose missing-value code has been specified as
you describe. Hence my comment. Your clarification is nevertheless
appropriate.

Cheers,
Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Sun, Sep 27, 2015 at 1:29 PM, peter dalgaard <pdalgd at gmail.com> wrote:
>
>> On 27 Sep 2015, at 22:12 , Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>
>>>
>>> Due to missing data, R originally classified each X and Y variable as a ‘factor’, subsequently changed to ‘numeric’ via ‘as.numeric’ command.
>>
>> No.
>> a) missing data will not cause numeric data to become factor. There's
>> something wrong in the data from the beginning (as Thierry said)
>
> Well, if you forget to tell R what the input code for missing is (na.strings if you use read.table), then that is de facto what happens: The whole column gets interpreted as character and subsequently converted to a factor. The fix is to _remember_ to tell R what missing value codes are being used.
>
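A minimal sketch of the point about na.strings, assuming a small made-up data set in which "." marks a missing value. The column names and values are hypothetical, and stringsAsFactors = TRUE is set explicitly because that was the read.table default when this thread was written (it changed in R 4.0.0).

```r
# Hypothetical data: two numeric columns where "." marks a missing value
txt <- "x y
1.5 2.5
. 3.1
4.2 ."

# Without na.strings, the "." entries make both columns non-numeric, so
# they are read as character and converted to factor
bad <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
sapply(bad, class)    # both columns are "factor"

# Declaring the missing-value code lets R parse the columns as numeric
good <- read.table(text = txt, header = TRUE, na.strings = ".")
sapply(good, class)   # both columns are "numeric"
good$x                # 1.5  NA 4.2
```

The same na.strings argument is accepted by read.csv and the other read.table wrappers.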
>>
>> b) If f is numeric data that is a factor, as.numeric(f) is almost
>> certainly **not** the correct way to change it to numeric.
>
> Amen... as.numeric(as.character(f)) if you must, but the proper fix is usually the above.
>
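To illustrate why as.numeric(f) is the wrong conversion, here is a small sketch using a hypothetical factor f whose labels happen to look like numbers. as.numeric() on a factor returns the internal integer level codes, not the values shown in the labels.

```r
# Hypothetical factor whose labels are numbers stored as text
f <- factor(c("10.5", "3", "7.25", "3"))

levels(f)                     # "10.5" "3" "7.25" (sorted as character)
as.numeric(f)                 # 1 2 3 2  (level codes, not the data)

# Going through character recovers the actual values
as.numeric(as.character(f))   # 10.50  3.00  7.25  3.00

# Equivalent alternative described in the R FAQ: convert the levels
# once, then index by the factor
as.numeric(levels(f))[f]      # 10.50  3.00  7.25  3.00
```

Either conversion only repairs the symptom. If the factor arose from an undeclared missing-value code, re-reading the data with the right na.strings is usually the cleaner fix, as noted above.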
> -pd
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>