[R] real numeric variable transforms into factor:

Aldi Kraja aldi at wustl.edu
Fri Apr 17 22:38:42 CEST 2009


Thank you Marc for your detailed and helpful info.

Aldi



Marc Schwartz wrote:
> On Apr 17, 2009, at 2:52 PM, Aldi Kraja wrote:
>
>> Hi
>> Tested in R version 2.8.1 on Windows Vista.
>> From FAQ:
>> http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f 
>>
>> "It may happen that when reading numeric data into R (usually, when
>> reading in a file), they come in as factors. If f is such a factor
>> object, you can use as.numeric(as.character(f)) to get the numbers 
>> back."
>>
>> 1. Why might this happen? Why does R convert x1 from a real numeric
>> variable with decimal values into a factor?
>> 2. Doesn't it look strange to get "internal numbers" when one applies
>> as.numeric(x$x1)?
>> 3. What are the internal numbers mentioned in the FAQ?
>> Why is it necessary to write
>> as.numeric(as.character(x$x1)) to finally get the numbers I read
>> with read.table?
>> Do the missing values, shown as dots, force R (or the programmer who
>> wrote the function read.table) to treat x1 as a factor?
>>
>> Could whoever maintains the read.table function improve it
>> to recognize numbers with decimal places as numeric rather than as
>> factors, and dots as missing values that are converted to NA?
>>
>> The data file saved as text:
>> test.txt
>> ob,x1,y1
>> 1,1.1,1/1
>> 2,2.1,1/2
>> 3,3.2,2/2
>> 4,.,0/0
>> 5,4.5,1/1
>> 6,5.1,0/0
>> 7,6.3,1/1
>> 8,.,1/2
>> Reading it from the d: drive:
>> > x <- read.table(file = "d:\\test\\test.txt", header = TRUE, sep = ",")
>> > x
>>   ob  x1  y1
>> 1  1 1.1 1/1
>> 2  2 2.1 1/2
>> 3  3 3.2 2/2
>> 4  4   . 0/0
>> 5  5 4.5 1/1
>> 6  6 5.1 0/0
>> 7  7 6.3 1/1
>> 8  8   . 1/2
>> > as.numeric(x$x1)
>> [1] 2 3 4 1 5 6 7 1
>>
>> > as.numeric(as.character(x$x1))
>> [1] 1.1 2.1 3.2  NA 4.5 5.1 6.3  NA
>> Warning message:
>> NAs introduced by coercion
>>
>> Thanks,
>>
>> Aldi
>
> Looks like you are taking data from SAS perhaps, where the missing 
> value indicator is '.'.  In R, where type.convert() is used to 
> determine the data types for incoming text data, you get:
>
> > type.convert(".")
> [1] .
> Levels: .
>
> That is, a factor.
>
> What you want to do is to set the 'na.strings' argument to 
> read.table() to '.' rather than the default 'NA', so that the periods 
> are interpreted as missing values and set to NA during import. Thus:
>
> # Create from your data in the clipboard (on OSX)
> DF <- read.table(pipe("pbpaste"), header = TRUE, sep = ",",
>                  na.strings = ".")
>
> > DF
>   ob  x1  y1
> 1  1 1.1 1/1
> 2  2 2.1 1/2
> 3  3 3.2 2/2
> 4  4  NA 0/0
> 5  5 4.5 1/1
> 6  6 5.1 0/0
> 7  7 6.3 1/1
> 8  8  NA 1/2
>
> > str(DF)
> 'data.frame':    8 obs. of  3 variables:
>  $ ob: int  1 2 3 4 5 6 7 8
>  $ x1: num  1.1 2.1 3.2 NA 4.5 5.1 6.3 NA
>  $ y1: Factor w/ 4 levels "0/0","1/1","1/2",..: 2 3 4 1 2 1 2 3
>
> This now works because:
>
> > type.convert(".", na.strings = ".")
> [1] NA
>
>
> See ?read.table and ?type.convert for more information.
>
> HTH,
>
> Marc Schwartz
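
For readers who want to reproduce the thread without the original file or a clipboard, the following is a minimal sketch of both the problem and the fix. Some details are assumptions that do not appear in the posts above: the data are supplied inline through the `text` argument of read.table(), `stringsAsFactors = TRUE` is set explicitly because recent R versions (4.0 and later) no longer create factors by default, and `na.strings = c("NA", ".")` is used instead of `"."` alone so that the usual "NA" marker keeps working.

```r
# The data from the post, inlined so the example does not depend on a file path.
csv <- "ob,x1,y1
1,1.1,1/1
2,2.1,1/2
3,3.2,2/2
4,.,0/0
5,4.5,1/1
6,5.1,0/0
7,6.3,1/1
8,.,1/2"

# Default na.strings ("NA"): "." is ordinary text, so x1 cannot be numeric.
# stringsAsFactors = TRUE (an assumption for R >= 4.0) reproduces the
# factor conversion seen in R 2.8.1.
x <- read.table(text = csv, header = TRUE, sep = ",", stringsAsFactors = TRUE)

lev   <- levels(x$x1)       # the distinct strings seen in the column
codes <- as.integer(x$x1)   # the "internal numbers": indices into lev

# Two ways to recover the values; "." becomes NA (normally with a warning,
# suppressed here to keep the output clean).
v1 <- suppressWarnings(as.numeric(as.character(x$x1)))
v2 <- suppressWarnings(as.numeric(lev)[codes])

# Declaring "." as a missing-value marker avoids the detour entirely.
DF <- read.table(text = csv, header = TRUE, sep = ",",
                 na.strings = c("NA", "."), stringsAsFactors = TRUE)
str(DF)
```

The point of `lev` and `codes` is that a factor stores each value as an integer index into its vector of levels, which is why as.numeric() applied directly to the factor returns those indices rather than the original decimals.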
