[R] real numeric variable transforms into factor:
Aldi Kraja
aldi at wustl.edu
Fri Apr 17 22:38:42 CEST 2009
Thank you Marc for your detailed and helpful info.
Aldi
Marc Schwartz wrote:
> On Apr 17, 2009, at 2:52 PM, Aldi Kraja wrote:
>
>> Hi
>> Tested with R version 2.8.1 on Windows Vista.
>> From FAQ:
>> http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f
>>
>> "It may happen that when reading numeric data into R (usually, when
>> reading in a file), they come in as factors. If f is such a factor
>> object, you can use as.numeric(as.character(f)) to get the numbers
>> back."
>>
>> 1. Why may this happen? Why does R transform x1 from a real numeric
>> with decimal values into a factor?
>> 2. Doesn't it look strange to get "internal numbers" when one applies
>> as.numeric(x$x1)?
>> 3. What are the internal numbers mentioned in the FAQ?
>> Why is it necessary to write
>> as.numeric(as.character(x$x1)) to finally get the numbers I read in
>> with read.table?
>> Is it the missing values, shown as dots, that force R (or the programmer
>> who wrote the function read.table) to treat x1 as a factor?
>>
>> Could whoever maintains the read.table function improve it so that it
>> recognizes numbers with decimal places as numeric rather than as factors,
>> and dots as missing values that are converted to NA?
>>
>> The data file saved as text:
>> test.txt
>> ob,x1,y1
>> 1,1.1,1/1
>> 2,2.1,1/2
>> 3,3.2,2/2
>> 4,.,0/0
>> 5,4.5,1/1
>> 6,5.1,0/0
>> 7,6.3,1/1
>> 8,.,1/2
>> Reading it from the d: drive:
>> x <- read.table(file = "d:\\test\\test.txt", header = TRUE, sep = ",")
>> > x
>>   ob  x1  y1
>> 1  1 1.1 1/1
>> 2  2 2.1 1/2
>> 3  3 3.2 2/2
>> 4  4   . 0/0
>> 5  5 4.5 1/1
>> 6  6 5.1 0/0
>> 7  7 6.3 1/1
>> 8  8   . 1/2
>> > as.numeric(x$x1)
>> [1] 2 3 4 1 5 6 7 1
>>
>> > as.numeric(as.character(x$x1))
>> [1] 1.1 2.1 3.2 NA 4.5 5.1 6.3 NA
>> Warning message:
>> NAs introduced by coercion
>>
>> Thanks,
>>
>> Aldi
>
> Looks like you are taking data from SAS perhaps, where the missing
> value indicator is '.'. In R, where type.convert() is used to
> determine the data types for incoming text data, you get:
>
> > type.convert(".")
> [1] .
> Levels: .
>
> That is, a factor.
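
To make the "internal numbers" from questions 2 and 3 concrete, here is a minimal sketch (it is not part of Marc's reply). The factor f is a hypothetical stand-in for x$x1, built by hand from the same values. A factor stores each value as an integer position in its sorted set of levels, and as.numeric() applied to the factor returns those positions rather than the labels.

```r
# Hypothetical stand-in for x$x1: the same strings read.table saw in the file
f <- factor(c("1.1", "2.1", "3.2", ".", "4.5", "5.1", "6.3", "."))

# The levels are the sorted unique strings
lev <- levels(f)

# The "internal numbers": each value's position within levels(f)
codes <- as.integer(f)

# Indexing the levels by the codes gives back the original labels
labels_back <- lev[codes]

# Converting the labels (not the codes) recovers the numbers;
# "." cannot be parsed as a number, so it becomes NA (warning suppressed here)
vals <- suppressWarnings(as.numeric(as.character(f)))

print(lev)
print(codes)
print(vals)
```

In the session quoted earlier the codes came out as 2 3 4 1 5 6 7 1, which is consistent with "." sorting ahead of the digits and so becoming level 1.
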
>
> What you want to do is to set the 'na.strings' argument to
> read.table() to '.' rather than the default 'NA', so that the periods
> are interpreted as missing values and set to NA during import. Thus:
>
> # Create from your data in the clipboard (on OSX)
> DF <- read.table(pipe("pbpaste"), header = TRUE, sep = ",",
>                  na.strings = ".")
>
> > DF
>   ob  x1  y1
> 1  1 1.1 1/1
> 2  2 2.1 1/2
> 3  3 3.2 2/2
> 4  4  NA 0/0
> 5  5 4.5 1/1
> 6  6 5.1 0/0
> 7  7 6.3 1/1
> 8  8  NA 1/2
>
> > str(DF)
> 'data.frame': 8 obs. of 3 variables:
> $ ob: int 1 2 3 4 5 6 7 8
> $ x1: num 1.1 2.1 3.2 NA 4.5 5.1 6.3 NA
> $ y1: Factor w/ 4 levels "0/0","1/1","1/2",..: 2 3 4 1 2 1 2 3
>
> This now works because:
>
> > type.convert(".", na.strings = ".")
> [1] NA
>
>
> See ?read.table and ?type.convert for more information.
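
A self-contained sketch of the same fix, for readers without the clipboard contents or the d:\test path. It assumes a version of R whose read.table() accepts the text argument (R 2.8.1 does not have it; a textConnection() would be needed there). It also sets stringsAsFactors = TRUE explicitly, because R 4.0.0 and later default to FALSE and would return x1 as character instead of factor. The names txt, bad and good are made up for illustration.

```r
# The data from the original post, inlined so that no file is needed
txt <- c("ob,x1,y1",
         "1,1.1,1/1", "2,2.1,1/2", "3,3.2,2/2", "4,.,0/0",
         "5,4.5,1/1", "6,5.1,0/0", "7,6.3,1/1", "8,.,1/2")

# Default na.strings = "NA": "." is ordinary text, so x1 cannot be numeric
bad <- read.table(text = txt, header = TRUE, sep = ",",
                  stringsAsFactors = TRUE)

# Declaring "." as the missing-value marker lets x1 be read as numeric
good <- read.table(text = txt, header = TRUE, sep = ",",
                   na.strings = ".", stringsAsFactors = TRUE)

print(class(bad$x1))    # factor
print(class(good$x1))   # numeric

# Numeric summaries now work directly once the NAs are skipped
print(mean(good$x1, na.rm = TRUE))
```

The y1 column stays a factor in both cases, since values such as 1/1 are not numbers under any na.strings setting.
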
>
> HTH,
>
> Marc Schwartz