[R] How to convert a factor column into a numeric one?

Joshua Wiley jwiley.psych at gmail.com
Sun Jun 5 06:51:22 CEST 2011


Hi Robert,

Try this:

## Example data converting mtcars to factors
testdf <- as.data.frame(lapply(mtcars, factor))
str(testdf)

## taking advantage of assignment methods to avoid an explicit call to
## as.data.frame
## convert factor to numeric using the technique recommended in ?factor
testdf[] <- lapply(testdf, function(x)
  as.numeric(levels(x))[x])
str(testdf)


If you do not want to convert all columns, just use a subset.  Here is one way:

testdf[, c("mpg", "cyl", "disp")] <-
  lapply(testdf[, c("mpg", "cyl", "disp")],
  function(x) as.numeric(levels(x))[x])
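
Applied to the two columns from the question, the same idea might look like the sketch below. The data frame is a small made-up stand-in for the real one (only Time and Temp, six rows), so treat it as an illustration rather than a drop-in fix.

```r
## Small stand-in for the Time and Temp columns in the question
## (hypothetical data, not the poster's full data frame)
df <- data.frame(
  Time = factor(c(0, 2, 7, 14, 0, 7)),
  Temp = factor(c(-20, -20, -20, -20, 4, 4), levels = c(-20, 4, 25, 45))
)

## as.numeric() on a factor returns the internal level codes
codes <- as.numeric(df$Time)

## Indexing the numeric levels by the factor recovers the values
values <- as.numeric(levels(df$Time))[df$Time]

## Convert only the two columns of interest, leaving any others alone
cols <- c("Time", "Temp")
df[cols] <- lapply(df[cols], function(x) as.numeric(levels(x))[x])
str(df)
```

Indexing the converted levels by the factor converts each distinct level only once, which is why ?factor recommends it over as.numeric(as.character(x)); both give the same values.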

I would also look into *why* those numeric columns are being stored as
factors in the first place.  If you are reading the data in with
read.table() or one of its wrapper functions (like read.csv), then it
would be better to preempt the storage as a factor altogether rather
than converting back to numeric.  For example, perhaps something is
being used to indicate missing data that R does not recognize (e.g.,
SAS uses ".").  Specifying na.strings = "." would fix this.  See
?read.table for some of the options available.
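
As a rough illustration of the na.strings point, the sketch below reads a tiny made-up CSV (the text and column values are invented for the example) in which "." marks a missing value. stringsAsFactors = TRUE is set explicitly in the first call because that was the default only before R 4.0.0.

```r
## Hypothetical CSV text in which "." marks a missing value (SAS style)
csv_text <- "Time,Temp,Log10
0,-20,6.41
2,.,5.74
7,4,5.80"

## With "." left unexplained, Temp cannot be parsed as a number and
## (with stringsAsFactors = TRUE) ends up stored as a factor
raw <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(raw$Temp)

## Declaring "." as a missing-value string lets Temp be read as a number
fixed <- read.csv(text = csv_text, na.strings = ".")
str(fixed$Temp)
```

In current versions of R the first call without stringsAsFactors = TRUE would give a character column instead of a factor, but the remedy is the same.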

Hope this helps,

Josh

On Sat, Jun 4, 2011 at 9:31 PM, Robert A. LaBudde <ral at lcfltd.com> wrote:
> I have a data frame:
>
>> head(df)
>  Time Temp Conc Repl    Log10
> 1    0  -20    H    1 6.406547
> 2    2  -20    H    1 5.738683
> 3    7  -20    H    1 5.796394
> 4   14  -20    H    1 4.413691
> 5    0    4    H    1 6.406547
> 7    7    4    H    1 5.705433
>> str(df)
> 'data.frame':   177 obs. of  5 variables:
>  $ Time : Factor w/ 4 levels "0","2","7","14": 1 2 3 4 1 3 4 1 3 4 ...
>  $ Temp : Factor w/ 4 levels "-20","4","25",..: 1 1 1 1 2 2 2 3 3 3 ...
>  $ Conc : Factor w/ 3 levels "H","L","M": 1 1 1 1 1 1 1 1 1 1 ...
>  $ Repl : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
>  $ Log10: num  6.41 5.74 5.8 4.41 6.41 ...
>> levels(df$Temp)
> [1] "-20" "4"   "25"  "45"
>> levels(df$Time)
> [1] "0"  "2"  "7"  "14"
>
> As you can see, "Time" and "Temp" are currently factors, not numeric.
>
> I would like to change these columns into numerical ones.
>
> df$Time<- as.numeric(df$Time)
>
> doesn't work, as it changes to the factor level indices (1,2,3,4) instead of
> the values (0,2,7,14).
>
> There must be a direct way of doing this in R.
>
> I tried recode() in 'car':
>
>> df$Temp<- recode(df$Temp, '1=-20;2=25;3=4;4=45',as.factor.result=FALSE)
>> head(df)
>  Time Temp Conc Repl     Freq
> 1    0  -20    H    1 6.406547
> 2    2  -20    H    1 5.738683
> 3    7  -20    H    1 5.796394
> 4   14  -20    H    1 4.413691
> 5    0   45    H    1 6.406547
> 7    7   45    H    1 5.705433
>
> but note that the values for 'Temp' in rows 5 and 7 are 45 and not 4, as
> expected, although the result is numeric. The same happens if I use the
> order given by levels(df$Temp) instead of the sort order in the recode() 2nd
> argument.
>
> Any hints?
> ================================================================
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
> Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
> 824 Timberlake Drive                     Tel: 757-467-0954
> Virginia Beach, VA 23464-3239            Fax: 757-467-2947
>
> "Vere scire est per causas scire"
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


