[R] How to convert a factor column into a numeric one?

Joshua Wiley jwiley.psych at gmail.com
Sun Jun 5 23:29:00 CEST 2011


Hmm, that is a bit tricky.  The conversion from a table to a data
frame uses the dimension names, which are always character.  To bypass
this, you would need to save the dimension names, convert the ones you
want numeric to numeric (I am assuming everything except Conc, so the
indices would be c(1, 2, 4)), and then manually convert from table to
data frame (but that is not too difficult).
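
To see where the factors come from, here is a minimal sketch on made-up
data. The names toy, tab and d1 are only for illustration (they are not
from your ngbe data), and I use tapply() in place of by() just to keep
it short.

```r
## Toy data, purely illustrative (not the ngbe data)
toy <- data.frame(Time  = c(0, 5, 10, 0, 5, 10),
                  Conc  = rep(c("low", "high"), each = 3),
                  Log10 = c(6.1, 5.2, 4.0, 6.3, 4.8, 3.1))

## tapply() stands in for by() here; the result is an array with dimnames
tab <- as.table(tapply(toy$Log10,
                       list(Time = toy$Time, Conc = toy$Conc),
                       mean))

str(dimnames(tab))        # both components are character vectors
d1 <- as.data.frame(tab)  # default conversion: dimnames become factors
sapply(d1, class)         # Time and Conc are factors, Freq is numeric
```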

In your case I am not sure there is a big benefit one way or the
other.  But if you keep doing it the way you have been and convert the
columns back to numeric afterwards, then if you use:

df<- na.omit(as.data.frame(ttcrmean, stringsAsFactors = FALSE))

then what you tried here will work again:

df$Time<- as.numeric(df$Time)

and it will also be slightly more efficient computationally (although
you are not dealing with that much data, so it is probably not a big
deal).  Below is an example of the manual conversion I mentioned.  It
takes only three lines of code, the columns come out numeric, and the
value column is already named "Log10", so it is basically equivalent to
what you had.  The logic behind the code is a little less
straightforward, though, which could hurt readability in the future.
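
Before the manual version, a small sketch of the stringsAsFactors = FALSE
route on made-up data (toy, tab, bad and good are illustrative names,
not from your data, and tapply() stands in for by()).  It shows why
as.numeric() on the factor gives level codes rather than the values, and
that the character column converts cleanly.  The responseName argument
is optional; I use it here only to name the value column in the same
call.

```r
## Toy data, purely illustrative (not the ngbe data)
toy <- data.frame(Time  = c(0, 5, 10, 0, 5, 10),
                  Conc  = rep(c("low", "high"), each = 3),
                  Log10 = c(6.1, 5.2, 4.0, 6.3, 4.8, 3.1))
tab <- as.table(tapply(toy$Log10,
                       list(Time = toy$Time, Conc = toy$Conc),
                       mean))

## Default conversion: Time is a factor, so as.numeric() returns the
## internal level codes, not the times themselves
bad <- as.data.frame(tab)
as.numeric(bad$Time)                 # codes 1, 2, 3, ...
as.numeric(as.character(bad$Time))   # the usual fix for an existing factor

## With stringsAsFactors = FALSE, Time is character and converts directly
good <- as.data.frame(tab, stringsAsFactors = FALSE, responseName = "Log10")
good$Time <- as.numeric(good$Time)
str(good)
```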

###########################
ttcrmean<- as.table(by(ngbe[,'Log10'],
list(Time=ngbe$Time,Temp=ngbe$Temp,Conc=ngbe$Conc,Repl=ngbe$Replicate),
  mean))
for (k in 1:3) {  #fix-up time zeroes
  for (l in 1:5) { #replicates
    t0val<- ttcrmean[1,3,k,l]
    for (j in 1:4) {  #temps
      ttcrmean[1,j,k,l]<- t0val
    } #j
  } #l
} #k

## Convert dimnames of your table that you want
## to be numeric to numeric and skip over Conc
xn <- dimnames(ttcrmean)
xn[c(1, 2, 4)] <- lapply(xn[c(1, 2, 4)], as.numeric)

## convert the table to a data frame manually
df <- na.omit(data.frame(expand.grid(xn), Log10 = c(ttcrmean)))
######################
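
If you want to try the manual conversion without the real data, here is
the same idea on a toy table.  Again toy, tab, xn and d2 are made-up
names, tapply() stands in for by(), and as.vector() is used where I
wrote c() above; treat it as a sketch rather than a drop-in replacement.

```r
## Toy data, purely illustrative (not the ngbe data)
toy <- data.frame(Time  = c(0, 5, 10, 0, 5, 10),
                  Conc  = rep(c("low", "high"), each = 3),
                  Log10 = c(6.1, 5.2, 4.0, 6.3, 4.8, 3.1))
tab <- as.table(tapply(toy$Log10,
                       list(Time = toy$Time, Conc = toy$Conc),
                       mean))

## Save the dimnames and convert only Time to numeric (Conc stays as is)
xn <- dimnames(tab)
xn["Time"] <- lapply(xn["Time"], as.numeric)

## expand.grid() varies the first dimension fastest, which matches the
## column-major order of the values stored in the table
d2 <- na.omit(data.frame(expand.grid(xn), Log10 = as.vector(tab)))
str(d2)   # Time is numeric, Conc is a factor, Log10 is numeric
```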

Cheers,

Josh

On Sat, Jun 4, 2011 at 10:22 PM, Robert A LaBudde <ral at lcfltd.com> wrote:
> Thanks for your help.
>
> As far as your question below is concerned, the data frame arose as a result
> of some data cleaning on an original data frame, which was changed into a
> table, modified, and changed back to a data frame:
>
> ttcrmean<- as.table(by(ngbe[,'Log10'],
> list(Time=ngbe$Time,Temp=ngbe$Temp,Conc=ngbe$Conc,Repl=ngbe$Replicate),
>  mean))
> for (k in 1:3) {  #fix-up time zeroes
>  for (l in 1:5) { #replicates
>    t0val<- ttcrmean[1,3,k,l]
>    for (j in 1:4) {  #temps
>      ttcrmean[1,j,k,l]<- t0val
>    } #j
>  } #l
> } #i
> df<- na.omit(as.data.frame(ttcrmean))
> colnames(df)[5]<- 'Log10'
>
>
> At 12:51 AM 6/5/2011, Joshua Wiley wrote:
>>
>> Hi Robert,
>> <snip>
>> I would also look into *why* those numeric columns are being stored as
>> factors in the first place.  If you are reading the data in with
>> read.table() or one of its wrapper functions (like read.csv), then it
>> would be better to preempt the storage as a factor altogether rather
>> than converting back to numeric.  For example, perhaps something is
>> being used to indicate missing data that R does not recognize (e.g.,
>> SAS uses ".").  Specifying na.strings = ".", would fix this.  See
>> ?read.table for some of the options available.
>> <snip>
>
>
> ================================================================
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
> Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
> 824 Timberlake Drive                     Tel: 757-467-0954
> Virginia Beach, VA 23464-3239            Fax: 757-467-2947
>
> "Vere scire est per causas scire"
> ================================================================
>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


