[R] Convert factor to numeric vector of labels
Bert Gunter
gunter.berton at gene.com
Tue Aug 14 22:26:18 CEST 2007
Matt:
I believe you have confused issues.
Setting stringsAsFactors = FALSE would dramatically **increase** the amount
of memory used for storing character vectors, which is what factors are for.
So your proposed solution does exactly the opposite of what you want.
The issue you are worried about is when numeric fields are somehow
interpreted as non-numeric. This can happen for a variety of reasons (stray
characters in numeric fields, quotes around numbers, ...). The solution is not
to set a global default that does the opposite of what you want in its
intended use, but to read the documentation and either set the appropriate
arguments (perhaps colClasses of read.table) or fix the original data before
R reads it (e.g. remove quotes and stray characters). Failing that, the
"one-off" solutions given are the correct way to handle what is a data
problem, not an R problem.
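
As a rough illustration of the two routes described above (the one-off conversion from FAQ 7.10 and fixing the import itself), here is a minimal sketch. The data in txt, the column names id and dose, and the "?" missing-value marker are all made up for the example; stringsAsFactors = TRUE is passed explicitly so that the import behaves as it does by default in this thread.

```r
# Hypothetical data: a "?" stands in for a missing dose.
txt <- "id dose
s1 0.71
s2 1.34
s3 ?
s4 2.61"

# Import with strings converted to factors: the stray "?" keeps the
# column from being read as numeric, so it becomes a factor.
raw <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
str(raw)

# as.numeric() on a factor returns the internal integer codes, not the labels.
codes <- as.numeric(raw$dose)

# One-off fix (FAQ 7.10): convert via character. "?" cannot be parsed and
# becomes NA; the coercion warning is silenced here.
one_off <- suppressWarnings(as.numeric(as.character(raw$dose)))

# Fix at import: declare the column types and the missing-value marker.
fixed <- read.table(text = txt, header = TRUE,
                    colClasses = c("character", "numeric"),
                    na.strings = c("NA", "?"))
str(fixed)
```

Both one_off and fixed$dose should then hold the numeric values with NA in the third position. Which markers to list in na.strings depends entirely on what the real file contains.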
However, I should add that there are arguments for making stringsAsFactors =
FALSE the default; search the archives for discussions of why. The memory
penalty will have to be paid, of course.
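
Anyone who wants to check the memory trade-off on their own installation could try something like the following sketch. The vector x_chr and its three labels are invented for the example, and the size of the gap depends on the R version and platform, so no figures are quoted here.

```r
set.seed(1)

# Hypothetical column: 100,000 values drawn from three distinct labels.
x_chr <- sample(c("control", "low dose", "high dose"), 1e5, replace = TRUE)

# The same information stored as a factor: integer codes plus a levels attribute.
x_fac <- factor(x_chr)

# Compare the estimated storage of the two representations.
object.size(x_chr)
object.size(x_fac)

# The factor loses nothing: converting back recovers the original strings.
identical(as.character(x_fac), x_chr)
```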
Bert Gunter
Genentech Nonclinical Statistics
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Matthew Keller
Sent: Tuesday, August 14, 2007 12:48 PM
To: John Kane
Cc: Falk Lieder; r-help at stat.math.ethz.ch
Subject: Re: [R] Convert factor to numeric vector of labels
Hi all,
If we, the R community, are endeavoring to make R user friendly
(gasp!), I think that one of the first places to start would be in
setting stringsAsFactors = FALSE. Several times I've run into
instances of folks decrying R's "ridiculous usage of memory" in
reading data, only to come to find out that these folks were
unknowingly importing certain columns as factors. The fix is easy once
you know it, but it isn't obvious to new users, and I'd bet that it
turns some % of people off of the program. Factors are not used often
enough to justify this default behavior in my opinion. When factors
are used, the user knows to treat the variable as a factor, and so it
can be done on a case-by-case (or should I say variable-by-variable?)
basis.
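
A minimal sketch of that variable-by-variable approach, assuming a made-up data frame DF with hypothetical columns id and group: strings are kept as character on creation, and only the column that is genuinely categorical is converted afterwards.

```r
# Hypothetical data frame created without automatic factor conversion.
DF <- data.frame(id = c("s1", "s2", "s3", "s4"),
                 group = c("ctl", "trt", "trt", "ctl"),
                 stringsAsFactors = FALSE)

# Convert only the column that should be treated as categorical.
DF$group <- factor(DF$group)

str(DF)  # id stays character, group is now a factor
```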
Is this a default that should be changed?
Matt
On 8/13/07, John Kane <jrkrideau at yahoo.ca> wrote:
> This is one of R's rather _endearing_ little
> idiosyncrasies. I ran into it a while ago.
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/98090.html
>
>
> For some reason, possibly historical, the option
> "stringsAsFactors" is set to TRUE.
>
> As Prof Ripley says, FAQ 7.10 will tell you:
> as.numeric(as.character(f)) # for a one-off conversion
>
> From Gabor Grothendieck, a one-off solution for a
> complete data.frame:
>
> DF <- data.frame(let = letters[1:3], num = 1:3,
> stringsAsFactors = FALSE)
>
> str(DF) # to see what has happened.
>
> You can reset the option globally, see below. However
> you might want to read Gabor Grothendieck's comment
> about this in the thread referenced above since it
> could cause problems if you transfer files a lot.
>
> Personally I went with the global option since I don't
> tend to transfer programs to other people and I was
> getting tired of tracking down errors in my programs
> caused by numeric and character variables suddenly
> deciding to become factors.
>
> From Steven Tucker:
>
> You can also set this option globally with
> options(stringsAsFactors = FALSE) # in
> \library\base\R\Rprofile
>
> --- Falk Lieder <falk.lieder at googlemail.com> wrote:
>
> > Hi,
> >
> > I have imported a data file to R. Unfortunately R
> > has interpreted some
> > numeric variables as factors. Therefore I want to
> > reconvert these to numeric
> > vectors whose values are the factor levels' labels.
> > I tried
> > as.numeric(<factor>),
> > but it returns a vector of factor levels (i.e.
> > 1,2,3,...) instead of labels
> > (i.e. 0.71, 1.34, 2.61, ...).
> > What can I do instead?
> >
> > Best wishes, Falk
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Matthew C Keller
Postdoctoral Fellow
Virginia Institute for Psychiatric and Behavioral Genetics