[R] Convert factor to numeric vector of labels
Marc Schwartz
marc_schwartz at comcast.net
Tue Aug 14 22:50:26 CEST 2007
I think that you grossly underestimate the frequency of use of factors
in R, not to mention that factors are stored more efficiently than
character vectors.
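To show what is actually stored, here is a minimal sketch using a made-up vector `x`. A factor keeps one integer code per element, plus a single copy of each distinct label in its `levels` attribute.

```r
# Hypothetical example data: a character vector with repeated values
x <- c("low", "high", "low", "low", "high")
f <- factor(x)

# Underneath, a factor is an integer vector of codes...
codes <- unclass(f)
print(as.vector(codes))   # 2 1 2 2 1
# ...plus each distinct label stored once, in sorted order
print(levels(f))          # "high" "low"
```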
All modeling functions depend upon them. Most testing, grouping and
plotting functions (base R and Lattice) either use them directly as
arguments or coerce character vectors to factors internally.
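Here is a small illustration of that internal coercion, using made-up data. `tapply()` turns a plain character grouping vector into a factor before grouping, so the result is ordered by the sorted factor levels rather than by first appearance.

```r
values <- c(10, 20, 30, 40)
groups <- c("b", "a", "b", "a")   # plain character vector, not a factor

# tapply() applies as.factor() to 'groups' internally
totals <- tapply(values, groups, sum)
print(totals)   # sums per group: a = 60, b = 40
```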
So, no...I would not advocate modifying such fundamental behavior.
UseRs should read the documentation before "jumping in with both feet"
so that they understand the underlying design philosophy behind R and
the actual documented functional behaviors. That is better than
proceeding with expectations predicated on false assumptions and,
importantly, it will save you time.
In Falk's case, without having seen any actual data, it seems
reasonable to presume that the supposedly numeric column that was
converted to a factor had non-numeric characters in it.
Thus, the fact that a numeric column was coerced to a factor on import
should have raised a red flag pointing to a data quality problem.
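To follow up on such a red flag, one way to locate the offending entries is sketched below. It assumes a factor `f` that came from a column meant to be numeric, and the values are hypothetical. The idea is to try converting each level and keep the ones that come back `NA`.

```r
# Hypothetical imported column: numeric-looking values plus one bad entry
f <- factor(c("0.71", "1.34", "n/a", "2.61", "1.34"))

lev <- levels(f)
# as.numeric() yields NA (with a warning) for labels that are not numbers
num <- suppressWarnings(as.numeric(lev))
bad_levels <- lev[is.na(num)]
print(bad_levels)                 # "n/a"

# Positions of the rows that hold a bad value
print(which(f %in% bad_levels))   # 3
```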
Had the default behavior been otherwise, it is likely that Falk would
have proceeded with subsequent analyses without being aware of this
issue, perhaps resulting in a bad outcome.
The function that handles this in the read.table() family of functions
is called type.convert(). An example may be helpful:
> Vec <- as.character(1:10)
> Vec
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
# Default behavior converts the character vector
# to numeric
> str(type.convert(Vec))
int [1:10] 1 2 3 4 5 6 7 8 9 10
# Now add in a non-numeric character (i.e. bad data)
> Vec1 <- c(Vec, "a")
> Vec1
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "a"
> str(type.convert(Vec1))
Factor w/ 11 levels "1","10","2","3",..: 1 3 4 5 6 7 8 9 10 2 ...
Voilà
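The sketch below repeats the same experiment with the `as.is` argument of `type.convert()` spelled out, rather than relying on its default. With `as.is = FALSE`, the mixed vector becomes a factor as in the output above. With `as.is = TRUE`, it stays character, so the bad value remains visible as text.

```r
Vec  <- as.character(1:10)
Vec1 <- c(Vec, "a")   # one non-numeric value ("bad data")

# All-numeric input is converted to integer either way
str(type.convert(Vec, as.is = TRUE))

# Mixed input: as.is = FALSE gives a factor, as.is = TRUE keeps character
str(type.convert(Vec1, as.is = FALSE))
str(type.convert(Vec1, as.is = TRUE))
```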
HTH,
Marc Schwartz
On Tue, 2007-08-14 at 13:47 -0600, Matthew Keller wrote:
> Hi all,
>
> If we, the R community, are endeavoring to make R user friendly
> (gasp!), I think that one of the first places to start would be in
> setting stringsAsFactors = FALSE. Several times I've run into
> instances of folks decrying R's "ridiculous usage of memory" in
> reading data, only to come to find out that these folks were
> unknowingly importing certain columns as factors. The fix is easy once
> you know it, but it isn't obvious to new users, and I'd bet that it
> turns some % of people off of the program. Factors are not used often
> enough to justify this default behavior in my opinion. When factors
> are used, the user knows to treat the variable as a factor, and so it
> can be done on a case-by-case (or should I say variable-by-variable?)
> basis.
>
> Is this a default that should be changed?
>
> Matt
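Matt's variable-by-variable approach can be sketched as follows, using made-up inline data. Here `read.csv()` reads from its `text` argument instead of a file. Everything is read with `stringsAsFactors = FALSE`, and only the column that should be a factor is converted.

```r
csv <- "id,group,score\n1,ctl,0.71\n2,trt,1.34\n3,ctl,2.61"
df <- read.csv(text = csv, stringsAsFactors = FALSE)
str(df)   # 'group' is character, 'score' is numeric

# Opt in to a factor only where one is wanted
df$group <- factor(df$group)
str(df)
```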
>
>
> On 8/13/07, John Kane <jrkrideau at yahoo.ca> wrote:
> > This is one of R's rather _endearing_ little
> > idiosyncrasies. I ran into it a while ago.
> > http://finzi.psych.upenn.edu/R/Rhelp02a/archive/98090.html
> >
> >
> > For some reason, possibly historical, the option
> > "stringsAsFactors" defaults to TRUE.
> >
> > As Prof Ripley says, FAQ 7.10 will tell you:
> > as.numeric(as.character(f)) # for a one-off conversion
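The sketch below applies this to values like those in Falk's question, using a made-up vector. It shows why `as.numeric()` alone fails and gives two working conversions. R FAQ 7.10 also mentions `as.numeric(levels(f))[f]`, which converts each distinct label only once and then indexes by the integer codes.

```r
f <- factor(c(0.71, 1.34, 2.61, 1.34))

as.numeric(f)                # integer codes 1 2 3 2, not the labels
as.numeric(as.character(f))  # labels: 0.71 1.34 2.61 1.34
as.numeric(levels(f))[f]     # same labels; each level converted only once
```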
> >
> > From Gabor Grothendieck, a one-off solution for a
> > complete data.frame:
> >
> > DF <- data.frame(let = letters[1:3], num = 1:3,
> > stringsAsFactors = FALSE)
> >
> > str(DF) # to see what has happened.
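If a data frame already contains factor columns, the sketch below converts all of them back in one pass. `DF` is built with `stringsAsFactors = TRUE` only so that it has a factor column to convert. For factors whose labels are numbers, the `as.character(x)` call would be wrapped in `as.numeric()`.

```r
DF <- data.frame(let = letters[1:3], num = 1:3, stringsAsFactors = TRUE)

# Replace every factor column by its labels; other columns are left alone.
# Assigning to DF[] keeps the data.frame structure while swapping columns.
DF[] <- lapply(DF, function(x) if (is.factor(x)) as.character(x) else x)
str(DF)
```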
> >
> > You can reset the option globally (see below). However,
> > you might want to read Gabor Grothendieck's comment
> > about this in the thread referenced above, since it
> > could cause problems if you transfer files a lot.
> >
> > Personally I went with the global option since I don't
> > tend to transfer programs to other people and I was
> > getting tired of tracking down errors in my programs
> > caused by numeric and character variables suddenly
> > deciding to become factors.
> >
> > From Steven Tucker:
> >
> > You can also set this option globally with
> > options(stringsAsFactors = FALSE) # in
> > \library\base\R\Rprofile
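As a less sweeping alternative to editing the Rprofile, the sketch below sets the option for a single piece of code and restores the previous setting afterwards, so other code is not affected. `FALSE` is the value that stops the automatic conversion. In R 4.0.0 and later, `FALSE` is already the default for `data.frame()` and `read.table()`.

```r
old <- options(stringsAsFactors = FALSE)   # remember the previous setting
d <- data.frame(x = c("a", "b"))
str(d)                                     # x stays character
options(old)                               # restore the previous setting
```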
> >
> > --- Falk Lieder <falk.lieder at googlemail.com> wrote:
> >
> > > Hi,
> > >
> > > I have imported a data file to R. Unfortunately R has interpreted
> > > some numeric variables as factors. Therefore I want to reconvert
> > > these to numeric vectors whose values are the factor levels' labels.
> > > I tried as.numeric(<factor>), but it returns a vector of factor
> > > levels (i.e. 1,2,3,...) instead of labels (i.e. 0.71, 1.34, 2.61,…).
> > > What can I do instead?
> > >
> > > Best wishes, Falk
> >