[R] Surprise when indexing with a factor.
Gabor Grothendieck
ggrothendieck at myway.com
Sat May 8 16:12:21 CEST 2004
Note that if f is a factor with Date labels, e.g.
f <- factor(c("2000-02-02","2000-02-03"))
then as.Date has a factor method, so that (as of R 1.9.1):
as.Date(f)
*is* the same as as.Date(as.character(f)). (Presumably this makes
it easier to use as.Date with read.table.)
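
A minimal sketch of that check, assuming a reasonably recent R; the
names d1 and d2 are only illustrative and not part of any API:

```r
f <- factor(c("2000-02-02", "2000-02-03"))

# The factor method of as.Date converts via the labels, not the codes.
d1 <- as.Date(f)
d2 <- as.Date(as.character(f))

# Both routes should give the same Date vector.
print(identical(d1, d2))

# For comparison, the underlying integer codes of the factor.
print(as.integer(f))
```

So as.Date is one of the cases where a factor is treated by its labels,
unlike subscripting, which uses the codes.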
pallier <pallier <at> lscp.ehess.fr> writes:
:
: >It may be educational to read ?factor before you use a factor for some
: >operation (such as subscripting), I guess. In part, it says:
: >
: >Value:
: >
: > 'factor' returns an object of class '"factor"' which has a set of
: > numeric codes the length of 'x' with a '"levels"' attribute of
: > mode 'character'. If 'ordered' is true (or 'ordered' is used) the
: > result has class 'c("ordered", "factor")'.
: >
: >In other words, a factor is a numeric vector with a "levels" attribute.
: >What do you expect to happen when you use a numeric vector as subscript?
: >
: >
:
: Hello,
:
: Ok, I understand the point: I should have read the documentation better...
:
: The 'warning' section of the help on 'factor' is even more enlightening:
:
: > The interpretation of a factor depends on both the codes and the
: > `"levels"' attribute. Be careful only to compare factors with the
: > same set of levels (in the same order). In particular,
: > `as.numeric' applied to a factor is meaningless, and may happen by
: > implicit coercion.
:
: Let me argue that when a factor is printed, you don't see the numeric
: codes, you just see the labels. From an ergonomic point of view, in many
: situations where labels are used, the numeric representation of an
: unordered factor is just an irrelevant 'internal' coding. (E.g. when
: factors are parsed automatically by read.table).
:
: [Named vectors and labels in factors are part of the reasons why I like
: R better than, say, Matlab: you don't have to remember tons of numeric
: codes.]
:
: Given a named vector 'm' and a factor 'f' whose levels match (e.g. when
: 'm' is the result of a 'tapply' command using the factor f as INDEX), my
: intuition is that m[f] means m[as.character(f)]
:
: Other people with a more precise knowledge of R probably find it
: natural that a factor is numeric in *essence* (despite its *appearance*
: when printed).
:
: I am not proposing to change R to adapt it to my intuition.
: I just believed that the trap was dangerous enough to (1) dare display
: my ignorance and (2) suggest that a warning in the 'Introduction to R'
: would not be a bad idea (maybe it is but I have not read carefully enough...)
:
: Cheers,
:
: Christophe Pallier
:
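
The trap discussed in the quoted message can be reproduced with a small
sketch. The data here are hypothetical: m is a named vector and f is a
factor whose levels are only a subset of names(m), so the codes and the
labels point at different elements. (When names(m) equals levels(f) in
the same order, as with a tapply result on f itself, the two forms
happen to agree.) The last lines illustrate the as.numeric warning
quoted from ?factor, again with made-up values.

```r
m <- c(a = 10, b = 20, c = 30)
f <- factor(c("c", "a"))        # levels are "a", "c"; codes are 2, 1

# Subscripting with a factor uses the integer codes.
by_code <- m[f]                 # same as m[c(2, 1)]

# Subscripting with the labels needs an explicit conversion.
by_label <- m[as.character(f)]  # same as m[c("c", "a")]

print(by_code)
print(by_label)

# as.numeric on a factor returns the codes, not the printed values.
g <- factor(c("10", "5"))
print(as.numeric(g))                # codes
print(as.numeric(as.character(g)))  # the values that are printed
```

If the labels are what is meant, writing m[as.character(f)] explicitly
avoids depending on how the levels happen to be ordered.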