[R] Problem with factor state when subset()ing a data.frame
Roger Leigh
rleigh at whinlatter.ukfsn.org
Thu Feb 8 22:51:39 CET 2007
Hi folks,
I am running into a problem when calling subset() on a large
data.frame. One of the columns contains strings that are used as
factors. R automatically converts the column to a factor when the
data.frame is constructed, but the factor's levels are not updated when I
create a subset of the table: levels whose rows have all been removed are
still present.
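The effect described above can be seen on a bare factor vector, without data.frame or subset() involved. A minimal sketch (the names ids, kept and kept.dropped are illustrative only): a factor keeps its category labels in a separate levels attribute, and ordinary indexing leaves that attribute untouched unless drop = TRUE is passed to the factor's `[` method.

```r
# A factor stores its category labels in a separate "levels" attribute;
# subsetting the values does not recompute that attribute.
ids <- factor(c("A", "A", "B", "C"))
kept <- ids[ids != "B"]

print(as.character(kept))  # only "A", "A", "C" remain as values
print(levels(kept))        # but "B" is still listed as a level
print(table(kept))         # so "B" is tabulated with a count of 0

# The `[` method for factors has a drop argument that rebuilds the
# levels from the values that are actually present.
kept.dropped <- ids[ids != "B", drop = TRUE]
print(levels(kept.dropped))
```

This is why a plot or table built from the subset still has a slot for the removed category.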
A minimal testcase to demonstrate the problem follows:
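# stringsAsFactors = TRUE was the default when this was written;
# R >= 4.0.0 needs it spelled out to reproduce the problem.
sample <- data.frame(c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
                     c(5, 3, 5, 3, 6, 7, 8, 3, 2, 6),
                     stringsAsFactors = TRUE)
names(sample) <- c("ID", "Value")
print(sample)

# Drop every row whose ID is "B"
sample.filtered <- subset(sample, ID != "B", select = c(ID, Value))
# Or: sample.filtered <- subset(sample, ID != "B", select = c(ID, Value), drop = TRUE)
print(sample.filtered)

# Both plots still include an empty slot for the unused level "B"
plot(sample.filtered)
plot(sample.filtered$Value ~ sample.filtered$ID)

print(levels(sample.filtered$ID))          # "A" "B" "C"
print(levels(factor(sample.filtered$ID)))  # "A" "C"

# Re-factoring by hand removes the phantom level from the plot
plot(sample.filtered$Value ~ factor(sample.filtered$ID))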
Am I doing something wrong here, or is this an R bug? How can I get
the new data.frame to drop the unused ("phantom") factor levels, so that
empty categories no longer appear on the plot? (I also need to remove the
unused levels for other analyses, and re-factoring each column "by hand"
seems redundant.)
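One way to drop the unused levels for a whole data.frame at once, sketched under the assumption of a current R installation. Note that droplevels() was only added in R 2.12.0, well after this message was posted, so it is a later alternative rather than something available at the time; the second option uses only factor() and lapply(). The names cleaned, cleaned2 and is.fac are illustrative.

```r
# Same data as the testcase; stringsAsFactors = TRUE is spelled out
# because R >= 4.0.0 no longer converts strings to factors by default.
sample <- data.frame(ID = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
                     Value = c(5, 3, 5, 3, 6, 7, 8, 3, 2, 6),
                     stringsAsFactors = TRUE)
sample.filtered <- subset(sample, ID != "B", select = c(ID, Value))

# Option 1 (R >= 2.12.0): droplevels() removes unused levels from every
# factor column of a data.frame in one call.
cleaned <- droplevels(sample.filtered)

# Option 2: re-apply factor() to each factor column; factor() called on an
# existing factor keeps only the levels that actually occur.
cleaned2 <- sample.filtered
is.fac <- sapply(cleaned2, is.factor)
cleaned2[is.fac] <- lapply(cleaned2[is.fac], factor)

print(levels(cleaned$ID))
print(levels(cleaned2$ID))

# The boxplot now has groups for "A" and "C" only
plot(Value ~ ID, data = cleaned)
```

Option 2 touches only the factor columns, so numeric columns such as Value pass through unchanged.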
Kind regards,
Roger
--
.''`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/
`- GPG Public Key: 0x25BFB848 Please GPG sign your mail.
More information about the R-help mailing list