[Rd] rbind.data.frame reacts on levels without factor (PR#9578)
prechelt at inf.fu-berlin.de
Wed Mar 21 16:18:22 CET 2007
Full_Name: Lutz Prechelt
Version: 2.4.1
OS: Windows XP
Submission from: (NULL) (160.45.111.67)
I stack a number of data frames using rbind.
Each of these data frames has a column 'authorname', which is a factor,
and a column author = unclass(authorname), used as piecewise pseudonyms.
When I use rbind to stack these data frames, R warns about invalid factor levels
and fills the author column with NAs.
The reason appears to be that rbind.data.frame looks for the presence of levels,
not actually for class==factor when deciding what to handle as a factor:
if (!is.null(levels(xj))) {
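To illustrate why that test fires, here is a minimal sketch with made-up data (the author names are hypothetical, not from my real data set). It only shows that unclass() removes the class but leaves the levels attribute in place, so a check based on levels() still treats the column as a factor:

```r
# Hypothetical data, for illustration only.
authorname <- factor(c("smith", "jones", "smith"))
author <- unclass(authorname)   # drops the class, keeps attr(, "levels")

is_factor  <- is.factor(author)          # FALSE: the factor class is gone
has_levels <- !is.null(levels(author))   # TRUE: the levels attribute survived

# Show what is still attached to the plain integer vector.
print(attributes(author))
```

So the condition quoted above is true for 'author' even though is.factor() says it is not a factor.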
I find this behavior surprising, hence dangerous, and it is not documented.
Rather, the documentation says:
"The 'rbind' data frame method takes the classes of the columns
from the first data frame, and matches columns by name (rather
than by position). Factors have their levels expanded as
necessary [...]"
The behavior has bitten me fairly hard, because I searched for the origin of the
warning in all the wrong places before finding the real one after about 3
hours.
(Although I still have not understood _why_ it results in that warning.)
I believe the behavior of rbind.data.frame should be fixed, so that it ignores
a levels attribute on any column that does not also have class factor.
The alternative would be to just add a warning to the documentation that
'unclass' on factors is insufficient if users want to avoid factor handling for
rbind.
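In the meantime, one possible workaround, sketched here under the assumption that only the integer codes are needed as pseudonyms, is to use as.integer() instead of unclass(). as.integer() returns the codes without any attributes, so there is no levels attribute left for rbind to react to. Again the data are hypothetical:

```r
# Hypothetical data, for illustration only.
authorname <- factor(c("smith", "jones", "smith"))

# as.integer() strips all attributes, including "levels".
author_plain <- as.integer(authorname)

no_levels <- is.null(levels(author_plain))       # TRUE
no_attrs  <- is.null(attributes(author_plain))   # TRUE
```

I have not checked whether this is the approach the R developers would recommend.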