[Rd] Non-unique column names in data frames
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Apr 3 09:21:31 CEST 2007
On Sun, 1 Apr 2007, John Fox wrote:
> Dear r-devel members,
>
> It's just been brought to my attention that R permits non-unique column
> names in data frames -- e.g., via assignment to names() or colnames(). This
> behaviour is consistent with the help files (as I discovered), but it's not
> consistent with the behaviour of rownames() and row.names(). For example,
?? Matrices and data frames are different, but rownames() and row.names()
do the same thing on each class.
>
> row.names(airquality) <- rep("a", nrow(airquality))
>
> generates an error, but
as does rownames().
>
> names(airquality) <- rep("a", ncol(airquality))
>
> or even
>
> names(airquality) <- rep("", ncol(airquality))
>
> do not.
>
> I figure that there must be some rationale for this difference, but I can't
> think of what it might be. Any thoughts?
It's part of the definition of a data frame, from long ago (White Book
p.60). Think of the row names as a 'primary key' in the sense of a
DBMS/SQL.
Why the column names are not also required to be non-empty and unique
is a question for the designer (John Chambers has not yet replied),
but it is clearly deliberate, since data.frame(check.names=FALSE) is allowed.
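A minimal R sketch of the asymmetry discussed above, using airquality as in the original question. The object names are illustrative. The exact warning and error text may differ between R versions.

```r
## Row names of a data frame act as a key: duplicate values are rejected.
df <- head(airquality, 3)
res <- try(row.names(df) <- rep("a", nrow(df)), silent = TRUE)
inherits(res, "try-error")        # TRUE (df is left unchanged)

## Column names are not checked: duplicates are accepted silently.
names(df) <- rep("a", ncol(df))
anyDuplicated(names(df)) > 0      # TRUE

## data.frame() makes names unique by default (check.names = TRUE) ...
d1 <- data.frame(a = 1, a = 2)
names(d1)                         # "a" "a.1"

## ... but check.names = FALSE deliberately keeps the duplicates.
d2 <- data.frame(a = 1, a = 2, check.names = FALSE)
names(d2)                         # "a" "a"
```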
One possible issue is that there are many ways to set the names of a data
frame (e.g. DF$name <- value can add a column), and checking them all could
be tedious. On the other hand, setting row names is centralized (it is done
inside attr<-()).
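The sketch below illustrates why a column-name check would be hard to centralize. Several independent routes add or rename columns, and with duplicated names, access by name quietly returns the first match. `check_unique_names()` is a hypothetical helper written for this illustration and is not part of base R.

```r
df <- data.frame(a = 1:2)

## Several independent routes add or rename columns:
df$b <- 3:4                    # $<- adds a column named "b"
df[["c"]] <- 5:6               # [[<- adds a column named "c"
df <- cbind(df, a = 7:8)       # cbind() does not check names: a second "a"
names(df)                      # "a" "b" "c" "a"

## With a duplicated name, access by name returns the first match.
df$a                           # 1 2

## Hypothetical helper (not base R): enforcing uniqueness would mean
## calling something like this after every one of the routes above.
check_unique_names <- function(x) {
  dup <- unique(names(x)[duplicated(names(x))])
  if (length(dup) > 0L)
    stop("duplicate column names: ", paste(dup, collapse = ", "))
  invisible(x)
}
ok <- !inherits(try(check_unique_names(df), silent = TRUE), "try-error")
ok                             # FALSE
```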
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595