data frames with non-unique row.names
Martin Maechler <maechler@stat.math.ethz.ch>
Wed, 20 Jan 1999 17:39:34 +0100
In R and S, the general idea is that data.frames
must have unique row.names (aka dimnames(.)[[1]]).
Several observations / problems (in R *and* S !).
[Example code at the end]
1)
Both in S and R,
data.frame(..)
(and e.g., also cbind(<data.frame>, ..) which dispatches to data.frame())
silently drops all the row.names and replaces them by "1" "2" ...
if the names would be non-unique.
PROPOSITION 1: I have the feeling I'd want to get a warning in that case.
However, you may prove me wrong...
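A user-level version of the warning proposed above might look like the
following sketch. The helper name check_row_names() is made up for
illustration and is not part of R or S; it only inspects the candidate
row names before the object is handed to data.frame().

```r
## Hypothetical helper (not part of R or S): warn about non-unique
## candidate row names before calling data.frame().
check_row_names <- function(x) {
    rn <- dimnames(x)[[1]]
    dup <- unique(rn[duplicated(rn)])   # each offending name once
    if (length(dup) > 0)
        warning("non-unique row names: ", paste(dup, collapse = ", "))
    invisible(dup)
}

m <- matrix(1:12, 3, 4)
dimnames(m) <- list(c("r", "r", "r.3"), paste("V", 1:4, sep = ""))
dup <- check_row_names(m)   # issues the warning; dup holds the duplicated names
```

Returning the duplicated names invisibly lets the caller decide whether to
rename, stop, or carry on.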
2)
Now, in S (but not in R),
the "row.names<-" function
gives an error if you try to assign non-unique row.names.
This is as desired (and R should do the same).
(== BUG REPORT for R )
3) However, I can still (both in S-plus 3.4 & 5.0r2)
do
attr(dat, "row.names") <- <nonunique character>
and get a resulting data.frame dat with non-unique row.names.
PROPOSITION 2: I think I want to make sure that an error message
(the same one as in "2)"?) is generated in this case.
(this is relatively easily accomplished via R's
SetAttrib() in src/main/attrib.c)
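As long as no such check exists at the attribute level, a user-level guard
could look roughly like this. set_row_names_checked() is a hypothetical
name, and the error text is only modelled on the "duplicate names" message
mentioned in 2). It is a sketch of the idea, not of what SetAttrib() would
do internally.

```r
## Hypothetical wrapper (not part of R or S): refuse non-unique row names
## before they reach the "row.names" attribute.
set_row_names_checked <- function(x, value) {
    value <- as.character(value)
    if (any(duplicated(value)))
        stop("duplicate row.names are not allowed: ",
             paste(unique(value[duplicated(value)]), collapse = ", "))
    attr(x, "row.names") <- value
    x
}

d <- data.frame(a = 1:3, b = 4:6)
d <- set_row_names_checked(d, c("x", "y", "z"))     # unique names: accepted
res <- try(set_row_names_checked(d, c("r", "r", "r.3")), silent = TRUE)
inherits(res, "try-error")                          # duplicates are rejected
```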
------------------------------------------------------------------
>>>> or am I completely wrong, and there should be a way you can
>>>> construct a data.frame with non-unique row.names ???
------------------------------------------------------------------
Here are the S/R examples:
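## 1)
## paste.i() is not defined in this post; the definition below is an
## assumption (presumably it gives "V1" ... "V4"):
paste.i <- function(s, n) paste(s, 1:n, sep = "")
dat <- d0 <- matrix(1:12, 3, 4)
dimnames(dat) <- list(c("r","r","r.3"), paste.i("V",4))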
dat
data.frame(dat)# silently drops the row.names --- S == R
## 2)
### Now duplicated row.names:
dat2 <- data.frame(d0); dimnames(dat2)[2] <- list(paste.i("V",4))
(d2 <- dat2)
## Here, S gives the proper error message "... duplicate names" :
## R 0.63.2 simply accepts it;
row.names(dat2) <-c("s","s","s.3") ; dat2
## 3)
## can we trick it (in S)?
(dat2 <- d2)
attr(dat2, "row.names") <-c("r","r","r.3") ; dat2
row.names(dat2)[duplicated(row.names(dat2))]
## yes, S-plus 3.4 / 5.0r2 are tricked!!
Comments / suggestions / opinions are very welcome!
(If they don't go to the 2 mailing lists, I'll summarize them there.)
---
Martin Maechler <maechler@stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum SOL G1; Sonneggstr.33
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1086 <><