data frames with non-unique row.names

Martin Maechler <maechler@stat.math.ethz.ch>
Wed, 20 Jan 1999 17:39:34 +0100


In R and S, the general idea is that data.frames 
must have unique  row.names (aka dimnames(.)[[1]]).

Several observations / problems  (in R *and* S !).

	[Example code at the end]

1)
  Both in S and R,

	  data.frame(..)

  (and e.g., also  cbind(<data.frame>, ..)  which dispatches to data.frame())
  silently drops the whole row.names and replaces it by "1" "2" ...
  if the names would be non-unique.

 PROPOSITION 1:  I have the feeling I'd want to get a warning in that case.
		However, you may prove me wrong... 
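
 A minimal sketch of the kind of warning meant in PROPOSITION 1. The helper
 name warn_dup_row_names() is hypothetical (it is not part of R or S); it only
 inspects dimnames(.)[[1]] and leaves data.frame() itself untouched:

```r
## Hypothetical helper: warn when the row names of a matrix are not
## unique, i.e. in the case where data.frame() would not keep them as is.
warn_dup_row_names <- function(x) {
    rn <- dimnames(x)[[1]]
    dup <- unique(rn[duplicated(rn)])   # each offending name once
    if (length(dup) > 0)
        warning("non-unique row names: ", paste(dup, collapse = ", "))
    invisible(dup)
}

m <- matrix(1:12, 3, 4)
dimnames(m) <- list(c("r", "r", "r.3"), paste("V", 1:4, sep = ""))

## Capture the warning text instead of just printing it
msg <- tryCatch(warn_dup_row_names(m),
                warning = function(w) conditionMessage(w))
```

 Whether such a check belongs inside data.frame() itself, or whether a
 warning is the right severity at all, is exactly the open question.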

2)
   Now, in S (but not in R),
   the  "row.names<-"  function
   gives an error if you try to assign non-unique row.names.

   This is as desired (and R should do the same).

   (== BUG REPORT for R )
   
3) However, I can still  (both in S-plus 3.4 & 5.0r2)
   do
	attr(dat, "row.names") <-  <nonunique character>
  
   and get a resulting data.frame  dat   with non-unique row.names.

 PROPOSITION 2:  I think I want to make sure that an error message (the same
		 as in "2)"?) is generated in this case.

	(this is relatively easily accomplished via R's 
	 SetAttrib() in src/main/attrib.c)
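
 The behaviour wanted in 2) and 3) can be sketched at the R level. This is
 only an illustration under assumptions: set_row_names_strict() is a
 hypothetical name, and the change actually proposed would sit in the C code
 mentioned above, not in an R wrapper.

```r
## Hypothetical R-level stand-in for the proposed check: refuse
## duplicated row names before the attribute is set.
set_row_names_strict <- function(x, value) {
    value <- as.character(value)
    if (any(duplicated(value)))
        stop("duplicate row.names are not allowed")
    attr(x, "row.names") <- value
    x
}

d <- data.frame(V1 = 1:3, V2 = 4:6)
d <- set_row_names_strict(d, c("r", "s", "r.3"))   # unique: accepted

## Duplicates are rejected; capture the error message
res <- tryCatch(set_row_names_strict(d, c("r", "r", "r.3")),
                error = function(e) conditionMessage(e))
```

 A wrapper like this does nothing about a direct  attr(dat, "row.names") <- ..
 call, which is why the check would have to go into the attribute-setting
 code itself.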

------------------------------------------------------------------
>>>> or am I completely wrong, and there should be a way you can
>>>> construct a data.frame with non-unique row.names ???
------------------------------------------------------------------


Here are the S/R examples: 

## 1)
    dat <- d0 <- matrix(1:12, 3,4)
    dimnames(dat) <- list(c("r","r","r.3"), paste("V", 1:4, sep=""))
    dat
    data.frame(dat)# silently drops the row.names --- S == R

## 2)
    ### Now duplicated row.names:
    dat2 <- data.frame(d0); dimnames(dat2)[2] <- list(paste("V", 1:4, sep=""))
    (d2 <- dat2)

    ## Here, S gives the proper error message "... duplicate names" :
    ## R 0.63.2 simply accepts it;
    row.names(dat2) <-c("s","s","s.3") ; dat2

## 3)

    ## can we trick it (in S)?
    (dat2 <- d2)
    attr(dat2, "row.names") <-c("r","r","r.3") ; dat2
    row.names(dat2)[duplicated(row.names(dat2))]
    ## yes, S-plus 3.4 / 5.0r2 are tricked!!

Comments / suggestions / opinions are very welcome!
(if you reply to me rather than to the 2 mailing lists, I'll summarize to them)
---

Martin Maechler <maechler@stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum SOL G1;	Sonneggstr.33
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1086			<><