data frames with non-unique row.names (PR#98)

Martin Maechler <maechler@stat.math.ethz.ch>
Wed, 20 Jan 1999 18:06:33 +0100


(following up on myself:)

>>>>> "MM" == Martin Maechler <maechler@stat.math.ethz.ch> writes:

    MM> In R and S, the general idea is that data.frames 
    MM> must have unique  row.names (aka dimnames(.)[[1]]).

    MM> Several observations / problems  (in R *and* S !).

	....

    MM> 2)
    MM> Now, in S (but not in R),
    MM> the  "row.names<-"  function
    MM> gives an error if you try to assign non-unique row.names.

    MM> This is as desired (and R should do the same).

    MM> (== BUG REPORT for R )

This is easily fixed in  data.frame.R,  provided one doesn't want to do the
fix for "3)" as well.
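
A minimal sketch of the kind of check meant in "2)". The helper name
set_unique_row_names() is hypothetical and the error messages are only
illustrative; this is not the actual code in data.frame.R.

```r
# Hypothetical helper (NOT the code in data.frame.R):
# set row names on a data frame, refusing duplicated values.
set_unique_row_names <- function(x, value) {
    value <- as.character(value)
    if (length(value) != nrow(x))
        stop("invalid 'row.names' length")
    if (any(duplicated(value)))
        stop("duplicate 'row.names' are not allowed")
    attr(x, "row.names") <- value
    x
}

dat <- data.frame(x = 1:3)

# Unique names are accepted.
ok <- set_unique_row_names(dat, c("a", "b", "c"))

# Duplicated names raise an error; here we only capture its message.
bad <- tryCatch(set_unique_row_names(dat, c("a", "a", "b")),
                error = function(e) conditionMessage(e))
```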
   
    MM> 3) However, I can still  (both in S-plus 3.4 & 5.0r2)
    MM> do
    MM> attr(dat, "row.names") <-  <nonunique character>
  
    MM> and get a resulting data.frame  dat   with non-unique row.names.

    MM> PROPOSITION 2:  I think I want to make sure that a (the same?)
    MM>    error message as in "2)" is generated in this case.

    MM> (this is relatively easily accomplished via R's 
    MM> SetAttrib() in src/main/attrib.c)

    MM> ------------------------------------------------------------------
    >>>>> or am I completely wrong, and there should be a way you can
    >>>>> construct a data.frame with non-unique row.names ???
    MM> ------------------------------------------------------------------

It seems I am wrong -- at least in the eyes of Mathsoft's  S-plus 5
writers:

	data.frame() in S-plus 5.0r2 has a new argument

		dup.names.ok = F

which you can set to TRUE in order to construct data.frames with non-unique
row.names.

Note however that John Chambers' S version 4 (which underlies S-plus 5)
does NOT have such an argument.

---
Currently,
I tend to conclude that we should follow S-plus 5 here
and allow non-unique row.names, but only via the low-level
	attr(., "row.names") <- ...
and not via  row.names(.) <- ...
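
To illustrate the split between the two routes, here is a small sketch. It
assumes a reasonably recent R, where (as far as I know) "row.names<-" refuses
duplicates while the low-level attr() assignment does not check them; it does
not describe the R behaviour at the time of this message. The names dat,
dat2, high and has_dups are just for illustration.

```r
dat <- data.frame(x = 1:3)

# High-level route: row.names(.) <- ... with duplicated values.
# The result records whether the assignment was accepted or refused.
high <- tryCatch(
    suppressWarnings({ row.names(dat) <- c("a", "a", "b"); "accepted" }),
    error = function(e) "refused")

# Low-level route: set the attribute directly; no uniqueness check is made.
dat2 <- dat
attr(dat2, "row.names") <- c("a", "a", "b")

# TRUE if the stored row names contain duplicates.
has_dups <- any(duplicated(attr(dat2, "row.names")))
```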

Further: a 'dup.names.ok = FALSE' argument seems a good idea for data.frame()
	as well: setting it to TRUE would presumably skip the uniqueness
	check and so speed up the construction of huge data.frames
	(e.g. when using  read.table(.) on large files !!).
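
A rough sketch of how such a switch could work. The constructor make_df() is
hypothetical; only the argument name dup.names.ok is borrowed from S-plus 5,
and R's own data.frame() is not assumed to have it. The columns are passed as
a named list of equal-length vectors.

```r
# Hypothetical constructor: builds a data frame from a named list of columns.
# 'dup.names.ok' mimics the S-plus 5 argument name; when TRUE, the
# uniqueness check on the row names is skipped entirely.
make_df <- function(cols, row.names, dup.names.ok = FALSE) {
    row.names <- as.character(row.names)
    lens <- vapply(cols, length, integer(1))
    if (any(lens != length(row.names)))
        stop("columns and row.names differ in length")
    if (!dup.names.ok && any(duplicated(row.names)))
        stop("duplicate row.names")
    attr(cols, "row.names") <- row.names
    class(cols) <- "data.frame"
    cols
}

# With the check switched off, duplicated row names are stored as given.
d <- make_df(list(x = 1:3), c("a", "a", "b"), dup.names.ok = TRUE)

# With the default, the same input is rejected.
res <- tryCatch(make_df(list(x = 1:3), c("a", "a", "b")),
                error = function(e) "refused")
```

The possible saving comes from not calling duplicated() on a long character
vector; whether that matters in practice would have to be measured.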

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._