[Rd] RE: [R] Removing "row.names"

Kurt Hornik Kurt.Hornik@ci.tuwien.ac.at
Thu, 8 Feb 2001 14:41:03 +0100


>>>>> David James writes:

>> Date: Wed, 7 Feb 2001 09:33:12 -0800 (PST)
>> From: Thomas Lumley <tlumley@u.washington.edu>
>> To: Kurt Hornik <Kurt.Hornik@ci.tuwien.ac.at>
>> cc: Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk>, R-devel@r-project.org
>> Subject: Re: [Rd] RE: [R] Removing "row.names"
>> 
>> On Wed, 7 Feb 2001, Kurt Hornik wrote:
>> 
>> > >>>>> Thomas Lumley writes:
>> > 
>> > > On Wed, 7 Feb 2001, Kurt Hornik wrote:
>> > >> >>>>> Peter Dalgaard BSA writes:
>> > >> 
>> > >> > Kurt Hornik <Kurt.Hornik@ci.tuwien.ac.at> writes:
>> > >> >> names(sampled) <- " "
>> > >> >> and
>> > >> >> dimnames(sampled)[[2]] <- " "
>> > >> >> 
>> > >> >> happily introduce non-unique variable names in the data frame.
>> > >> >> 
>> > >> >> Is the rule that row.names and names must be unique still on?
>> > >> >> 
>> > >> >> Argh ...
>> > >> 
>> > >> > Splus 3.4 dispatches on dimnames<-, but not on names<-, with the
>> > >> > following curious result:
>> > >> 
>> > >> >> d <- data.frame(a=1:3,b=4:6)
>> > >> >> names(d)<-c(" "," ")
>> > >> >> d
>> > >> 
>> > >> > 1 1 4
>> > >> > 2 2 5
>> > >> > 3 3 6
>> > >> >> dimnames(d)[[1]] <- rep(" ",3)  
>> > >> > Error in "dimnames<-.data.frame"(d, .A0): column names must be unique
>> > >> > Dumped
>> > >> 
>> > >> > R dispatches similarly, but doesn't check the dimnames in
>> > >> > dimnames<-.data.frame. It could do so quite easily. Just add 
>> > >> 
>> > >> > || any(duplicated(d[[1]])) || any(duplicated(d[[2]]))
>> > >> 
>> > >> > at the appropriate spot.
>> > >> 
>> > >> Thomas' view about what should be permitted seems to be different.
>> > 
>> > > I wouldn't object to making it hard to create duplicated names(), but
>> > > I think it would be a bad idea to have data.frame() make up unique
>> > > names if it's given non-unique ones.
>> > 
>> > Maybe `check.names' could also be used for uniqueness testing?
>> > 
>> > In any case, I think we should specify what *exactly* a data frame is.
>> >
>> 
>> I think we should specify, and check.names is a logical way to
>> allow/forbid non-unique columns.  
>> 
>> Having a new class would be messy: logically it shouldn't inherit from
>> data.frame, data.frame should inherit from it, but that would be a real
>> pain to set up.
>> 

> Data frames were originally meant to be used in modeling functions.
> The opening paragraph in Chapter 3 (Data for Models) in the White Book
> says:
 
>   "This chapter describes the general structure for data that
>   will be used throughout the book.  In particular, it introduces the
>   data frame, a class of objects to represent the data typically encountered
>   in fitting models."

> However, data.frames may not be quite appropriate for representing
> other types of tabular data (certainly a data.frame does not capture
> the essence of, say, a "relational" table in the SQL sense, which
> doesn't have the concept of row names).  Several manifestations of
> this problem are coercing character data to factors "at the drop of a
> hat" (as someone wrote here or in s-news), the row.names issue now
> being discussed, problems with including general objects in the "cells" of
> the data.frame, etc.

> I think that the concept of a data.frame to represent data for fitting
> models is fine, but we (certainly I) may have abused this concept.  We
> need other classes of tabular data objects in addition to (not as a
> replacement for) data.frames, together with coercion methods and
> perhaps other utilities.
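A minimal sketch, in R, of the uniqueness check Peter proposes adding to `dimnames<-.data.frame`. It assumes the check is applied to the list of (row names, column names) being assigned. `check_dimnames_unique()` and `set_dimnames_checked()` are hypothetical helpers for illustration, not R's actual `dimnames<-.data.frame` method, and the error message is invented.

```r
## Hypothetical helpers for illustration only; not part of base R and not
## R's actual "dimnames<-.data.frame" method.

## Stop if either component of a dimnames list (row names, column names)
## contains duplicates, following the check proposed in the thread.
check_dimnames_unique <- function(value) {
  if (any(duplicated(value[[1]])) || any(duplicated(value[[2]])))
    stop("row names and column names must be unique")
  invisible(value)
}

## Validate first, then delegate to the ordinary dimnames<- assignment.
set_dimnames_checked <- function(x, value) {
  check_dimnames_unique(value)
  dimnames(x) <- value
  x
}

d <- data.frame(a = 1:3, b = 4:6)

## Unique names: the assignment goes through.
d2 <- set_dimnames_checked(d, list(c("r1", "r2", "r3"), c("x", "y")))
names(d2)

## Duplicated column names, as in names(d) <- c(" ", " "): rejected.
bad_cols <- try(set_dimnames_checked(d, list(c("r1", "r2", "r3"), c(" ", " "))),
                silent = TRUE)
inherits(bad_cols, "try-error")   # TRUE

## Duplicated row names, as in dimnames(d)[[1]] <- rep(" ", 3): rejected.
bad_rows <- try(set_dimnames_checked(d, list(rep(" ", 3), c("a", "b"))),
                silent = TRUE)
inherits(bad_rows, "try-error")   # TRUE
```

A wrapper is used here rather than redefining the method itself, so the sketch does not mask the data frame method that base R already provides.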

Thomas had said that, yes, it would be nice to have something with fewer
restrictions for modeling, but that introducing a new class from which
data.frame would then inherit would be, at the least, uneconomical.

Am I right to interpret your comment as suggesting that we introduce a new
class for holding tabular data?  Do you have specific ideas on this?

-k
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._