[Rd] Unexpected alteration of data frame column names

Tue May 15 20:25:39 CEST 2007

On Mon, 2007-05-14 at 23:59 -0700, Herve Pages wrote:
> Hi,
> 
> I'm using data.frame(..., check.names=FALSE), because I want to create
> a data frame with duplicated column names (in real life you can get such a
> data frame as the result of an SQL query):
> 
>   > df <- data.frame(aa=1:5, aa=9:5, check.names=FALSE)
>   > df
>     aa aa
>   1  1  9
>   2  2  8
>   3  3  7
>   4  4  6
>   5  5  5
> 
> Why is [.data.frame changing my column names?
> 
>   > df[1:3, ]
>     aa aa.1
>   1  1    9
>   2  2    8
>   3  3    7
> 
> How can this be avoided? Thanks!
> 
> H.

Herve,

I had not seen a reply to your post, but you can review the code for
"[.data.frame" by using:

  getAnywhere("[.data.frame")

and see where there are checks for duplicate column names in the
function.
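
As a rough sketch of how to find those checks without reading the whole
function, you can deparse the method and search the text. This assumes the
uniqueness step is done with make.unique(), which may not hold in every R
version; the names 'fn', 'src' and 'hits' are only illustrative.

```r
# Fetch the method and turn its body into a character vector of code lines.
fn  <- utils::getAnywhere("[.data.frame")$objs[[1]]
src <- deparse(fn)

# Assumption: duplicated names are repaired with make.unique().
# Show the lines of the function that mention it.
hits <- grep("make.unique", src, fixed = TRUE, value = TRUE)
print(hits)
```

The exact lines printed depend on the R version you are running.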

That is the default behavior for data frame subsetting/extraction, and
in fact it is noted in the 'ONEWS' file for R version 1.8.0:

 - Subsetting a data frame can no longer produce duplicate
   column names.

So it has been around for some time (October of 2003).

In terms of avoiding it, I suspect that you would have to create your
own version of the function, perhaps with an additional argument that
enables or disables the duplicate column name check.

I have not, however, considered the broader functional implications of
doing so, so be vewwy vewwy careful here.
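
A lighter alternative to copying and modifying "[.data.frame" might be a
small wrapper that subsets as usual and then puts the original names back.
This is only a sketch: 'subset_keep_names' is a hypothetical helper, not
part of R, and it assumes 'j' is given as numeric or logical indices
(character indices are ambiguous when names are duplicated).

```r
# Hypothetical helper: subset a data frame, then restore the column names
# that the selected columns had before subsetting.
subset_keep_names <- function(x, i = seq_len(nrow(x)), j = seq_along(x)) {
  stopifnot(is.data.frame(x), is.numeric(j) || is.logical(j))
  orig <- names(x)[j]            # names of the selected columns, duplicates kept
  out  <- x[i, j, drop = FALSE]  # may rename duplicates (e.g. aa, aa.1)
  names(out) <- orig             # put the original names back
  out
}

df  <- data.frame(aa = 1:5, aa = 9:5, check.names = FALSE)
res <- subset_keep_names(df, 1:3)
print(names(res))
```

Any code that later subsets 'res' with plain "[" may alter the names
again, so the wrapper would have to be used consistently.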

HTH,

Marc Schwartz