[R] strange behavior in data frames with duplicated column names
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue May 8 20:10:58 CEST 2007
First, you should not be using colnames<-, which is for a matrix, on a
data frame. Use names<- for data frames (and as.data.frame to convert to
a data frame).
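
As a minimal sketch of the first point, the following reuses the 4 x 3 matrix from the report. The object names `m` and `df` and the replacement names are illustrative and do not come from the thread.

```r
# Build the same 4 x 3 matrix as in the report.
m <- matrix(1:12, ncol = 3)

# colnames<- is the setter intended for matrices.
colnames(m) <- c("A", "B", "C")

# as.data.frame() converts the matrix; the column names carry over.
df <- as.data.frame(m)

# names<- is the setter intended for data frames
# (a data frame is a list of columns).
names(df) <- c("X", "Y", "Z")

names(df)   # "X" "Y" "Z"
df[2, 3]    # 10
```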
Second, whereas duplicate row names are not allowed in a data frame,
duplicate column names are allowed, but at your own risk.
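
The asymmetry between row and column names can be checked directly. This is only a sketch: the data frame `df` and the flag `rows_accepted` are illustrative names, and `make.unique()` is one possible repair, not something prescribed in this thread.

```r
df <- data.frame(A = 1:4, B = 5:8, C = 9:12)

# Duplicate row names are rejected with an error; tryCatch() records
# whether the assignment succeeded instead of stopping the script.
rows_accepted <- tryCatch({
  row.names(df) <- c("First", "Second", "Third", "Third")
  TRUE
}, error = function(e) FALSE)
rows_accepted                 # FALSE

# Duplicate column names are accepted without complaint.
names(df) <- c("A", "B", "A")
anyDuplicated(names(df))      # 3, the position of the first repeated name

# One way to repair them: make.unique() appends a numeric suffix.
names(df) <- make.unique(names(df))
names(df)                     # "A" "B" "A.1"
```

Checking `anyDuplicated(names(df))` before indexing is a cheap guard when column names come from an external source.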
Third, there is an 'optimization too far' here, which I will change in
2.5.0 patched. Often with R development there is a tradeoff between speed
and generality.
On Tue, 8 May 2007, William Revelle wrote:
> Dear R gurus,
>
> There is an interesting problem with accessing specific items in a
> column of a data frame that has incorrectly been given a duplicate
> name, even when the item is addressed by row and column number.
> Although the column is correctly listed, an item addressed by row and
> column number gives the item with the correct row and the original,
> not the duplicated, column.
>
> Here are the instructions to get this problem
>
> x <- matrix(seq(1:12),ncol=3)
> colnames(x) <- c("A","B","A") #a redundant name for column 3
> x.df <- data.frame(x)
> x.df #the redundant name is corrected
> x.df[,3] #show the column -- this always works
> x.df[2,3] #this works here
> #now incorrectly label the columns with a duplicate name
> colnames(x.df) <- c("A","B","A") #the redundant name is not detected
> x.df
> x.df[,3] #this works as above and shows the column
> x.df[2,3] #but this gives the value of the first column, not the third <---
> rownames(x.df) <- c("First","Second","Third","Third") #detects duplicate name
> x.df
> x.df[4,] #correct fourth row and corrected column names!
> x.df[4,3] #wrong column
> x.df #still has the original names with the duplication
>
>
> and corresponding output:
>
>> x <- matrix(seq(1:12),ncol=3)
>> colnames(x) <- c("A","B","A") #a redundant name for column 3
>> x.df <- data.frame(x)
>> x.df #the redundant name is corrected
>   A B A.1
> 1 1 5   9
> 2 2 6  10
> 3 3 7  11
> 4 4 8  12
>> x.df[,3] #show the column -- this always works
> [1] 9 10 11 12
>> x.df[2,3] #this works here
> [1] 10
>> #now incorrectly label the columns with a duplicate name
>> colnames(x.df) <- c("A","B","A") #the redundant name is not detected
>> x.df
>   A B  A
> 1 1 5  9
> 2 2 6 10
> 3 3 7 11
> 4 4 8 12
>> x.df[,3] #this works as above and shows the column
> [1] 9 10 11 12
>> x.df[2,3] #but this gives the value of the first column, not the third <---
> [1] 2
>> rownames(x.df) <- c("First","Second","Third","Third") #detects duplicate name
> Error in `row.names<-.data.frame`(`*tmp*`, value = c("First", "Second", :
> duplicate 'row.names' are not allowed
>> x.df
>   A B  A
> 1 1 5  9
> 2 2 6 10
> 3 3 7 11
> 4 4 8 12
>> x.df[4,] #correct fourth row and corrected column names!
>   A B A.1
> 4 4 8  12
>> x.df[4,3] #wrong column
> [1] 4
>> x.df #still has the original names with the duplication
>
>> unlist(R.Version())
> platform        "i386-apple-darwin8.9.1"
> arch            "i386"
> os              "darwin8.9.1"
> system          "i386, darwin8.9.1"
> status          "Patched"
> major           "2"
> minor           "5.0"
> year            "2007"
> month           "04"
> day             "25"
> svn rev         "41315"
> language        "R"
> version.string  "R version 2.5.0 Patched (2007-04-25 r41315)"
>>
>
>
> Bill
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595