[R] strange behavior in data frames with duplicated column names
William Revelle
wr at revelle.net
Tue May 8 16:26:43 CEST 2007
Dear R gurus,
There is an interesting problem with accessing specific items in a
column of a data frame that has incorrectly been given a duplicate
name, even when the item is addressed by row and column number.
Although the whole column is listed correctly, an item addressed by
row and column number comes from the correct row but from the
original column, not the one carrying the duplicated name.
Here are the commands that reproduce the problem:
x <- matrix(seq(1:12),ncol=3)
colnames(x) <- c("A","B","A") #a redundant name for column 3
x.df <- data.frame(x)
x.df #the redundant name is corrected
x.df[,3] #show the column -- this always works
x.df[2,3] #this works here
#now incorrectly label the columns with a duplicate name
colnames(x.df) <- c("A","B","A") #the redundant name is not detected
x.df
x.df[,3] #this works as above and shows the column
x.df[2,3] #but this gives the value of the first column, not the third <---
rownames(x.df) <- c("First","Second","Third","Third") #detects duplicate name
x.df
x.df[4,] #correct fourth row and corrected column names!
x.df[4,3] #wrong column
x.df #still has the original names with the duplication
and corresponding output:
> x <- matrix(seq(1:12),ncol=3)
> colnames(x) <- c("A","B","A") #a redundant name for column 3
> x.df <- data.frame(x)
> x.df #the redundant name is corrected
  A B A.1
1 1 5   9
2 2 6  10
3 3 7  11
4 4 8  12
> x.df[,3] #show the column -- this always works
[1]  9 10 11 12
> x.df[2,3] #this works here
[1] 10
> #now incorrectly label the columns with a duplicate name
> colnames(x.df) <- c("A","B","A") #the redundant name is not detected
> x.df
  A B  A
1 1 5  9
2 2 6 10
3 3 7 11
4 4 8 12
> x.df[,3] #this works as above and shows the column
[1]  9 10 11 12
> x.df[2,3] #but this gives the value of the first column, not the third <---
[1] 2
> rownames(x.df) <- c("First","Second","Third","Third") #detects duplicate name
Error in `row.names<-.data.frame`(`*tmp*`, value = c("First", "Second", :
duplicate 'row.names' are not allowed
> x.df
  A B  A
1 1 5  9
2 2 6 10
3 3 7 11
4 4 8 12
> x.df[4,] #correct fourth row and corrected column names!
  A B A.1
4 4 8  12
> x.df[4,3] #wrong column
[1] 4
> x.df #still has the original names with the duplication
> unlist(R.Version())
                    platform                         arch                           os
    "i386-apple-darwin8.9.1"                       "i386"                "darwin8.9.1"
                      system                       status                        major
         "i386, darwin8.9.1"                    "Patched"                          "2"
                       minor                         year                        month
                       "5.0"                       "2007"                         "04"
                         day                      svn rev                     language
                        "25"                      "41315"                          "R"
                               version.string
"R version 2.5.0 Patched (2007-04-25 r41315)"
>
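One possible way to guard against this, sketched here only as an assumption about what a workaround might look like, is to check for duplicated column names after assigning them and to repair them with make.unique(). The name dup.index is made up for illustration; anyDuplicated() and make.unique() are base R functions.

```r
# Rebuild the data frame from the example, then force the duplicated names onto it.
x.df <- data.frame(matrix(1:12, ncol = 3))
colnames(x.df) <- c("A", "B", "A")

# anyDuplicated() returns the index of the first duplicated name, or 0 if there is none.
dup.index <- anyDuplicated(colnames(x.df))   # hypothetical helper variable
dup.index

# make.unique() appends .1, .2, ... to repeated names and leaves the first one alone.
if (dup.index > 0) {
  colnames(x.df) <- make.unique(colnames(x.df))
}
colnames(x.df)
```

With unique names restored, indexing by row and column number no longer has two columns called "A" to choose between.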
Bill
--
William Revelle http://personality-project.org/revelle.html
Professor http://personality-project.org/personality.html
Department of Psychology http://www.wcas.northwestern.edu/psych/
Northwestern University http://www.northwestern.edu/
Use R for statistics: http://personality-project.org/r