[Rd] Bugs in unique of data.frame and matrix
Stavros Macrakis
macrakis at alum.mit.edu
Sat Jun 27 00:59:53 CEST 2009
R version 2.8.1 (2008-12-22) / Windows XP
There are several bugs in unique for data frames and matrices. Please
find minimal reproducible examples below.
-s
-----A-----
Unique of a vector uses numerical comparison:
> nn <- ((1+2^-52)^(5:22))
> unique(nn)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
While unique of a data frame uses comparison of the 15-digit string:
> unique(data.frame(a=nn))
  a
1 1
Similarly:
> unique(matrix(nn,ncol=1))
     [,1]
[1,]    1
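The difference can be checked without relying on how the values print. This is a minimal sketch: sprintf("%.15g", ...) is only a stand-in for a 15-significant-digit string conversion, not necessarily the code path that unique takes internally for data frames and matrices.

```r
# 18 doubles that all print as 1 but differ in their last few bits
nn <- (1 + 2^-52)^(5:22)

# Numerical comparison keeps every value
n_numeric <- length(unique(nn))

# Comparing 15-significant-digit strings (stand-in conversion) collapses them
keys <- sprintf("%.15g", nn)
n_string <- length(unique(keys))

cat("distinct as numbers:", n_numeric, "\n")
cat("distinct as 15-digit strings:", n_string, "\n")
```

If the two counts differ, any row comparison that goes through such strings will merge rows that are numerically distinct.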
-----B-----
> df <- data.frame(a=c("\r",""),b=c("","\r"))
> unique(df)
a b
1 \r
> unique(as.matrix(df))
a b
[1,] "\r" ""
Though "\r" is no doubt rare in strings, it is perfectly legal.
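One possible explanation, offered as an assumption rather than a confirmed reading of the R source: if rows are compared by pasting their fields into a single key with "\r" as the separator, the two rows above produce the same key. The sketch below only demonstrates that collision; the names row1, row2, key1 and key2 are illustrative.

```r
# The two rows of df, as character vectors
row1 <- c("\r", "")
row2 <- c("", "\r")

# Assumed keying scheme: join the fields with "\r" as separator
key1 <- paste(row1, collapse = "\r")
key2 <- paste(row2, collapse = "\r")

# The rows differ, but their keys do not
rows_identical <- identical(row1, row2)
keys_identical <- identical(key1, key2)

cat("rows identical:", rows_identical, "\n")
cat("keys identical:", keys_identical, "\n")
```

Any separator-based key has this weakness whenever the separator can also occur inside a field.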
-----C-----
For vectors and data frames, unique preserves the POSIXct class:
> dd <- as.POSIXct('1999-1-1')
> unique(dd)
[1] "1999-01-01 EST"
> unique(data.frame(a=dd))
           a
1 1999-01-01
But for matrices, it converts to the underlying number:
> unique(matrix(dd))
          [,1]
[1,] 915166800
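A minimal sketch of restoring the class by hand after calling unique on the matrix. The tz = "UTC" and origin = "1970-01-01" arguments are choices made for this sketch so that it runs the same way everywhere; they are not part of the report above. The sketch also checks the class of matrix(dd) itself, since it may be worth confirming at which step the class is lost.

```r
# Fixed time zone so the example does not depend on the local setting
dd <- as.POSIXct("1999-01-01", tz = "UTC")

m <- matrix(dd)
# Is the class still present before unique is called?
class_kept_by_matrix <- inherits(m, "POSIXct")

u <- unique(m)

# Rebuild a POSIXct vector from the underlying seconds since the epoch
restored <- as.POSIXct(as.vector(u), origin = "1970-01-01", tz = "UTC")

cat("matrix(dd) is POSIXct:", class_kept_by_matrix, "\n")
print(restored)
```

The result is a plain POSIXct vector rather than a matrix, so the dimensions must be tracked separately if they matter.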
-----workaround-----
The first two bugs can be worked around by splitting the matrix into a
list of row vectors, calling unique on that list, then converting the
result back to a matrix (mm below stands for any matrix):
library(plyr)
laply(unique(alply(matrix(nn,ncol=1),1,identity)),identity,.drop=FALSE)
laply(unique(alply(mm,1,identity)),identity,.drop=FALSE)
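The same idea can be sketched in base R without plyr. The helper name unique_rows is hypothetical, and the sketch assumes that unique applied to a list compares whole elements rather than their printed form. It returns NULL for a matrix with zero rows, which a fuller version would need to handle.

```r
# Hypothetical helper: unique rows of a matrix via a list of row vectors
unique_rows <- function(m) {
  # One list element per row
  rows <- lapply(seq_len(nrow(m)), function(i) m[i, ])
  # unique on a list compares the row vectors themselves
  do.call(rbind, unique(rows))
}

nn <- (1 + 2^-52)^(5:22)
res_a <- unique_rows(matrix(nn, ncol = 1))

m_b <- cbind(a = c("\r", ""), b = c("", "\r"))
res_b <- unique_rows(m_b)

cat("rows kept in example A:", nrow(res_a), "\n")
cat("rows kept in example B:", nrow(res_b), "\n")
```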