[Rd] unique.matrix issue [Was: Anomaly with unique and match]
Henrik Bengtsson
hb at biostat.ucsf.edu
Thu Mar 10 10:19:48 CET 2011
It should be possible to run unique()/duplicated() column by column
and incrementally update the set of unique/duplicated rows. This
would avoid coercing the values to character strings. The benefit
should be even greater for data.frames.
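
A minimal sketch of the column-by-column idea, under some assumptions: the
name duplicated_by_column() is hypothetical (it is not part of base R), and
it relies on match()/unique() comparing the values of a single column
exactly, as unique() does for the plain vector in Simon's example. Each
column refines a running group id per row, so no value is ever turned into
a string.

```r
# Hypothetical helper, not part of base R: flag duplicated rows of a matrix
# or data.frame by refining a per-row group id one column at a time.
duplicated_by_column <- function(x) {
  n <- nrow(x)
  if (n == 0L) return(logical(0))
  id <- rep.int(1L, n)                 # all rows start in one group
  for (j in seq_len(ncol(x))) {
    col <- if (is.data.frame(x)) x[[j]] else x[, j]
    # Code each value by its first occurrence; no coercion to character
    # is requested here.
    code <- match(col, unique(col))
    # Combine the running group id with this column's code into one key.
    # Doubles are used so the product cannot overflow integer range.
    key <- as.numeric(id - 1L) * max(code) + code
    id <- match(key, unique(key))
  }
  duplicated(id)
}

# Usage with Petr's example: rows 1, 3, 5 and 7 are kept.
x <- 1 + c(1.1, 1.3, 1.7, 1.9) * 1e-14
a <- cbind(rep(x, each = 2), 2)
a[!duplicated_by_column(a), , drop = FALSE]
```

The key stays exactly representable as long as nrow(x)^2 is below 2^53,
which should cover any matrix that fits in memory today.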
My $.02
/Henrik
On Thu, Mar 10, 2011 at 12:29 AM, Petr Savicky <savicky at cs.cas.cz> wrote:
> On Wed, Mar 09, 2011 at 02:11:49PM -0500, Simon Urbanek wrote:
>> match() is a red herring here -- it is really a very specific thing that has to do with the fact that you're running unique() on a matrix. Also it's much easier to reproduce:
>>
>> > x=c(1,1+0.2e-15)
>> > x
>> [1] 1 1
>> > sprintf("%a",x)
>> [1] "0x1p+0" "0x1.0000000000001p+0"
>> > unique(x)
>> [1] 1 1
>> > sprintf("%a",unique(x))
>> [1] "0x1p+0" "0x1.0000000000001p+0"
>> > unique(matrix(x,2))
>>      [,1]
>> [1,]    1
>>
>> This happens because unique.matrix() works on a string representation: it has to take all values of a row/column into account, so it pastes them into one string, and for these two numbers that string is the same:
>> > as.character(x)
>> [1] "1" "1"
>
> I understand the use of match() in the original message by Terry Therneau
> as an example of a situation where the behavior of unique.matrix() becomes
> visible even without looking at the last bits of the numbers.
>
> Consider the following example.
>
> x <- 1 + c(1.1, 1.3, 1.7, 1.9)*1e-14
> a <- cbind(rep(x, each=2), 2)
> rownames(a) <- 1:nrow(a)
>
> The correct set of rows may be obtained using
>
> unique(a - 1)
>
>           [,1] [,2]
> 1 1.110223e-14    1
> 3 1.310063e-14    1
> 5 1.709743e-14    1
> 7 1.909584e-14    1
>
> However, due to the use of paste(), which uses as.character(), in
> unique.matrix(), we also have
>
> unique(a)
>
>   [,1] [,2]
> 1    1    2
> 5    1    2
>
> I suggest transforming the numeric columns with rank() before
> applying paste(). For example
>
> unique.mat <- function(a)
> {
>     temp <- apply(a, 2, rank, ties.method="max")
>     temp <- apply(temp, 1, function(x) paste(x, collapse = "\r"))
>     a[!duplicated(temp), , drop=FALSE]
> }
>
> unique.mat(a)
>
>   [,1] [,2]
> 1    1    2
> 3    1    2
> 5    1    2
> 7    1    2
>
> Petr Savicky.