[Rd] unique.matrix issue [Was: Anomaly with unique and match]
jochen laubrock
jochen.laubrock at gmail.com
Mon Mar 28 16:54:47 CEST 2011
Still, from a user's perspective this behavior is somewhat irritating. Wouldn't it be better to rewrite unique.matrix to use formatC or sprintf instead of as.character, on which paste in line 9 implicitly relies, at least in R version 2.12.2 (2011-02-25)?
For example, use
temp <- apply(x, MARGIN, function(x) paste(formatC(x, digits=324, format="f"), collapse = "\r"))
instead of
temp <- apply(x, MARGIN, function(x) paste(x, collapse = "\r"))
Don't know whether this affects performance, though.
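A minimal sketch of the idea, not a patch against the actual unique.matrix source: `row_key` is a hypothetical helper, and using 17 significant digits with format="g" is an assumption made here (17 digits are enough to tell any two distinct doubles apart). It keeps the paste(..., collapse) step so that each row still yields a single key.

```r
# Two doubles that differ only in the last bit of the mantissa.
x <- c(1, 1 + 0.2e-15)
m <- matrix(x, 2)

# Hypothetical helper (not part of base R): build one key per row/column
# from 17 significant digits instead of relying on as.character().
row_key <- function(v) {
  paste(formatC(v, digits = 17, format = "g"), collapse = "\r")
}

# One key per row (MARGIN = 1); duplicated() then marks repeated rows.
keys <- apply(m, 1, row_key)
m_unique <- m[!duplicated(keys), , drop = FALSE]

nrow(m_unique)  # both rows are kept because their keys differ
```

The longer keys cost more to build and compare, so the performance question above still applies.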
Sorry to chime in late.
Cheers,
Jochen
> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
On Mar 9, 2011, at 20:11 , Simon Urbanek wrote:
> match() is a red herring here -- it is really a very specific thing that has to do with the fact that you're running unique() on a matrix. Also it's much easier to reproduce:
>
>> x=c(1,1+0.2e-15)
>> x
> [1] 1 1
>> sprintf("%a",x)
> [1] "0x1p+0" "0x1.0000000000001p+0"
>> unique(x)
> [1] 1 1
>> sprintf("%a",unique(x))
> [1] "0x1p+0" "0x1.0000000000001p+0"
>> unique(matrix(x,2))
>      [,1]
> [1,]    1
>
> This happens because unique.matrix works on a string representation: it has to take all values of a row/column into account, so it pastes them into one string, and for these two numbers that string is the same:
>> as.character(x)
> [1] "1" "1"
>
> Cheers,
> Simon
>
>
> On Mar 9, 2011, at 9:48 AM, Terry Therneau wrote:
>
>> I stumbled onto this working on an update to coxph. The last 6 lines
>> below are the question, the rest create a test data set.
>>
>> tmt585% R
>> R version 2.12.2 (2011-02-25)
>> Copyright (C) 2011 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> # Lines of code from survival/tests/singtest.R
>>> library(survival)
>> Loading required package: splines
>>> test1 <- data.frame(time= c(4, 3,1,1,2,2,3),
>> + status=c(1,NA,1,0,1,1,0),
>> + x= c(0, 2,1,1,1,0,0))
>>>
>>> temp <- rep(0:3, rep(7,4))
>>>
>>> stest <- data.frame(start = 10*temp,
>> + stop = 10*temp + test1$time,
>> + status = rep(test1$status,4),
>> + x = c(test1$x+ 1:7, rep(test1$x,3)),
>> + epoch = rep(1:4, rep(7,4)))
>>>
>>> fit1 <- coxph(Surv(start, stop, status) ~ x * factor(epoch), stest)
>>
>> ## New lines
>>> temp1 <- fit1$linear.predictor
>>> temp2 <- as.matrix(temp1)
>>> match(temp1, unique(temp1))
>> [1] 1 2 3 4 4 5 6 7 7 7 6 6 6 8 8 8 6 6 6 9 9 9 6 6
>>> match(temp2, unique(temp2))
>> [1] 1 2 3 4 4 5 6 7 7 7 6 6 6 NA NA NA 6 6 6 8 8 8 6 6
>>
>> -----------------------
>>
>> I've solved it for my code by not calling match on a 1-column matrix.
>> In general, however, should I be using some other paradigm for this "map
>> to unique" operation? For example, match(as.character(x),
>> unique(as.character(x)))?
>>
>> Terry T
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>