[R] jitter-bug? problematic behaviour of the jitter function

Rui Barradas
Wed Sep 23 22:03:08 CEST 2020


Hello,

I believe that although Duncan's explanation is right, it does not 
explain the value of the digits argument. round makes the first 2 
numbers 0, but why? The function below prints the digits argument and 
then outputs the differences from which d is taken. The code is taken 
from jitter.


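f <- function(x){
   # z is the length of the range of the finite values, as in jitter
   z <- diff(r <- range(x[is.finite(x)]))
   # the digits argument that jitter passes to round
   cat("digits:", 3 - floor(log10(z)), "\n")
   # differences between adjacent unique rounded values
   diff(xx <- unique(sort.int(round(x, 3 - floor(log10(z))))))
}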


Now see what cat outputs for 'digits'.


f(c(1,2,10^4))  # desired behaviour
#digits: 0
#[1]    1 9998
f(c(0,1,10^4))  # bad behaviour
#digits: -1
#[1] 10000
f(c(-1,0,10^4))  # bad behaviour
#digits: -1
#[1] 10000
f(c(1,2,10^5))  # bad behaviour
#digits: -1
#[1] 1e+05
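
The pattern above follows from the expression 3 - floor(log10(z)): digits 
drops below 0 as soon as the range z reaches 10^4. A small sketch makes 
the threshold visible. The z values are only illustrative; they are not 
taken from jitter.

```r
# illustrative range lengths on both sides of the 10^4 threshold
z <- c(999, 9999, 10000, 99999, 999999)
# the same expression that jitter passes to round() as 'digits'
digits <- 3 - floor(log10(z))
print(data.frame(z = z, digits = digits))
```

So any data whose range is 10^4 or more is rounded to tens (or coarser) 
before the smallest difference is computed.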



And according to the documentation of ?round, negative digits are allowed:


Rounding to a negative number of digits means rounding to a power of 
ten, so for example round(x, digits = -2) rounds to the nearest hundred.


But in this case two of the numbers are closer to 0 than they are to 10, 
so round maps both of them to 0. unique then keeps only 0 and the largest 
value, and the resulting diff is big.


round(c(1,2,10^4),0)  # desired behaviour
#[1]     1     2 10000
round(c(0,1,10^4),-1)  # bad behaviour
#[1]     0     0 10000
round(c(-1,0,10^4),-1)  # bad behaviour
#[1]     0     0 10000
round(c(1,2,10^5),-1)  # bad behaviour
#[1] 0e+00 0e+00 1e+05
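
Putting the pieces together for the last case, the steps can be traced by 
hand. This is only a sketch: it assumes the default factor = 1 and uses the 
a <- factor * d/5 rule quoted from the help page, not the full jitter source.

```r
x <- c(1, 2, 10^5)
z <- diff(range(x))                        # length of the range
digits <- 3 - floor(log10(z))              # negative, so round() works in tens
xx <- unique(sort.int(round(x, digits)))   # 1 and 2 both collapse to 0
d <- min(diff(xx))                         # smallest gap left after rounding
a <- 1 * d / 5                             # factor = 1 assumed (the default)
cat("z:", z, "digits:", digits, "d:", d, "a:", a, "\n")
```

Here a comes out as 20000, which matches the scale of the noise in Martin's 
jitter(c(1,2,10^5)) example.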



Isn't it still a bug?
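
If the goal is only to break ties, one possible workaround (a sketch, not a 
fix to jitter itself) is to compute the smallest gap without the rounding 
step and pass it in through the amount argument. The name d_raw is made up 
for this example, and the code assumes x has at least two distinct values.

```r
x <- c(1, 2, 10^5)
# smallest difference between adjacent unique values, with no rounding applied
d_raw <- min(diff(sort(unique(x))))
set.seed(1)  # only to make the example reproducible
r <- jitter(x, amount = d_raw / 5)
print(r - x)  # noise added to each value
```

Note that this drops the "fuzz" handling entirely, so nearly equal values 
are treated as distinct.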

Rui Barradas


At 15:57 on 23/09/20, Duncan Murdoch wrote:
> On 23/09/2020 6:32 a.m., Martin Keller-Ressel wrote:
>> Dear all,
>>
>> I have noticed some strange behaviour in the „jitter“ function in R.
>> On the help page for jitter it is stated that
>>
>> "The result, say r, is r <- x + runif(n, -a, a) where n <- length(x) 
>> and a is the amount argument (if specified)."
>>
>> and
>>
>> "If amount is NULL (default), we set a <- factor * d/5 where d is the 
>> smallest difference between adjacent unique (apart from fuzz) x values."
>>
>> This works fine as long as there is no (very) large outlier
>>
>>> jitter(c(1,2,10^4))  # desired behaviour
>> [1]    1.083243    1.851571 9999.942716
>>
>> But for very large outliers the added noise suddenly ‚jumps‘ to a much 
>> larger scale:
>>
>>> jitter(c(1,2,10^5)) # bad behaviour
>> [1] -19535.649   9578.702 115693.854
>> # Noise should be of order (2-1)/5  = 0.2 but is of much larger order.
>>
>> This probably does not matter much when jitter is used for plotting, 
>> but it can cause problems when jitter is used to break ties.
> 
> I think this is kind of documented:  "apart from fuzz" is what counts. 
> If you look at the code for jitter, you'll see this important line:
> 
>   d <- diff(xx <- unique(sort.int(round(x, 3 - floor(log10(z))))))
> 
> By the time you get here, z is the length of the range of the data, so 
> it's 99999 in your example.  The rounding changes your values to 
> 0,0,1e5, so the smallest difference is 1e5.
> 
> Duncan Murdoch
> 