[R] jitter-bug? problematic behaviour of the jitter function

Duncan Murdoch
Wed Sep 23 16:57:02 CEST 2020


On 23/09/2020 6:32 a.m., Martin Keller-Ressel wrote:
> Dear all,
> 
> I have noticed some strange behaviour in the "jitter" function in R.
> On the help page for jitter it is stated that
> 
> "The result, say r, is r <- x + runif(n, -a, a) where n <- length(x) and a is the amount argument (if specified)."
> 
> and
> 
> "If amount is NULL (default), we set a <- factor * d/5 where d is the smallest difference between adjacent unique (apart from fuzz) x values."
> 
> This works fine as long as there is no (very) large outlier:
> 
>> jitter(c(1,2,10^4))  # desired behaviour
> [1]    1.083243    1.851571 9999.942716
> 
> But for very large outliers the added noise suddenly 'jumps' to a much larger scale:
> 
>> jitter(c(1,2,10^5)) # bad behaviour
> [1] -19535.649   9578.702 115693.854
> # Noise should be of order (2-1)/5 = 0.2 but is several orders of magnitude larger.
> 
> This probably does not matter much when jitter is used for plotting, but it can cause problems when jitter is used to break ties.

I think this is kind of documented:  "apart from fuzz" is what counts. 
If you look at the code for jitter, you'll see this important line:

  d <- diff(xx <- unique(sort.int(round(x, 3 - floor(log10(z))))))

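By the time you get here, z is the width of the range of the data, so 
it's 99999 in your example.  That makes the digits argument 
3 - floor(log10(99999)) = -1, i.e. rounding to the nearest 10, which 
changes your values to 0, 0, 1e5.  The smallest difference is therefore 
1e5, and the amount becomes 1e5/5 = 20000.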
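
To see the effect of that line concretely, here is a rough sketch that 
redoes the rounding step by hand for both of your examples.  fuzz_diff() 
is just a name made up for illustration (it is not part of R), and it 
skips the special cases that jitter() handles (zero range, a single 
unique value), so treat it as an approximation of the logic rather than 
the real code.  The last few lines show one possible workaround for 
tie-breaking, assuming you are happy to compute the amount yourself from 
the unrounded data:

```r
# Hypothetical helper: repeats the "fuzz" rounding that jitter() applies
# before it looks for the smallest difference. Assumes diff(range(x)) > 0
# and at least two distinct values after rounding.
fuzz_diff <- function(x) {
  z <- diff(range(x))                       # width of the data range
  digits <- 3 - floor(log10(z))             # digits passed to round()
  xx <- unique(sort.int(round(x, digits)))  # values left after rounding
  list(z = z, digits = digits, xx = xx, d = min(diff(xx)))
}

small <- fuzz_diff(c(1, 2, 10^4))
large <- fuzz_diff(c(1, 2, 10^5))

small$digits   # 0: round to whole numbers, so 1 and 2 stay distinct
small$d / 5    # 0.2
large$digits   # -1: round to the nearest 10, so 1 and 2 both become 0
large$d / 5    # 20000

# Possible workaround (an assumption about what you want, not a fix to
# jitter itself): supply the amount explicitly so no rounding is involved.
x <- c(1, 2, 10^5)
a <- min(diff(sort(unique(x)))) / 5         # 0.2 for this x
set.seed(1)
r <- jitter(x, amount = a)                  # noise is within [-a, a]
```

With an explicit non-zero amount, jitter just adds runif(n, -a, a), so 
the noise stays at the scale you asked for regardless of the outlier.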

Duncan Murdoch


