[R] jitter-bug? problematic behaviour of the jitter function
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Wed Sep 23 22:25:36 CEST 2020
On 23/09/2020 4:03 p.m., Rui Barradas wrote:
> Hello,
>
> I believe that though Duncan's explanation is right it is also not
> explaining the value of the digits argument. round makes the first 2
> numbers 0 but why?
If there had been rounding in their computation, you might see a
difference like 1e-15. You wouldn't want to use that for the scale of
jittering, so some rounding is needed.
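
To illustrate the point, here is a minimal sketch. The example vector is my own assumption; only the rounding expression is taken from jitter. Two values that should be equal differ by floating-point error, and without rounding that error would become the jitter scale.

```r
# Three values; the first two are "equal" except for floating-point error.
x <- c(0.1 + 0.2, 0.3, 1)

# Smallest raw gap between distinct values: pure rounding noise.
d_raw <- min(diff(sort(unique(x))))

# The rounding step used in jitter(): z is the length of the data range.
z <- diff(range(x))
digits <- 3 - floor(log10(z))
d_rounded <- min(diff(unique(sort.int(round(x, digits)))))

print(c(raw = d_raw, rounded = d_rounded))
```

The raw gap is far too small to be a useful jitter scale, while the rounded gap reflects the real spacing between 0.3 and 1.
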
I think the documentation for the function is poor, but the intention
was probably to use the function in graphics (as the references did),
and in that case, any values too close together should be treated as
equal and jittering should separate them. The particular computation
used says that if the range is in [1, 10), values equal to 3 decimal
places will be too close and need separation.
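
As a rough sketch of how the number of digits depends on the range, the expression from jitter can be wrapped in a helper. The name fuzz_digits and the chosen range lengths are illustrative assumptions, not part of R.

```r
# Hypothetical helper: digits passed to round() inside jitter(),
# as a function of z, the length of the data range.
fuzz_digits <- function(z) 3 - floor(log10(z))

# A few illustrative range lengths, including those from this thread.
ranges <- c(0.5, 5, 50, 9999, 99999)
print(data.frame(range = ranges, digits = fuzz_digits(ranges)))
```

A range of 9999 gives 0 digits (rounding to integers), and a range of 99999 gives -1 (rounding to the nearest ten), which is what collapses 1 and 2 to 0.
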
So I don't think this is a bug, but it might be a valid wishlist item:
document what "apart from fuzz" means, and perhaps allow it to be
controlled by the user.
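
One way such a wishlist item might look is sketched below. The function jitter_fuzz and its digits argument are hypothetical names, not part of base R, and the fallback used when all values round to the same number is simplified compared to jitter itself.

```r
# Hypothetical variant of jitter() with user-controlled fuzz.
# `jitter_fuzz` and `digits` are illustrative, not an existing R API.
jitter_fuzz <- function(x, factor = 1, digits = NULL) {
  stopifnot(is.numeric(x), length(x) > 0)
  z <- diff(range(x[is.finite(x)]))
  if (z == 0) z <- 1
  # Default to the same rule jitter() uses when digits is not supplied.
  if (is.null(digits)) digits <- 3 - floor(log10(z))
  d <- diff(unique(sort.int(round(x, digits))))
  # Simplified fallback (an assumption): use z/10 if no distinct gaps remain.
  d <- if (length(d)) min(d) else z / 10
  amount <- factor / 5 * abs(d)
  x + stats::runif(length(x), -amount, amount)
}

set.seed(1)
x <- c(1, 2, 10^5)
y <- jitter_fuzz(x, digits = 0)  # treat values as distinct down to integers
print(y)
```

With digits = 0 the smallest gap is 1, so the noise stays within 0.2 of each value. When the goal is only to break ties, passing an explicit amount to the existing jitter avoids the fuzz computation altogether.
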
Duncan Murdoch

> The function below prints the digits argument and
> then outputs d. The code is taken from jitter.
>
>
> f <- function(x){
> z <- diff(r <- range(x[is.finite(x)]))
> cat("digits:", 3 - floor(log10(z)), "\n")
> diff(xx <- unique(sort.int(round(x, 3 - floor(log10(z))))))
> }
>
>
> Now see what cat outputs for 'digits'.
>
>
> f(c(1,2,10^4)) # desired behaviour
> #digits: 0
> #[1] 1 9998
> f(c(0,1,10^4)) # bad behaviour
> #digits: -1
> #[1] 10000
> f(c(-1,0,10^4)) # bad behaviour
> #digits: -1
> #[1] 10000
> f(c(1,2,10^5)) # bad behaviour
> #digits: -1
> #[1] 1e+05
>
>
>
> And according to the documentation of ?round, negative digits are allowed:
>
>
> Rounding to a negative number of digits means rounding to a power of
> ten, so for example round(x, digits = -2) rounds to the nearest hundred.
>
>
> But in this case two of the numbers are closer to 0 than they are to 10.
> And unique keeps only 0 and the largest value, so the diff is big.
>
>
> round(c(1,2,10^4),0) # desired behaviour
> #[1] 1 2 10000
> round(c(0,1,10^4),-1) # bad behaviour
> #[1] 0 0 10000
> round(c(-1,0,10^4),-1) # bad behaviour
> #[1] 0 0 10000
> round(c(1,2,10^5),-1) # bad behaviour
> #[1] 0e+00 0e+00 1e+05
>
>
>
> Isn't it still a bug?
>
> Rui Barradas
>
>
> At 15:57 on 23/09/20, Duncan Murdoch wrote:
>> On 23/09/2020 6:32 a.m., Martin Keller-Ressel wrote:
>>> Dear all,
>>>
>>> I have noticed some strange behaviour in the "jitter" function in R.
>>> On the help page for jitter it is stated that
>>>
>>> "The result, say r, is r <- x + runif(n, -a, a) where n <- length(x)
>>> and a is the amount argument (if specified)."
>>>
>>> and
>>>
>>> "If amount is NULL (default), we set a <- factor * d/5 where d is the
>>> smallest difference between adjacent unique (apart from fuzz) x values."
>>>
>>> This works fine as long as there is no (very) large outlier
>>>
>>>> jitter(c(1,2,10^4)) # desired behaviour
>>> [1] 1.083243 1.851571 9999.942716
>>>
>>> But for very large outliers the added noise suddenly 'jumps' to a much
>>> larger scale:
>>>
>>>> jitter(c(1,2,10^5)) # bad behaviour
>>> [1] -19535.649 9578.702 115693.854
>>> # Noise should be of order (2-1)/5 = 0.2 but is of much larger order.
>>>
>>> This probably does not matter much when jitter is used for plotting,
>>> but it can cause problems when jitter is used to break ties.
>>
>> I think this is kind of documented: "apart from fuzz" is what counts.
>> If you look at the code for jitter, you'll see this important line:
>>
>> d <- diff(xx <- unique(sort.int(round(x, 3 - floor(log10(z))))))
>>
>> By the time you get here, z is the length of the range of the data, so
>> it's 99999 in your example. The rounding changes your values to
>> 0,0,1e5, so the smallest difference is 1e5.
>>
>> Duncan Murdoch
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.