[R] jitter-bug? problematic behaviour of the jitter function

Rui Barradas
Wed Sep 23 22:57:48 CEST 2020


Hello,

Thanks for the further explanation.
Yes, I believe it would be a good idea to document more clearly that
"apart from fuzz" refers to a rounding operation. It is only mentioned
in passing, and its meaning is not clear.
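
To make the point concrete, here is a minimal sketch of how the rounding determines d. It is built on the rounding line from the jitter source quoted further down in this thread. The helper name fuzz_d is made up for illustration and is not part of R, and the sketch assumes x has at least two distinct finite values.

```r
# Hypothetical helper (not part of R): reproduces the rounding step that
# jitter() applies before taking the smallest difference d.
# Assumes x contains at least two distinct finite values.
fuzz_d <- function(x) {
    z <- diff(range(x[is.finite(x)]))          # width of the data range
    digits <- 3 - floor(log10(z))              # digits kept "apart from fuzz"
    xx <- unique(sort.int(round(x, digits)))   # distinct values after rounding
    list(digits = digits, rounded = xx, d = min(diff(xx)))
}

fuzz_d(c(1, 2, 10^4))$d   # 1: the values 1 and 2 stay distinct
fuzz_d(c(1, 2, 10^5))$d   # 1e+05: 1 and 2 are both rounded to 0
```

So "fuzz" here means anything that disappears when rounding to 3 - floor(log10(z)) digits, and that threshold grows with the range of the data.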

Rui Barradas
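
P.S. On the wishlist idea of letting the user control the fuzz, a rough sketch of what that might look like is below. This is only an illustration under my own assumptions: the function name jitter_digits and its digits argument are hypothetical and do not exist in R, and it only covers the case where amount is NULL and x has at least two distinct values after rounding.

```r
# Hypothetical variant, not part of R: behaves like the amount = NULL case
# described on the jitter help page, but the caller may choose the number
# of digits used for the "fuzz" rounding.
jitter_digits <- function(x, factor = 1, digits = NULL) {
    z <- diff(range(x[is.finite(x)]))
    # Default: the rule from the jitter source quoted in this thread
    if (is.null(digits)) digits <- 3 - floor(log10(z))
    xx <- unique(sort.int(round(x, digits)))
    d <- min(diff(xx))       # smallest gap between distinct rounded values
    a <- factor * d / 5      # amount, as described on the help page
    x + stats::runif(length(x), -a, a)
}

set.seed(1)
jitter_digits(c(1, 2, 10^5), digits = 0)   # noise stays within +/- 0.2
```

With digits = 0 the values 1 and 2 remain distinct after rounding, so d is 1 and the noise is drawn from runif(n, -0.2, 0.2) even with the large outlier.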

At 21:25 on 23/09/20, Duncan Murdoch wrote:
> On 23/09/2020 4:03 p.m., Rui Barradas wrote:
>> Hello,
>>
>> I believe that although Duncan's explanation is right, it does not
>> explain the value of the digits argument. round makes the first 2
>> numbers 0, but why?
> 
> If there had been rounding in their computation, you might see a 
> difference like 1e-15.  You wouldn't want to use that for the scale of 
> jittering, so some rounding is needed.
> 
> I think the documentation for the function is poor, but the intention 
> was probably to use the function in graphics (as the references did), 
> and in that case, any values too close together should be treated as 
> equal and jittering should separate them.  The particular computation 
> used says that if the range is in [1, 10), values equal to 3 decimal 
> places will be too close and need separation.
> 
> So I don't think this is a bug, but it might be a valid wishlist item: 
> document what "apart from fuzz" means, and perhaps allow it to be 
> controlled by the user.
> 
> Duncan Murdoch
> 
> 
> 
>> The function below prints the digits argument and
>> then outputs d. The code is taken from jitter.
>>
>>
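>> f <- function(x){
>>     # width of the range of the finite values
>>     z <- diff(r <- range(x[is.finite(x)]))
>>     # number of digits that will be passed to round()
>>     cat("digits:", 3 - floor(log10(z)), "\n")
>>     # gaps between the distinct rounded values
>>     diff(xx <- unique(sort.int(round(x, 3 - floor(log10(z))))))
>> }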
>>
>>
>> Now see what cat outputs for 'digits'.
>>
>>
>> f(c(1,2,10^4))  # desired behaviour
>> #digits: 0
>> #[1]    1 9998
>> f(c(0,1,10^4))  # bad behaviour
>> #digits: -1
>> #[1] 10000
>> f(c(-1,0,10^4))  # bad behaviour
>> #digits: -1
>> #[1] 10000
>> f(c(1,2,10^5))  # bad behaviour
>> #digits: -1
>> #[1] 1e+05
>>
>>
>>
>> And according to the documentation of ?round, negative digits are 
>> allowed:
>>
>>
>> Rounding to a negative number of digits means rounding to a power of
>> ten, so for example round(x, digits = -2) rounds to the nearest hundred.
>>
>>
>> But in this case two of the numbers are closer to 0 than they are to 10,
>> so unique keeps only 0 and the largest value, and diff is big.
>>
>>
>> round(c(1,2,10^4),0)  # desired behaviour
>> #[1]     1     2 10000
>> round(c(0,1,10^4),-1)  # bad behaviour
>> #[1]     0     0 10000
>> round(c(-1,0,10^4),-1)  # bad behaviour
>> #[1]     0     0 10000
>> round(c(1,2,10^5),-1)  # bad behaviour
>> #[1] 0e+00 0e+00 1e+05
>>
>>
>>
>> Isn't it still a bug?
>>
>> Rui Barradas
>>
>>
>> At 15:57 on 23/09/20, Duncan Murdoch wrote:
>>> On 23/09/2020 6:32 a.m., Martin Keller-Ressel wrote:
>>>> Dear all,
>>>>
>>>> I have noticed some strange behaviour in the "jitter" function in R.
>>>> On the help page for jitter it is stated that
>>>>
>>>> "The result, say r, is r <- x + runif(n, -a, a) where n <- length(x)
>>>> and a is the amount argument (if specified)."
>>>>
>>>> and
>>>>
>>>> "If amount is NULL (default), we set a <- factor * d/5 where d is the
>>>> smallest difference between adjacent unique (apart from fuzz) x 
>>>> values."
>>>>
>>>> This works fine as long as there is no (very) large outlier
>>>>
>>>>> jitter(c(1,2,10^4))  # desired behaviour
>>>> [1]    1.083243    1.851571 9999.942716
>>>>
>>>> But for very large outliers the added noise suddenly 'jumps' to a much
>>>> larger scale:
>>>>
>>>>> jitter(c(1,2,10^5)) # bad behaviour
>>>> [1] -19535.649   9578.702 115693.854
>>>> # Noise should be of order (2-1)/5  = 0.2 but is of much larger order.
>>>>
>>>> This probably does not matter much when jitter is used for plotting,
>>>> but it can cause problems when jitter is used to break ties.
>>>
>>> I think this is kind of documented:  "apart from fuzz" is what counts.
>>> If you look at the code for jitter, you'll see this important line:
>>>
>>>    d <- diff(xx <- unique(sort.int(round(x, 3 - floor(log10(z))))))
>>>
>>> By the time you get here, z is the length of the range of the data, so
>>> it's 99999 in your example.  The rounding changes your values to
>>> 0,0,1e5, so the smallest difference is 1e5.
>>>
>>> Duncan Murdoch


