[R] jitter-bug? problematic behaviour of the jitter function

Rui Barradas
Wed Sep 23 21:32:32 CEST 2020


Hello,

R 4.0.2 on Ubuntu 20.04, sessionInfo() output at the end.
This came up on r-help. I'm replying to the OP and also posting to 
r-devel, since I believe that is the more appropriate list.

I can confirm this. The OP's original calls are the first and the 
last ones below, but the problem also shows up with a smaller outlier.


set.seed(2020)

jitter(c(1,2,10^4))  # desired behaviour
#[1]     1.058761     1.957690 10000.047401

jitter(c(0,1,10^4))  # bad behaviour
#[1]   -92.43546 -1454.61126  8269.53754

jitter(c(-1,0,10^4))  # bad behaviour
#[1] -1484.3895  -427.5283  8010.3308

jitter(c(1,2,10^5))  # bad behaviour
#[1]   4809.238  10578.561 109753.430
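
A possible explanation, offered as a sketch rather than a definitive diagnosis: as far as I can read the jitter source in R 4.0.x, the "apart from fuzz" step rounds x to 3 - floor(log10(z)) digits, where z is the range of x, before taking differences. Once the range reaches 10^4 that digit count goes negative, so 0, 1 and 2 all round to 0 and the smallest remaining difference is the outlier itself. The helper effective_amount() below is hypothetical (it is not part of R) and only mirrors that computation of the default amount.

```r
# Hypothetical helper mirroring how jitter() appears to derive its
# default amount (amount = NULL) in R 4.0.x; not part of base R.
effective_amount <- function(x, factor = 1) {
  z <- diff(range(x, na.rm = TRUE))          # range of the data
  if (z == 0) z <- abs(min(x, na.rm = TRUE))
  if (z == 0) z <- 1
  digits <- 3 - floor(log10(z))              # rounding precision ("fuzz")
  xx <- unique(sort.int(round(x, digits)))   # nearby values can merge here
  d <- diff(xx)
  d <- if (length(d)) min(d) else if (xx != 0) xx / 10 else z / 10
  factor / 5 * abs(d)
}

effective_amount(c(1, 2, 10^4))  # 0.2   (digits =  0: 1 and 2 stay distinct)
effective_amount(c(0, 1, 10^4))  # 2000  (digits = -1: 0 and 1 both round to 0)
effective_amount(c(1, 2, 10^5))  # 20000 (digits = -1: 1 and 2 both round to 0)

# Passing amount explicitly skips the rounding step entirely.
set.seed(2020)
y <- jitter(c(1, 2, 10^5), amount = 0.2)
```

These amounts agree in magnitude with the outputs above. When jitter is used to break ties, supplying amount explicitly keeps the noise within +/- amount regardless of outliers.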


To the OP: I am cc-ing this to r-devel@r-project.org.
Questions like this are about R itself and should be posted there.


sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
  [1] LC_CTYPE=pt_PT.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=pt_PT.UTF-8        LC_COLLATE=pt_PT.UTF-8
  [5] LC_MONETARY=pt_PT.UTF-8    LC_MESSAGES=pt_PT.UTF-8
  [7] LC_PAPER=pt_PT.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.2


Hope this helps,

Rui Barradas

At 11:32 on 23/09/20, Martin Keller-Ressel wrote:
> Dear all,
> 
> I have noticed some strange behaviour in the "jitter" function in R.
> On the help page for jitter it is stated that
> 
> "The result, say r, is r <- x + runif(n, -a, a) where n <- length(x) and a is the amount argument (if specified)."
> 
> and
> 
> "If amount is NULL (default), we set a <- factor * d/5 where d is the smallest difference between adjacent unique (apart from fuzz) x values."
> 
> This works fine as long as there is no (very) large outlier:
> 
>> jitter(c(1,2,10^4))  # desired behaviour
> [1]    1.083243    1.851571 9999.942716
> 
> But for very large outliers the added noise suddenly 'jumps' to a much larger scale:
> 
>> jitter(c(1,2,10^5)) # bad behaviour
> [1] -19535.649   9578.702 115693.854
> # Noise should be of order (2-1)/5  = 0.2 but is of much larger order.
> 
> This probably does not matter much when jitter is used for plotting, but it can cause problems when jitter is used to break ties.
> 
> best regards,
> Martin
> 
> --------------------------------
> Martin Keller-Ressel
> Professor für Stochastische Analysis und Finanzmathematik
> Technische Universität Dresden
> Institut für Mathematische Stochastik
> Willersbau B 316, Zellescher Weg 12-14
> 01062 Dresden
> --------------------------------


