[R] jitter-bug? problematic behaviour of the jitter function
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Wed Sep 23 21:32:32 CEST 2020
Hello,
R 4.0.2 on Ubuntu 20.04, sessionInfo at end.
This came up on r-help. I'm replying to the OP and also posting to
r-devel, since I believe it is more appropriate there.
I can confirm this. The OP's original examples are the first and the
last below, but the error also shows up with smaller numbers.
set.seed(2020)
jitter(c(1,2,10^4)) # desired behaviour
#[1] 1.058761 1.957690 10000.047401
jitter(c(0,1,10^4)) # bad behaviour
#[1] -92.43546 -1454.61126 8269.53754
jitter(c(-1,0,10^4)) # bad behaviour
#[1] -1484.3895 -427.5283 8010.3308
jitter(c(1,2,10^5)) # bad behaviour
#[1] 4809.238 10578.561 109753.430
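A plausible explanation is sketched below. It assumes that the default branch of jitter() implements "apart from fuzz" by rounding x to 3 - floor(log10(z)) decimal places, where z is the range of x. This is an assumption based on reading the base R source, so compare it with body(jitter) in your own installation. Once z reaches 10^4, the number of digits becomes negative and values are rounded to the nearest 10. At that point 1 and 2 (or -1 and 0) collapse to 0, and the smallest gap d becomes the distance to the outlier. The helper name default_jitter_amount() is hypothetical. It reproduces only the default-amount computation, not the noise itself.

```r
# Hypothetical helper, NOT part of base R: mirrors (assuming our reading of
# the source is right) how jitter() picks its default 'amount' when amount = NULL.
default_jitter_amount <- function(x, factor = 1) {
  z <- diff(r <- range(x, finite = TRUE))  # spread of the finite data
  if (z == 0) z <- abs(r[1L])
  if (z == 0) z <- 1
  # "Apart from fuzz": round to 3 - floor(log10(z)) decimal places.
  # For z >= 10^4 this is negative, i.e. rounding to tens or coarser,
  # so 1 and 2 both become 0 and their gap of 1 disappears.
  digits <- 3 - floor(log10(z))
  xx <- unique(sort(round(x, digits)))
  d <- diff(xx)
  d <- if (length(d)) min(d) else if (xx != 0) xx / 10 else z / 10
  factor / 5 * abs(d)
}

examples <- list(c(1, 2, 10^4), c(0, 1, 10^4), c(-1, 0, 10^4), c(1, 2, 10^5))
sapply(examples, default_jitter_amount)
```

For the four vectors above this gives 0.2, 2000, 2000 and 20000. That matches the scale of the noise in the outputs shown.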
To the OP: I am cc-ing this to r-devel at r-project.org.
Questions like this are about R itself and should be posted there.
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=pt_PT.UTF-8 LC_NUMERIC=C
[3] LC_TIME=pt_PT.UTF-8 LC_COLLATE=pt_PT.UTF-8
[5] LC_MONETARY=pt_PT.UTF-8 LC_MESSAGES=pt_PT.UTF-8
[7] LC_PAPER=pt_PT.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.0.2
Hope this helps,
Rui Barradas
On 23/09/20 at 11:32, Martin Keller-Ressel wrote:
> Dear all,
>
> I have noticed some strange behaviour in the 'jitter' function in R.
> On the help page for jitter it is stated that
>
> "The result, say r, is r <- x + runif(n, -a, a) where n <- length(x) and a is the amount argument (if specified)."
>
> and
>
> "If amount is NULL (default), we set a <- factor * d/5 where d is the smallest difference between adjacent unique (apart from fuzz) x values."
>
> This works fine as long as there is no (very) large outlier:
>
>> jitter(c(1,2,10^4)) # desired behaviour
> [1] 1.083243 1.851571 9999.942716
>
> But for very large outliers the added noise suddenly 'jumps' to a much larger scale:
>
>> jitter(c(1,2,10^5)) # bad behaviour
> [1] -19535.649 9578.702 115693.854
> # Noise should be of order (2-1)/5 = 0.2, but it is several orders of magnitude larger.
>
> This probably does not matter much when jitter is used for plotting, but it can cause problems when jitter is used to break ties.
>
> best regards,
> Martin
>
> --------------------------------
> Martin Keller-Ressel
> Professor of Stochastic Analysis and Financial Mathematics
> Technische Universität Dresden
> Institut für Mathematische Stochastik
> Willersbau B 316, Zellescher Weg 12-14
> 01062 Dresden
> --------------------------------
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
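The quoted message mentions breaking ties. For that use case, one possible workaround is to compute the amount from the documented rule yourself and pass it through the amount argument, which jitter() uses as given when it is positive. The helper name jitter_exact_gap() is hypothetical. It deliberately skips the "apart from fuzz" rounding and detects unique values by exact comparison instead.

```r
# Hypothetical helper, NOT part of base R: jitter x by the documented
# amount a = factor * d / 5, with d the smallest gap between adjacent
# unique values, found by exact comparison (no range-relative rounding).
jitter_exact_gap <- function(x, factor = 1) {
  ux <- sort(unique(x[is.finite(x)]))
  if (length(ux) < 2L) {
    return(jitter(x, factor = factor))  # no gap to measure: use the default
  }
  d <- min(diff(ux))
  # A positive 'amount' is used as given: x + runif(n, -amount, amount)
  jitter(x, amount = factor * d / 5)
}

set.seed(2020)
x <- c(1, 1, 2, 10^5)
y <- jitter_exact_gap(x)
max(abs(y - x))  # at most 0.2, i.e. (2 - 1) / 5
order(y)         # the tie at 1 is broken; 2 and 10^5 stay third and fourth
```

Each value moves by less than d/5, so two distinct values at least d apart cannot swap places, while exact ties are broken. The trade-off of dropping the fuzz is that floating-point near-duplicates (say 0.1 + 0.2 and 0.3) give a tiny d and therefore almost no noise.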