[Rd] Bug report with patch: `stats:::regularize.values()` always creates full copies of `x` and `y`
Evgeni Chasnovski
evgeni@ch@@nov@ki @ending from gm@il@com
Sun Dec 30 12:49:45 CET 2018
This is intended to be a bug report with proposed patch. I am posting to
this mailing list as described in NOTE in "Bug Reporting in R".
Function `stats:::regularize.values()` is meant to preprocess `x` and `y`
arguments to have "proper" values for later use during interpolation. If
input is already "proper", I would expect it to reuse the same objects
without creating new ones. However, this isn't the case and is the source
of unneccessary extra memory usage in `approx()` and others.
The root cause of this seems to be a forceful reordering in lines 37-39 of
'approx.R' file. If reordering is done only if `x` is unsorted then no
copies are created. Also this doesn't seem like breaking any existing code.
There is a patch attached.
Reproducable code:
x <- seq(1, 100, 1)
y <- seq(1, 100, 1)
reg_xy <- stats:::regularize.values(x, y, mean)
# Regularized versions of `x` and `y` are identical to input but are stored
at
# different places
identical(x, reg_xy[["x"]])
#> [1] TRUE
.Internal(inspect(x))
#> @15719b0 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
.Internal(inspect(reg_xy[["x"]]))
#> @2b84130 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
identical(y, reg_xy[["y"]])
#> [1] TRUE
.Internal(inspect(y))
#> @2c91be0 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
.Internal(inspect(reg_xy[["y"]]))
#> @2bb4880 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
# Differs from original only by using `if (is.unsorted(x))`
new_regularize.values <- function (x, y, ties) {
x <- xy.coords(x, y, setLab = FALSE)
y <- x$y
x <- x$x
if (any(na <- is.na(x) | is.na(y))) {
ok <- !na
x <- x[ok]
y <- y[ok]
}
nx <- length(x)
if (!identical(ties, "ordered")) {
if (is.unsorted(x)) {
o <- order(x)
x <- x[o]
y <- y[o]
}
if (length(ux <- unique(x)) < nx) {
if (missing(ties))
warning("collapsing to unique 'x' values")
y <- as.vector(tapply(y, match(x, x), ties))
x <- ux
stopifnot(length(y) == length(x))
}
}
list(x = x, y = y)
}
new_reg_xy <- new_regularize.values(x, y, mean)
# Output is still identical to input and also references to the same objects
identical(x, new_reg_xy[["x"]])
#> [1] TRUE
.Internal(inspect(x))
#> @15719b0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
.Internal(inspect(new_reg_xy[["x"]]))
#> @15719b0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
identical(y, new_reg_xy[["y"]])
#> [1] TRUE
.Internal(inspect(y))
#> @2c91be0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
.Internal(inspect(new_reg_xy[["y"]]))
#> @2c91be0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
# Current R version
R.version
#> _
#> platform x86_64-pc-linux-gnu
#> arch x86_64
#> os linux-gnu
#> system x86_64, linux-gnu
#> status
#> major 3
#> minor 5.2
#> year 2018
#> month 12
#> day 20
#> svn rev 75870
#> language R
#> version.string R version 3.5.2 (2018-12-20)
#> nickname Eggshell Igloo
--
Best regards,
Evgeni Chasnovski
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.diff
Type: text/x-patch
Size: 522 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20181230/63e44ead/attachment.bin>
More information about the R-devel
mailing list