[R-pkg-devel] Additional issue clang-ASAN, gcc-ASAN
Ivan Krylov
ikrylov at disroot.org
Tue Feb 4 12:30:03 CET 2025
On Sun, 2 Feb 2025 22:56:47 +0000,
Bernd.Gruber <Bernd.Gruber at canberra.edu.au> wrote:
> READ of size 16 at 0x518000697ff0 thread T0
>     #0 0x7f2e873ccfdf in bytesToDouble /tmp/RtmpNNPUz9/R.INSTALL3cef1f2b1bd39c/adegenet/src/snpbin.c:225:19
>     #1 0x7f2e873ceca5 in snpbin2freq /tmp/RtmpNNPUz9/R.INSTALL3cef1f2b1bd39c/adegenet/src/snpbin.c:332:5
>     #2 0x7f2e873ceca5 in snpbin_dotprod_freq /tmp/RtmpNNPUz9/R.INSTALL3cef1f2b1bd39c/adegenet/src/snpbin.c:447:5
>     #3 0x7f2e873bba42 in GLdotProd /tmp/RtmpNNPUz9/R.INSTALL3cef1f2b1bd39c/adegenet/src/GLfunctions.c:42:14
Ben Bolker is exactly right; the problem happens in the 'adegenet'
code. Why?
bytesToDouble() is asked to unpack the bytes of the 'vecbytes' array
(26 bytes) into individual bits, stored as doubles in the 'out' array.
The latter was allocated by snpbin_dotprod_freq() to hold 199
elements [1]. Every byte unpacks into 8 values, so 26 bytes produce
26*8 = 208 doubles, more than the 199 that fit. Where do these two
numbers come from?
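To make the arithmetic concrete, here is a minimal C sketch of unpacking packed bytes into one double per bit. The helpers unpack_all_bits() and overflow_elements() are hypothetical, not adegenet's code, and the least-significant-bit-first order is an assumption (it matches R's rawToBits()). The only point is that an unbounded loop writes nbytes * 8 values regardless of how large the destination is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: unpack every bit of `nbytes` packed bytes into
 * `out`, one double (0.0 or 1.0) per bit, least significant bit first
 * (an assumed order). It always writes nbytes * 8 values, whatever the
 * real size of `out` is: this is the pattern that overflows. */
size_t unpack_all_bits(const unsigned char *bytes, size_t nbytes, double *out)
{
    assert(bytes != NULL && out != NULL);
    size_t k = 0;
    for (size_t i = 0; i < nbytes; i++) {
        for (int bit = 0; bit < 8; bit++) {
            out[k++] = (double)((bytes[i] >> bit) & 1u);
        }
    }
    return k; /* number of doubles written: always nbytes * 8 */
}

/* Hypothetical helper: how many doubles would land past the end of a
 * buffer of `capacity` elements if `nbytes` were fully unpacked into it. */
size_t overflow_elements(size_t nbytes, size_t capacity)
{
    size_t needed = nbytes * 8;
    return needed > capacity ? needed - capacity : 0;
}
```

With the sizes from the trace, overflow_elements(26, 199) is 9 doubles (72 bytes): 8 caused by the extra byte and 1 more because even 25 bytes unpack into 200 values.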
The C function GLsumFreq() stores them unchanged from its arguments
[2], and the R code [3] fills those arguments in from nLoc(x) and
length(x$gen[[1]]@snp[[1]]). Where do those values originate?
The R traceback at the point of the crash is dartR.base::gl.pcoa ->
adegenet::glPca -> adegenet::glDotProd. The object 'possums.gl' of S4
class 'dartR' exported by 'dartR.base' appears valid: its .$n.loc is
exactly equal to length(.$gen[[1]]@snp[[1]]) * 8, so the allocation size
matches the packed binary content.
The subset possums.gl[1:50,] that is used to perform the PCA, on the
other hand, is invalid: length(possums.gl[1:50,]$gen[[1]]@snp[[1]]) is
26 instead of 25, and this extra byte later makes bytesToDouble() try
to write 8 more doubles (64 bytes) into the buffer.
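A consistency check that would flag the broken subset could look like the following sketch, assuming that a valid SNPbin stores exactly ceiling(n.loc / 8) bytes. packed_bytes_needed() and snpbin_sizes_consistent() are hypothetical helpers, not part of adegenet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed to pack `nloc` bits,
 * i.e. ceiling(nloc / 8). */
size_t packed_bytes_needed(size_t nloc)
{
    assert(nloc <= SIZE_MAX - 7); /* avoid overflow in nloc + 7 */
    return (nloc + 7) / 8;
}

/* Hypothetical validity check: the packed vector should be exactly as
 * long as nloc requires, with no extra trailing byte. */
bool snpbin_sizes_consistent(size_t nloc, size_t nbytes)
{
    return nbytes == packed_bytes_needed(nloc);
}
```

For possums.gl itself, 200 loci in 25 bytes pass the check, while a 26-byte vector fails it for both 199 and 200 loci.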
This happens because extracting all SNPs from an SNPbin object
introduces an extra byte:
possums.gl@gen[[1]] |> _@snp |> lengths()
# [1] 25 25
possums.gl@gen[[1]][rep(TRUE, nLoc(possums.gl@gen[[1]]))] |>
  _@snp |> lengths()
# [1] 26 26
This can be traced to a bug in adegenet:::.subsetbin:
.subsetbin(as.raw(0xff), 1:8)
# [1] ff 00 # <-- should be just 'ff'
xint <- as.integer(rawToBits(x)[i]) # may not be divisible by 8
# so introduce padding: the following line gives 8 bits of padding
# instead of 0 when length(xint) is divisible by 8
zeroes <- 8 - (length(xint)%%8)
# instead use something like:
# zeroes <- (8 - (length(xint)%%8)) * (length(xint)%%8 > 0)
# (could probably be golfed further)
return(packBits(c(xint, rep(0L, zeroes))))
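The padding arithmetic can be checked in isolation. The C sketch below ports only the two formulas: padding_buggy() mirrors the quoted `zeroes <- 8 - (length(xint)%%8)`, and padding_fixed() is equivalent to the suggested `(8 - (length(xint)%%8)) * (length(xint)%%8 > 0)`. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors `8 - (length(xint) %% 8)`: returns 8, i.e. a whole extra
 * byte of zeroes, when nbits is already a multiple of 8. */
size_t padding_buggy(size_t nbits)
{
    return 8 - (nbits % 8);
}

/* Corrected padding: 0 when nbits is a multiple of 8, otherwise the
 * number of zero bits needed to reach the next multiple of 8. */
size_t padding_fixed(size_t nbits)
{
    return (8 - (nbits % 8)) % 8;
}

/* Number of bytes produced by packing nbits bits plus the padding. */
size_t packed_length(size_t nbits, size_t padding)
{
    size_t total = nbits + padding;
    assert(total % 8 == 0); /* packBits() needs whole bytes */
    return total / 8;
}
```

packed_length(8, padding_buggy(8)) is 2, reproducing the `ff 00` result above, whereas padding_fixed(8) gives a single byte; for 200 bits the buggy formula yields 26 bytes instead of 25. In R itself, `(-length(xint)) %% 8` should be one shorter equivalent, because R's `%%` takes the sign of the divisor.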
But we're getting two bugs for the price of one: even with a correct
25-byte buffer, nLoc(.) == 199 would still result in an 8-byte
overflow, because 25 bytes unpack into 25*8 = 200 doubles while 'out'
only holds 199. This one is solely on the bytesToDouble() C function:
it ought to stop after writing *reslength elements into the 'vecres'
array.
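A bounded unpacking loop might look like this minimal sketch. The parameter names vecbytes, reslength and vecres follow the names used above, while vecsize (the byte count), the function name and the exact signature are assumptions, not adegenet's actual code; bit order is again assumed to be least significant first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounded variant of bytesToDouble(). Both loops stop
 * once *reslength values have been written, so padding bits in the
 * last byte, or any surplus bytes, never reach vecres.
 * Returns the number of doubles actually written. */
int bytes_to_double_bounded(const unsigned char *vecbytes, const int *vecsize,
                            const int *reslength, double *vecres)
{
    assert(vecbytes && vecsize && reslength && vecres);
    int k = 0;
    for (int i = 0; i < *vecsize && k < *reslength; i++) {
        for (int bit = 0; bit < 8 && k < *reslength; bit++) {
            vecres[k++] = (double)((vecbytes[i] >> bit) & 1u);
        }
    }
    return k;
}
```

Called with 26 bytes and *reslength == 199, it writes exactly 199 doubles, so even the malformed 26-byte object could no longer write past the end of 'out'; the surplus byte would simply be ignored.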
I'm afraid there is no easy way to work around either of the bugs in
the dartR.base code.
--
Best regards,
Ivan
[1] https://github.com/thibautjombart/adegenet/blob/c7287597155ab18989d892a72eff33cf8c288958/src/snpbin.c#L443-L444
[2] https://github.com/thibautjombart/adegenet/blob/c7287597155ab18989d892a72eff33cf8c288958/src/GLfunctions.c#L124
[3] https://github.com/thibautjombart/adegenet/blob/c7287597155ab18989d892a72eff33cf8c288958/R/glFunctions.R#L215-L216