[Rd] Difficult debug

Bill Dunlap w||||@mwdun|@p @end|ng |rom gm@||@com
Thu Feb 8 00:46:47 CET 2024


I haven't done any R memory debugging lately, but

https://www.mail-archive.com/rcpp-devel@lists.r-forge.r-project.org/msg10289.html
shows how I used to have gdb break where valgrind finds a problem so you
could examine the details.

Also, running your code after running gctorture(TRUE) can help track down
memory problems.

-Bill


On Wed, Feb 7, 2024 at 12:03 PM Therneau, Terry M., Ph.D. via R-devel <
r-devel using r-project.org> wrote:

>   I've hit a roadblock debugging a new update to the survival package.   I
> do debugging in
> a developement envinment, i.e. I don't create and load a package but
> rather  source all
> the .R files and dyn.load an .so file, which makes things a bit easier.
>
>    Running with R -d "valgrind --tool=memcheck --leak-check=full" one of
> my test files
> crashes in simple R code a dozen lines after the  likely culprit has been
> called, i.e, the
> survfit function for an Aalen-Johansen, containing a .Call to the new C
> code.     The
> valgrind approach had already allowed me to find a few other (mostly dumb)
> errors that led
> to an out of bounds access, e.g., the wrong endpoint variable in a for( )
> loop.    What
> would others advise as a next step?
>
> Here is the last part of the screen
>  > fit2 <- coxph(list(Surv(tstart, tstop, bstat) ~ 1,
> +                    c(1:4):5 ~ age / common + shared), id= id,
> istate=bili4,
> +               data=pbc2, ties='breslow', x=TRUE)
>  > surv2 <- survfit(fit2, newdata=list(age=50), p0=c(.4, .3, .2, .1, 0))
>  > test2 <- mysurv(fit2, pbc2$bili4, p0= 4:0/10, fit2, x0 =50)
> ==31730== Invalid read of size 8
> ==31730==    at 0x298A07: Rf_allocVector3 (memory.c:2861)
> ==31730==    by 0x299B2C: Rf_allocVector (Rinlinedfuns.h:595)
> ==31730==    by 0x299B2C: R_alloc (memory.c:2330)
> ==31730==    by 0x3243C6: do_which (summary.c:1152)
> ==31730==    by 0x23D8EF: bcEval (eval.c:7724)
> ==31730==    by 0x25731F: Rf_eval (eval.c:1152)
> ==31730==    by 0x25927D: R_execClosure (eval.c:2362)
> ==31730==    by 0x25A35A: R_execMethod (eval.c:2535)
> ==31730==    by 0x887E93F: R_dispatchGeneric (methods_list_dispatch.c:1151)
> ==31730==    by 0x2A0E72: do_standardGeneric (objects.c:1344)
> ==31730==    by 0x2577E7: Rf_eval (eval.c:1254)
> ==31730==    by 0x25927D: R_execClosure (eval.c:2362)
> ==31730==    by 0x25A01C: applyClosure_core (eval.c:2250)
> ==31730==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
> ==31730==
>
>   *** caught segfault ***
> address 0x10, cause 'memory not mapped'
>
> Traceback:
>   1: which(smap == j)
>   2: which(smap == j)
>   3: mysurv(fit2, pbc2$bili4, p0 = 4:0/10, fit2, x0 = 50)
>
> The offending call is amost certainly the one to survfit; mysurv() is a
> local function
> that caculates some things 'by hand'.   It does nothing complex: counts,
> loops, etc, the
> only non-base action is a call to Matrix::exp near the end, but the
> which() failure is
> well before that.
>
> The session info just before the offending material:
>
>  > sessionInfo()
> R Under development (unstable) (2024-02-07 r85873)
> Platform: x86_64-pc-linux-gnu
> Running under: Ubuntu 22.04.3 LTS
>
> Matrix products: default
> BLAS:   /usr/local/src/R-devel/lib/libRblas.so
> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> time zone: America/Chicago
> tzcode source: system (glibc)
>
> attached base packages:
> [1] splines   stats     graphics  grDevices utils     datasets methods
> [8] base
>
> other attached packages:
> [1] Matrix_1.6-0
>
> loaded via a namespace (and not attached):
> [1] compiler_4.4.0 tools_4.4.0    grid_4.4.0     lattice_0.22-5
>
>
> ---
> Footnote.  The impetus for this is realizing that the robust variance for
> an
> Aalen-Johansen was incorrect when there are case weights for a subject
> that vary over
> time;  a rare case but will occur with time dependent IPC weights.
> Carefully figuring
> this out has been all I did for the last week, leading to a new routine
> survfitaj.c and
> approx 14 pages of derivation and explanation in the methods.Rnw
> vignette.   Subjects who
> "change horses in midstream", i.e., swap from one curve to another
> mid-followup make the
> code more complex.   This arises out of the "extended Kaplan-Meier"; I am
> not a fan of
> this statistically, but some will use it and expect my code to work.
>
> --
> Terry M Therneau, PhD
> Department of Quantitative Health Sciences
> Mayo Clinic
> therneau using mayo.edu
>
> "TERR-ree THUR-noh"
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

	[[alternative HTML version deleted]]



More information about the R-devel mailing list