[R-pkg-devel] Debian and Fedora clang segmentation faults

Ivan Krylov |kry|ov @end|ng |rom d|@root@org
Mon May 27 22:37:18 CEST 2024


On Mon, 27 May 2024 13:29:56 -0500
Stephen Meyers <srmeyers2 using wisc.edu> wrote:

> I'm updating the 'astrochron' R package, and I'm trying to resolve a
> new segmentation fault that arises only with the Debian and Fedora
> clang compilers. An example is the function 'asm', which has been a
> component of astrochron since its debut July 2014:
> 
> https://cran.r-project.org/web/checks/check_results_astrochron.html

This one is reproducible using containers or a virtual machine. Indeed,
the code crashes at the very beginning of the asm18_R subroutine:

> asm(freq=freq,target=target,fper=fper,rayleigh=rayleigh,nyquist=nyquist,sedmin=0.5,sedmax=3,
+     numsed=100,linLog=1,iter=100000,output=FALSE)

----- PERFORMING AVERAGE SPECTRAL MISFIT ANALYSIS -----

Program received signal SIGSEGV, Segmentation fault.
0x00007ff407f36774 in asm18_r_ ()
(gdb) disas
Dump of assembler code for function asm18_r_:
   0x00007ff407f36760 <+0>:     push   %rbp
   0x00007ff407f36761 <+1>:     mov    %rsp,%rbp
   0x00007ff407f36764 <+4>:     push   %r15
   0x00007ff407f36766 <+6>:     push   %r14
   0x00007ff407f36768 <+8>:     push   %r13
   0x00007ff407f3676a <+10>:    push   %r12
   0x00007ff407f3676c <+12>:    push   %rbx
   0x00007ff407f3676d <+13>:    sub    $0x17e42a78,%rsp
=> 0x00007ff407f36774 <+20>:    mov    %r9,-0x17e42838(%rbp)
   0x00007ff407f3677b <+27>:    mov    %r8,-0x17e42830(%rbp)
   0x00007ff407f36782 <+34>:    mov    %rcx,-0x17e42828(%rbp)
   0x00007ff407f36789 <+41>:    mov    %rdx,-0x17e42820(%rbp)
   0x00007ff407f36790 <+48>:    mov    %rsi,-0x17e42818(%rbp)
   0x00007ff407f36797 <+55>:    mov    %rdi,-0x17e42810(%rbp)

flang-new-18 decided to subtract 400 megabytes from the stack pointer
right from the start, and never mind the fact that operating systems
treat the stack space like hundred-year-old brandy and the total stack
size limit is only 8 megabytes or so.
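
As a quick sanity check, the immediate operand of the `sub` instruction
can be decoded and compared with the stack limit. This is only a sketch
in Python: the 8 MiB figure is an assumption (the usual `ulimit -s`
default on Linux), not something measured on the CRAN check machines.

```python
# Immediate operand of `sub $0x17e42a78,%rsp` from the disassembly above.
frame_bytes = 0x17e42a78

# Assumed stack limit: 8 MiB, the common Linux default. The real limit
# on any given machine may differ.
stack_limit_bytes = 8 * 1024 * 1024

print(frame_bytes)                       # 400829048
print(round(frame_bytes / 1e6, 1))       # 400.8
print(frame_bytes // stack_limit_bytes)  # 47
```

Under that assumption the requested frame is about 47 times larger than
the whole stack, so the very first store below the new stack pointer
lands in unmapped memory.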

I think that the 400.8 megabytes come from the saveAsm(mxsr,mxdata)
array, which is mxsr=500 * mxdata=100000 * 8 bytes per real(8) = 400
megabytes in size, and store(mxdata), which takes an additional 800
kilobytes. When compiling with warnings enabled, GFortran even produces
a message about it:

>> Warning: Array ‘saveasm’ at (1) is larger than limit set by
>> ‘-fmax-stack-var-size=’, moved from stack to static storage. This
>> makes the procedure unsafe when called recursively, or concurrently
>> from multiple threads. Consider increasing the
>> ‘-fmax-stack-var-size=’ limit (or use ‘-frecursive’, which implies
>> unlimited ‘-fmax-stack-var-size’) - or change the code to use an
>> ALLOCATABLE array. If the variable is never accessed concurrently,
>> this warning can be ignored, and the variable could also be declared
>> with the SAVE attribute. [-Wsurprising]
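
The size estimate for the two arrays can be checked against the frame
size from the disassembly with a little arithmetic. The sketch below is
Python, uses only the dimensions quoted in this message, and assumes
that real(8) occupies 8 bytes:

```python
# Dimensions quoted in the message; real(8) is assumed to take 8 bytes.
mxsr, mxdata, real8_bytes = 500, 100000, 8

saveasm_bytes = mxsr * mxdata * real8_bytes  # saveAsm(mxsr,mxdata)
store_bytes = mxdata * real8_bytes           # store(mxdata)
arrays_bytes = saveasm_bytes + store_bytes

# Frame size taken from `sub $0x17e42a78,%rsp` in the disassembly.
frame_bytes = 0x17e42a78
other_bytes = frame_bytes - arrays_bytes     # rest of the stack frame

print(saveasm_bytes)  # 400000000
print(store_bytes)    # 800000
print(arrays_bytes)   # 400800000
print(other_bytes)    # 29048
```

The two arrays account for all but about 29 kilobytes of the frame,
which is consistent with them being the cause of the oversized stack
allocation.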

The Fortran standard is silent about the stack vs heap vs static
storage issue, so flang-new is technically allowed to try to fit 400
megabytes of temporary storage on the stack [*].

Since asm18_R doesn't appear to need to be reentrant, the fix is to
give the SAVE attribute to the two large variables, which should make
Fortran processors place them somewhere other than the stack:

implicit real(8) (a-h,o-z)
save saveAsm, store

(Untested because I accidentally deleted the container while preparing
the message.)

-- 
Best regards,
Ivan

[*] https://stat.ethz.ch/pipermail/r-package-devel/2023q4/010237.html


