[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

maechler at stat.math.ethz.ch maechler at stat.math.ethz.ch
Thu Apr 23 12:40:22 CEST 2009


>>>>> "vQ" == Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no>
>>>>>     on Thu, 23 Apr 2009 11:49:54 +0200 writes:

    vQ> maechler at stat.math.ethz.ch wrote:
    >> 
    vQ> sprintf has a documented limit on strings included in the output using the
    vQ> format '%s'.  It appears that there is a limit on the length of strings included
    vQ> with, e.g., the format '%d' beyond which surprising things happen (output
    vQ> modified for conciseness):
    >> 

    vQ> ... and this limit is *not* documented.

well, it is basically (+ a few bytes ?)
the same  8192  limit that *is* documented.

    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1] 9000 9801
    >> 
    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1]  9000  9801 10602
    >> 
    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1]  9000  9801 10602 11403
    >> 
    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1]  9000  9801 10602 11403 12204
    >> 
    vQ> ...
    >> 
    vQ> Note that not only more than one '1' is included in the output, but also that
    vQ> the same functional expression (no side effects used beyond the interface) gives
    vQ> different results on each execution.  Analogous behaviour can be observed with
    vQ> '%nd' where n > 8200.
    >> 
    vQ> The actual output above is consistent across separate sessions.
    >> 
    vQ> With sufficiently large field width values, R segfaults:
    >> 
    vQ> sprintf('%*d', 10^5, 1)
    vQ> # *** caught segfault ***
    vQ> # address 0xbfcfc000, cause 'memory not mapped'
    vQ> # Segmentation fault
    >> 
    >> 
    >> Thank you, Wacek.
    >> That's all ``interesting''  ... unfortunately, 
    >> 
    >> my version of  'man 3 sprintf' contains
    >> 
    >> 
    >>>> BUGS
    >>>> Because sprintf() and vsprintf() assume an arbitrarily
    >>>> long string, callers must be careful not to overflow the
    >>>> actual space; this is often impossible to assure. Note
    >>>> that the length of the strings produced is
    >>>> locale-dependent and difficult to predict.  Use
    >>>> snprintf() and vsnprintf() instead (or asprintf() and vasprintf).

    vQ> yes, but this is c documentation, not r documentation.

Of course! ...  and I  *do*  apply it to R's C code [sprintf.c]
and hence am even concurring with you .. 

    vQ> it's applicable
    vQ> to a degree, since ?sprintf does say that sprintf is "a wrapper for the
    vQ> C function 'sprintf'".  however, in c you use a buffer and you usually
    vQ> have control over it's capacity, while in r this is a hidden
    vQ> implementational detail, which should not be visible to the user, or
    vQ> should cause an attempt to overflow the buffer to fail more gracefully
    vQ> than with a segfault.

    vQ> in r, sprintf('%9000d', 1) will produce a confused output with a count
    vQ> of 1's variable (!) across runs (while sprintf('%*d', 9000, 1) seems to
    vQ> do fine):

    vQ> gregexpr('1', sprintf('%*d', 9000, 1))
    vQ> # [1] 9000

    vQ> gregexpr('1', sprintf('%9000d', 1))
    vQ> # [1] 9000 9801 ..., variable across executions

    vQ> on one execution in a series i actually got this:

    vQ> Warning message:
    vQ> In gregexpr("1", sprintf("%9000d", 1)) :
    vQ> input string 1 is invalid in this locale

    vQ> while the very next execution, still in the same session, gave

    vQ> # [1]  9000  9801 10602

    vQ> with sprintf('%*d', 10000, 1) i got segfaults on some executions but
    vQ> correct output on others, while sprintf('%10000d', 1) is confused again.



    >> (note the "impossible" part above)       
    >> 

    vQ> yes, but it does also say "must be careful", and it seems that someone
    vQ> has not been careful enough.

    >> and we haven't used  snprintf() yet, probably because it
    >> requires the  C99 C standard, and AFAIK, we have only relatively
    >> recently started to more or less rely on C99 in the R sources.
    >> 

    vQ> while snprintf would help avoid buffer overflow, it may not be a
    vQ> solution to the issue of confused output.

I think it would / will.  We would be able to give warnings and
errors, by checking the  snprintf()  return codes.


    >> More precisely, I see that some windows-only code relies on
    >> snprintf() being available  whereas in at least on non-Windows
    >> section, I read   /* we cannot assume snprintf here */
    >> 
    >> Now such platform dependency issues and corresponding configure
    >> settings I do typically leave to other R-corers with a much
    >> wider overview about platforms and their compilers and C libraries.
    >> 

    vQ> it looks like src/main/sprintf.c is just buggy, and it's plausible that
    vQ> the bug could be repaired in a platform-independent manner.

definitely.
In the mean time, I've actually found that what I first said on
the usability of snprintf() in R's code base was only partly correct.
There are other parts of R code where we use  snprintf() for all
platforms, hence we rely on its presence (and correct
implementation!) and so we can and I think should use it in
place of sprintf() in quite a few places inside R's sprintf.c


    >> BTW,  
    >> 1) sprintf("%n %g", 1,1)   also seg.faults
    >> 

    vQ> as do

    vQ> sprintf('%n%g', 1, 1)
    vQ> sprintf('%n%')

    vQ> etc., while

    vQ> sprintf('%q%g', 1, 1)
    vQ> sprintf('%q%')
  
    vQ> work just fine.  strange, because per ?sprintf 'n' is not recognized as
    vQ> a format specifier, so the output from the first two above should be as
    vQ> from the last two above, respectively.  (and likewise in the %S case,
    vQ> discussed and bug-reported earlier.)

I have now fixed these bugs at least;
the more subtle  "%<too_large_n>d" ones are different, and
as I said, I'm convinced that a nice & clean fix for those will
start using snprintf().

    >> 2) Did you have a true use case where  the  8192  limit was an
    >> undesirable limit?

    vQ> how does it matter?  

well, we could increase it, if it did matter.
      {{ you *could* have been more polite here, no?
      	 it *was* after all a serious question that I asked! }}

    vQ>  if you set a limit, be sure to consistently enforce
    vQ> it and warn the user on attempts to exceed it.  or write clearly in the
    vQ> docs that such attempts will cause the output to be silently truncated. 

Sure, I'm not at all disagreeing on that, and if you read this into my
posting, you misunderstand.

Martin

    vQ> examples such as

    vQ> sprintf('%9000d', 1)

    vQ> do not contribute to the reliability of r, and neither to the user's
    vQ> confidence in it.

    vQ> vQ



More information about the R-devel mailing list