[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)

Thu Apr 23 15:05:21 CEST 2009

Martin Maechler wrote:
>>>>>> "vQ" == Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no>
>>>>>>     on Thu, 23 Apr 2009 11:49:54 +0200 writes:
>>>>>>             
>
>     vQ> maechler at stat.math.ethz.ch wrote:
>     >> 
>     vQ> sprintf has a documented limit on strings included in the output using the
>     vQ> format '%s'.  It appears that there is a limit on the length of strings included
>     vQ> with, e.g., the format '%d' beyond which surprising things happen (output
>     vQ> modified for conciseness):
>     >> 
>
>     vQ> ... and this limit is *not* documented.
>
> well, it is basically (+ a few bytes ?)
> the same  8192  limit that *is* documented.
>   

martin, ?sprintf says:

" There is a limit of 8192 bytes on elements of 'fmt' and also on
     strings included by a '%s' conversion specification."

for me, it's clear that the *elements of fmt* cannot be longer than 8192
bytes, and that each single bit included in the output in place of a %s
cannot be longer than 8192.  nowhere does it say that strings included
in the output in place of a %d, for example, cannot be longer than
8192.  the fact that %s is particularized makes me infer that there is
something specific to %s that does not apply to %d, for example,
otherwise the help would have been formulated differently.  (though
given how r help pages are written, nothing seems unlikely.)

and in fact, the limit does not seem to apply in an obvious way in cases
such as sprintf('%*d', 10000, 1), where the output is correct.  at the
very least, the documentation leaves the user ignorant as to what will
happen if the limit is exceeded.

>     >> my version of  'man 3 sprintf' contains
>     >> 
>     >> 
>     >>>> BUGS
>     >>>> Because sprintf() and vsprintf() assume an arbitrarily
>     >>>> long string, callers must be careful not to overflow the
>     >>>> actual space; this is often impossible to assure. Note
>     >>>> that the length of the strings produced is
>     >>>> locale-dependent and difficult to predict.  Use
>     >>>> snprintf() and vsnprintf() instead (or asprintf() and vasprintf).
>
>     vQ> yes, but this is c documentation, not r documentation.
>
> Of course! ...  and I  *do*  apply it to R's C code [sprintf.c]
> and hence am even concurring with you .. 
>
>
>     vQ> while snprintf would help avoid buffer overflow, it may not be a
>     vQ> solution to the issue of confused output.
>
> I think it would / will.  We would be able to give warnings and
> errors, by checking the  snprintf()  return codes.
>   

maybe, i can't judge without carefully examining the code for sprintf.c
(which i am rather unwilling to do, having had a look at a sample).

>
>     >> More precisely, I see that some windows-only code relies on
>     >> snprintf() being available  whereas in at least on non-Windows
>     >> section, I read   /* we cannot assume snprintf here */
>     >> 
>     >> Now such platform dependency issues and corresponding configure
>     >> settings I do typically leave to other R-corers with a much
>     >> wider overview about platforms and their compilers and C libraries.
>     >> 
>
>     vQ> it looks like src/main/sprintf.c is just buggy, and it's plausible that
>     vQ> the bug could be repaired in a platform-independent manner.
>
> definitely.
> In the mean time, I've actually found that what I first said on
> the usability of snprintf() in R's code base was only partly correct.
> There are other parts of R code where we use  snprintf() for all
> platforms, hence we rely on its presence (and correct
> implementation!) and so we can and I think should use it in
> place of sprintf() in quite a few places inside R's sprintf.c
>
>   

would be interesting to see how this improves sprintf.

>     >> BTW,  
>     >> 1) sprintf("%n %g", 1,1)   also seg.faults
>     >> 
>
>     vQ> as do
>
>     vQ> sprintf('%n%g', 1, 1)
>     vQ> sprintf('%n%')
>
>     vQ> etc., while
>
>     vQ> sprintf('%q%g', 1, 1)
>     vQ> sprintf('%q%')
>   
>     vQ> work just fine.  strange, because per ?sprintf 'n' is not recognized as
>     vQ> a format specifier, so the output from the first two above should be as
>     vQ> from the last two above, respectively.  (and likewise in the %S case,
>     vQ> discussed and bug-reported earlier.)
>
> I have now fixed these bugs at least;
>   

great, i'm going to torture the fix soon ;)

> the more subtle  "%<too_large_n>d" ones are different, and
> as I said, I'm convinced that a nice & clean fix for those will
> start using snprintf().
>
>     >> 2) Did you have a true use case where  the  8192  limit was an
>     >> undesirable limit?
>
>     vQ> how does it matter?  
>
> well, we could increase it, if it did matter.
>       {{ you *could* have been more polite here, no?
>   

i don't see how i could be more polite here, i had absolutely no
intention to be impolite and didn't think i were. 

i gave a serious answer by means of a serious question.  increasing an
arbitrary, poorly documented limit of obscure effect is hardly any
solution.  suggesting that a bug is not a bug because some limit is not
likely to be exceeded in practice is not a particularly good idea.

>       	 it *was* after all a serious question that I asked! }}
>
>     vQ>  if you set a limit, be sure to consistently enforce
>     vQ> it and warn the user on attempts to exceed it.  or write clearly in the
>     vQ> docs that such attempts will cause the output to be silently truncated. 
>
> Sure, I'm not at all disagreeing on that, and if you read this into my
> posting, you misunderstand.
>   

no, i didn't read that into your posting, i'm just referring to the
state of the 'art' in r.

cheers,
vQ