[Rd] incorrect output and segfaults from sprintf with %*d (PR#13667)
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Thu Apr 23 15:05:21 CEST 2009
Martin Maechler wrote:
>>>>>> "vQ" == Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no>
>>>>>> on Thu, 23 Apr 2009 11:49:54 +0200 writes:
>>>>>>
>
> vQ> maechler at stat.math.ethz.ch wrote:
> >>
> vQ> sprintf has a documented limit on strings included in the output using the
> vQ> format '%s'. It appears that there is a limit on the length of strings included
> vQ> with, e.g., the format '%d' beyond which surprising things happen (output
> vQ> modified for conciseness):
> >>
>
> vQ> ... and this limit is *not* documented.
>
> well, it is basically (+ a few bytes ?)
> the same 8192 limit that *is* documented.
>
martin, ?sprintf says:
" There is a limit of 8192 bytes on elements of 'fmt' and also on
strings included by a '%s' conversion specification."
for me, it's clear that the *elements of fmt* cannot be longer than 8192
bytes, and that each single bit included in the output in place of a %s
cannot be longer than 8192. nowhere does it say that strings included
in the output in place of a %d, for example, cannot be longer than
8192. the fact that %s is particularized makes me infer that there is
something specific to %s that does not apply to %d, for example,
otherwise the help would have been formulated differently. (though
given how r help pages are written, nothing seems unlikely.)
and in fact, the limit does not seem to apply in an obvious way in cases
such as sprintf('%*d', 10000, 1), where the output is correct. at the
very least, the documentation leaves the user ignorant as to what will
happen if the limit is exceeded.
> >> my version of 'man 3 sprintf' contains
> >>
> >>
> >>>> BUGS
> >>>> Because sprintf() and vsprintf() assume an arbitrarily
> >>>> long string, callers must be careful not to overflow the
> >>>> actual space; this is often impossible to assure. Note
> >>>> that the length of the strings produced is
> >>>> locale-dependent and difficult to predict. Use
> >>>> snprintf() and vsnprintf() instead (or asprintf() and vasprintf).
>
> vQ> yes, but this is c documentation, not r documentation.
>
> Of course! ... and I *do* apply it to R's C code [sprintf.c]
> and hence am even concurring with you ..
>
>
> vQ> while snprintf would help avoid buffer overflow, it may not be a
> vQ> solution to the issue of confused output.
>
> I think it would / will. We would be able to give warnings and
> errors, by checking the snprintf() return codes.
>
maybe, i can't judge without carefully examining the code for sprintf.c
(which i am rather unwilling to do, having had a look at a sample).
>
> >> More precisely, I see that some windows-only code relies on
> >> snprintf() being available whereas in at least on non-Windows
> >> section, I read /* we cannot assume snprintf here */
> >>
> >> Now such platform dependency issues and corresponding configure
> >> settings I do typically leave to other R-corers with a much
> >> wider overview about platforms and their compilers and C libraries.
> >>
>
> vQ> it looks like src/main/sprintf.c is just buggy, and it's plausible that
> vQ> the bug could be repaired in a platform-independent manner.
>
> definitely.
> In the mean time, I've actually found that what I first said on
> the usability of snprintf() in R's code base was only partly correct.
> There are other parts of R code where we use snprintf() for all
> platforms, hence we rely on its presence (and correct
> implementation!) and so we can and I think should use it in
> place of sprintf() in quite a few places inside R's sprintf.c
>
>
would be interesting to see how this improves sprintf.
> >> BTW,
> >> 1) sprintf("%n %g", 1,1) also seg.faults
> >>
>
> vQ> as do
>
> vQ> sprintf('%n%g', 1, 1)
> vQ> sprintf('%n%')
>
> vQ> etc., while
>
> vQ> sprintf('%q%g', 1, 1)
> vQ> sprintf('%q%')
>
> vQ> work just fine. strange, because per ?sprintf 'n' is not recognized as
> vQ> a format specifier, so the output from the first two above should be as
> vQ> from the last two above, respectively. (and likewise in the %S case,
> vQ> discussed and bug-reported earlier.)
>
> I have now fixed these bugs at least;
>
great, i'm going to torture the fix soon ;)
> the more subtle "%<too_large_n>d" ones are different, and
> as I said, I'm convinced that a nice & clean fix for those will
> start using snprintf().
>
> >> 2) Did you have a true use case where the 8192 limit was an
> >> undesirable limit?
>
> vQ> how does it matter?
>
> well, we could increase it, if it did matter.
> {{ you *could* have been more polite here, no?
>
i don't see how i could be more polite here, i had absolutely no
intention to be impolite and didn't think i were.
i gave a serious answer by means of a serious question. increasing an
arbitrary, poorly documented limit of obscure effect is hardly any
solution. suggesting that a bug is not a bug because some limit is not
likely to be exceeded in practice is not a particularly good idea.
> it *was* after all a serious question that I asked! }}
>
> vQ> if you set a limit, be sure to consistently enforce
> vQ> it and warn the user on attempts to exceed it. or write clearly in the
> vQ> docs that such attempts will cause the output to be silently truncated.
>
> Sure, I'm not at all disagreeing on that, and if you read this into my
> posting, you misunderstand.
>
no, i didn't read that into your posting, i'm just referring to the
state of the 'art' in r.
cheers,
vQ
More information about the R-devel
mailing list