[R] Formatting numbers with a limited amount of digits consistently

Gabor Grothendieck ggrothendieck at gmail.com
Tue May 31 19:25:01 CEST 2005


On 5/31/05, Marc Schwartz <MSchwartz at mn.rr.com> wrote:
> On Mon, 2005-05-30 at 23:53 -0400, Gabor Grothendieck wrote:
> > On 5/30/05, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> > > Gabor Grothendieck wrote:
> > > > On 5/30/05, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> > > >
> > > >>Henrik Andersson wrote:
> > > >>
> > > >>>I have tried to get signif, round and format to display numbers like
> > > >>>these consistently in a table, using e.g. signif(x,digits=3)
> > > >>>
> > > >>>17.01
> > > >>>18.15
> > > >>>
> > > >>>I want
> > > >>>
> > > >>>17.0
> > > >>>18.2
> > > >>>
> > > >>>Not
> > > >>>
> > > >>>17
> > > >>>18.2
> > > >>>
> > > >>>
> > > > >>>Why is the last digit stripped off when it is zero?
> > > >>
> > > >>signif() changes the value; you don't want that, you want to affect how
> > > >>a number is displayed.  Use format() or formatC() instead, for example
> > > >>
> > > >> > x <- c(17.01, 18.15)
> > > >> > format(x, digits=3)
> > > >>[1] "17.0" "18.1"
> > > >> > noquote(format(x, digits=3))
> > > >>[1] 17.0 18.1
> > > >>
> > > >
> > > >
> > > > That works in the above context but I don't think it works generally:
> > > >
> > > > R> f <- head(faithful)
> > > > R> f
> > > >   eruptions waiting
> > > > 1     3.600      79
> > > > 2     1.800      54
> > > > 3     3.333      74
> > > > 4     2.283      62
> > > > 5     4.533      85
> > > > 6     2.883      55
> > > >
> > > > R> format(f, digits = 3)
> > > >   eruptions waiting
> > > > 1      3.60      79
> > > > 2      1.80      54
> > > > 3      3.33      74
> > > > 4      2.28      62
> > > > 5      4.53      85
> > > > 6      2.88      55
> > > >
> > > > R> # this works in this case
> > > > R> noquote(prettyNum(round(f,1), nsmall = 1))
> > > >      eruptions waiting
> > > > [1,] 3.6       79.0
> > > > [2,] 1.8       54.0
> > > > [3,] 3.3       74.0
> > > > [4,] 2.3       62.0
> > > > [5,] 4.5       85.0
> > > > [6,] 2.9       55.0
> > > >
> > > > > Even that does not work in the desired way (which presumably
> > > > > means avoiding exponent format) if some of the numbers are
> > > > > large, such as 1e6, because they are displayed in e notation
> > > > > rather than ordinary notation.
> > >
> > > formatC with format="f" seems to work for me, though it assumes you're
> > > specifying decimal places rather than significant digits.  It also wants
> > > a vector of numbers as input, not a dataframe.  So the following gives
> > > pretty flexible control over what a table will look like:
> > >
> > >  > data.frame(eruptions = formatC(f$eruptions, digits=2, format='f'),
> > > +            waiting = formatC(f$waiting, digits=1, format='f'))
> > >    eruptions waiting
> > > 1 1000000.11    79.0
> > > 2       1.80    54.0
> > > 3       3.33    74.0
> > > 4       2.28    62.0
> > > 5       4.53    85.0
> > > 6       2.88    55.0
> > >
> > > >
> > > > > I have struggled with this myself.  I have usually been able
> > > > > to come up with something for specific instances, but I have
> > > > > found it a pain that a simple thing like formatting a table
> > > > > exactly as I want takes so much effort.  Maybe someone else has
> > > > > figured this out.
> > >
> > > I think that formatting tables properly requires some thought, and R is
> > > no good at thinking.  You can easily recognize a badly formatted table,
> > > but it's very hard to write down rules that work in general
> > > circumstances.  It's also a matter of taste, so if I managed to write a
> > > function that matched my taste, you would find you wanted to make changes.
> > >
> > > It's sort of like expecting plot(x, y) to always come up with the best
> > > possible plot of y versus x.  It's just not a reasonable expectation.
> > > It's better to provide tools (like abline() for plots or formatC() for
> > > tables) that allow you to tailor a plot or table to your particular needs.
> > >
> >
> > Thanks.  That seems to be the idiom I was missing.  One thing that would
> > be nice would be if formatC could handle data frames.
> 
> 
> Guys, perhaps I am missing something here, but there seems to be some
> confusion as to how the numbers are stored internally, versus how the
> output is displayed and the meaning of "significant digits", which is
> what I believe Henrik's original query was about.
> 
> By default, R's printed output uses the settings from options("digits")
> and options("scipen") to define output based upon the number of
> significant digits, which is of course not the same as the number of
> decimal places. Hence the variance in the output that Henrik gets and
> why the trailing zero is dropped.
> 
> The use of signif() does not help here because it is still based upon
> the number of significant digits, where the trailing zero still gets
> dropped.
> 
> The approaches above are "inexact" when it comes to creating formatted
> output for a table with a consistent number of decimal places to align
> columns of numbers.
> 
> format() is still problematic here because it too uses the number of
> significant digits, defaulting to options("digits").

Good point.  It would be nice if format had an argument that allowed
one to specify the number of digits after the decimal point.  I think
this would reduce frustration when quickly formatting data frames.
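
In the meantime, here is a minimal sketch of the formatC(..., format = "f")
idiom from earlier in the thread, applied column by column so that it
accepts a data frame.  The name format_fixed and its digits argument are
hypothetical (not part of base R), and this is only one way to do it.

```r
# Hypothetical helper (not in base R): give every numeric column of a data
# frame a fixed number of decimal places using formatC(format = "f").
format_fixed <- function(df, digits = 1) {
  # Recycle `digits` so each column can have its own number of decimals.
  digits <- rep(digits, length.out = ncol(df))
  for (i in seq_along(df)) {
    # Non-numeric columns (factors, character) are left untouched.
    if (is.numeric(df[[i]])) {
      df[[i]] <- formatC(df[[i]], digits = digits[i], format = "f")
    }
  }
  df  # numeric columns are now character vectors, for display only
}

f <- head(faithful)
out <- format_fixed(f, digits = c(2, 0))  # 2 decimals, then 0 decimals
print(out)
```

The result holds character columns, so it is suitable for printing a table
but not for further arithmetic.  Because format = "f" never switches to
exponent notation, large values such as 1e6 are written out in full.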



