[R] Formatting numbers with a limited amount of digits consistently
Marc Schwartz
MSchwartz at mn.rr.com
Tue May 31 16:30:05 CEST 2005
On Mon, 2005-05-30 at 23:53 -0400, Gabor Grothendieck wrote:
> On 5/30/05, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> > Gabor Grothendieck wrote:
> > > On 5/30/05, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> > >
> > >>Henrik Andersson wrote:
> > >>
> > >>>I have tried to get signif, round and format to display numbers like
> > >>>these consistently in a table, using e.g. signif(x,digits=3)
> > >>>
> > >>>17.01
> > >>>18.15
> > >>>
> > >>>I want
> > >>>
> > >>>17.0
> > >>>18.2
> > >>>
> > >>>Not
> > >>>
> > >>>17
> > >>>18.2
> > >>>
> > >>>
> > >>>Why is the last digit stripped off in the case when it is zero!
> > >>
> > >>signif() changes the value; you don't want that, you want to affect how
> > >>a number is displayed. Use format() or formatC() instead, for example
> > >>
> > >> > x <- c(17.01, 18.15)
> > >> > format(x, digits=3)
> > >>[1] "17.0" "18.1"
> > >> > noquote(format(x, digits=3))
> > >>[1] 17.0 18.1
> > >>
> > >
> > >
> > > That works in the above context but I don't think it works generally:
> > >
> > > R> f <- head(faithful)
> > > R> f
> > > eruptions waiting
> > > 1 3.600 79
> > > 2 1.800 54
> > > 3 3.333 74
> > > 4 2.283 62
> > > 5 4.533 85
> > > 6 2.883 55
> > >
> > > R> format(f, digits = 3)
> > > eruptions waiting
> > > 1 3.60 79
> > > 2 1.80 54
> > > 3 3.33 74
> > > 4 2.28 62
> > > 5 4.53 85
> > > 6 2.88 55
> > >
> > > R> # this works in this case
> > > R> noquote(prettyNum(round(f,1), nsmall = 1))
> > > eruptions waiting
> > > [1,] 3.6 79.0
> > > [2,] 1.8 54.0
> > > [3,] 3.3 74.0
> > > [4,] 2.3 62.0
> > > [5,] 4.5 85.0
> > > [6,] 2.9 55.0
> > >
> > > and even that does not work in the desired way (which presumably
> > > is not to use exponent format) if you have some
> > > large enough numbers like 1e6 which it will display using
> > > the e notation rather than using ordinary notation.
> >
> > formatC with format="f" seems to work for me, though it assumes you're
> > specifying decimal places rather than significant digits. It also wants
> > a vector of numbers as input, not a dataframe. So the following gives
> > pretty flexible control over what a table will look like:
> >
> > > data.frame(eruptions = formatC(f$eruptions, digits=2, format='f'),
> > + waiting = formatC(f$waiting, digits=1, format='f'))
> > eruptions waiting
> > 1 1000000.11 79.0
> > 2 1.80 54.0
> > 3 3.33 74.0
> > 4 2.28 62.0
> > 5 4.53 85.0
> > 6 2.88 55.0
> >
> > >
> > > I have struggled with this myself and have generally been able
> > > to come up with something for specific instances but I have generally
> > > found it a pain to do a simple thing like format a table exactly as I want
> > > without undue effort. Maybe someone else has figured this out.
> >
> > I think that formatting tables properly requires some thought, and R is
> > no good at thinking. You can easily recognize a badly formatted table,
> > but it's very hard to write down rules that work in general
> > circumstances. It's also a matter of taste, so if I managed to write a
> > function that matched my taste, you would find you wanted to make changes.
> >
> > It's sort of like expecting plot(x, y) to always come up with the best
> > possible plot of y versus x. It's just not a reasonable expectation.
> > It's better to provide tools (like abline() for plots or formatC() for
> > tables) that allow you to tailor a plot or table to your particular needs.
> >
>
> Thanks. That seems to be the idiom I was missing. One thing that would
> be nice would be if formatC could handle data frames.
Guys, perhaps I am missing something here, but there seems to be some
confusion as to how the numbers are stored internally, versus how the
output is displayed and the meaning of "significant digits", which is
what I believe Henrik's original query was about.
By default, R's printed output uses the settings from options("digits")
and options("scipen") to define output based upon the number of
significant digits, which is of course not the same as the number of
decimal places. Hence the variation in the output that Henrik gets and
why the trailing zero is dropped.
The use of signif() does not help here because it returns a number rounded
to the given number of significant digits; 17.0 is simply the number 17,
so the trailing zero is still dropped when the value is printed.
All of the above are "inexact" when it comes to creating formatted
output for a table with a consistent number of decimal places to align
columns of numbers.
format() is still problematic here because it too uses the number of
significant digits, defaulting to options("digits").
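
A quick comparison may make the distinction concrete. This is only a sketch using Henrik's two values; the variable names are mine and nothing here is specific to any package:

```r
x <- c(17.01, 18.15)

# signif() returns numbers. 17.0 and 17 are the same number, so no
# trailing zero is shown when the value is printed on its own.
s <- signif(x, digits = 3)
print(s[1])

# format() with 'digits' counts significant digits across the vector.
sig <- format(x, digits = 3)

# formatC() and sprintf() with a fixed ("f") format count decimal places.
dec_c <- formatC(x, format = "f", digits = 1)   # "17.0" "18.1"
dec_s <- sprintf("%.1f", x)                     # "17.0" "18.1"

print(sig)
print(dec_c)
print(dec_s)
```

The last two calls always produce one decimal place, whatever the magnitude of the values, which is what a table column needs.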
Using formatC() or sprintf() in conjunction with cat() is usually the
best way to gain control over how numeric output is formatted,
especially in a nicely aligned table. This is what I use in CrossTable(),
where I want decimal-aligned columns for numbers in the tabular
output, along with fixed-width columns for textual output (i.e., labels,
etc.).
Briefly, along the lines of Gabor's example on the output using the
faithful dataset above, one could use something like:
> f <- head(faithful)
> noquote(apply(f, 2, function(x) formatC(x, format = "f", digits = 1)))
  eruptions waiting
1 3.6       79.0
2 1.8       54.0
3 3.3       74.0
4 2.3       62.0
5 4.5       85.0
6 2.9       55.0
which only affects how the data is printed, not the data itself. It can
work fine for a 2D object that has all numeric columns.
Note, however, that the columns are now left-aligned, not right-aligned
as in the default print method, because the result of the above
call is a character matrix rather than a data.frame with numeric
columns. Compare the default output:
> f
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55
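
If a character matrix is all that is needed, the alignment can also be changed at print time. The sketch below assumes the `right` argument of the default print method is acceptable for the purpose; `m` is just an illustrative name:

```r
f <- head(faithful)

# Format each column to one decimal place; the result is a character matrix.
m <- sapply(f, function(x) formatC(x, format = "f", digits = 1))

# right = TRUE asks the print method to right-justify the strings, so the
# decimal points line up within each column.
print(m, quote = FALSE, right = TRUE)
```

This still gives no control over the column widths, which is where sprintf() and cat() come in.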
Thus, for greater control, one should use sprintf() and cat():
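# Header and rows use the same field widths so the columns line up.
out.lines <- sprintf("%15s %15s\n", colnames(f)[1], colnames(f)[2])
for (i in seq_len(nrow(f)))
{
  out.lines <- c(out.lines,
                 sprintf("%15.1f %15.1f\n", f[i, 1], f[i, 2]))
}
# sep = "" stops cat() from inserting a space before every line after the first
> cat(out.lines, sep = "")
      eruptions         waiting
            3.6            79.0
            1.8            54.0
            3.3            74.0
            2.3            62.0
            4.5            85.0
            2.9            55.0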
In the above case, one can specify the column widths for the column
labels and the row values. Of course, the above could be extended to
become a generic function for data frames with multiple data types, with
arguments enabling the specification of column widths, number of decimal
places, etc. One might even want more than one specification for the
number of decimal places depending upon the nature of the columns in the
object to be printed, so vectors could be used for these arguments.
I'll leave that as an exercise.
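
For what it is worth, one possible shape for such a function is sketched below. The name print_fixed and its arguments are hypothetical (nothing like it exists in base R), and non-numeric columns are simply converted with as.character():

```r
# Hypothetical helper, not part of base R or any package: prints a data
# frame with a fixed number of decimal places and a fixed width per column.
print_fixed <- function(df, decimals = 1, width = 12) {
  # Recycle scalar settings so that each column gets its own value.
  decimals <- rep_len(decimals, ncol(df))
  width    <- rep_len(width, ncol(df))

  cols <- lapply(seq_along(df), function(j) {
    x <- df[[j]]
    if (is.numeric(x)) {
      # Fixed notation with decimals[j] places, right-justified.
      formatC(x, format = "f", digits = decimals[j], width = width[j])
    } else {
      # Anything else is converted to text and right-justified as well.
      formatC(as.character(x), width = width[j])
    }
  })

  # Column labels are padded to the same widths as the values below them.
  labels <- mapply(function(nm, w) formatC(nm, width = w), names(df), width)
  header <- paste(labels, collapse = " ")
  body   <- do.call(paste, c(cols, sep = " "))
  lines  <- c(header, body)

  cat(lines, sep = "\n")
  invisible(lines)
}

f <- head(faithful)
print_fixed(f, decimals = c(2, 0), width = c(10, 8))
```

Passing vectors for decimals and width gives each column its own setting; a single value is recycled across all columns.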
Final note to Henrik: none of these functions will give you 18.2 for
18.15:
> round(18.15, 1)
[1] 18.1
> formatC(18.15, format = "f", digits = 1)
[1] "18.1"
> sprintf("%5.1f", 18.15)
[1] " 18.1"
This is a representation effect rather than a rounding rule. 18.15 cannot
be stored exactly as an IEEE 754 double; the nearest representable value
is slightly below 18.15, so it rounds down to 18.1. The "go to the even
digit" rule applies only to exact ties such as 2.5, which round() takes
to 2.
See ?round for more information.
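
To see what round() and sprintf() are actually working with, one can print the stored value of 18.15 with more digits than the default. A minimal sketch, assuming the usual IEEE 754 double arithmetic; the digits printed beyond the 17th may vary by platform:

```r
# Show more digits than the default to reveal the stored value of 18.15.
sprintf("%.20f", 18.15)

# Rounding that stored value to one decimal place.
sprintf("%.1f", 18.15)

# For contrast, 0.5, 2.5 and 3.5 are exactly representable in binary, so
# they are genuine ties; round() resolves such ties to the even digit.
round(0.5)   # 0
round(2.5)   # 2
round(3.5)   # 4
```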
HTH,
Marc Schwartz