[Rd] wishlist -- Fix for major format.pval limitation (PR#9574)

Wed Mar 21 18:23:15 CET 2007

Martin Maechler wrote:
>>>>>> "Gabor" == Gabor Grothendieck <ggrothendieck at gmail.com>
>>>>>>     on Tue, 20 Mar 2007 22:10:27 -0400 writes:
> 
>     Gabor> On 3/20/07, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>     >> On 3/20/2007 1:40 PM, Gabor Grothendieck wrote:
>     >> > On 3/20/07, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>     >> >> On 3/20/2007 12:44 PM, Gabor Grothendieck wrote:
>     >> >> > On 3/20/07, murdoch at stats.uwo.ca <murdoch at stats.uwo.ca> wrote:
>     >> >> >> On 3/20/2007 11:19 AM, charles.dupont at vanderbilt.edu wrote:
>     >> >> >> > Full_Name: Charles Dupont
>     >> >> >> > Version: 2.4.1
>     >> >> >> > OS: linux 2.6.18
>     >> >> >> > Submission from: (NULL) (160.129.129.136)
>     >> >> >> >
>     >> >> >> >
>     >> >> >> > 'format.pval' has a major limitation in its implementation. For example,
>     >> >> >> > suppose a person had a vector like 'a', with an error of ±0.001.
>     >> >> >> >
>     >> >> >> >     > a <- c(0.1, 0.3, 0.4, 0.5, 0.3, 0.0001)
>     >> >> >> >     > format.pval(a, eps=0.01)
>     >> >> >> >
>     >> >> >> > Suppose that person wants the 'format.pval' output to always show two
>     >> >> >> > decimal places (as when passing nsmall=2 to 'format'). That output would
>     >> >> >> > look like this:
>     >> >> >> >
>     >> >> >> >     [1] "0.10"   "0.30"   "0.40"   "0.50"   "0.30"   "<0.01"
>     >> >> >> >
>     >> >> >> > That output is currently impossible because format.pval can only
>     >> >> >> > produce output like this:
>     >> >> >> >
>     >> >> >> >     [1] "0.1"    "0.3"    "0.4"    "0.5"    "0.3"    "<0.01"
>     >> >> >> >
>     >> >> >>
>     >> >> >> But there's a very easy workaround:
>     >> >> >>
>     >> >> >> format.pval(c(0.12, a), eps=0.01)[-1]
>     >> >> >>
>     >> >> >> gives you what you want (because the 0.12 forces two-decimal-place
>     >> >> >> display on all values, and then the [-1] removes it).
>     >> >> >>
>     >> >> >
>     >> >> > Clever, but the problem is that summary.lm, etc. call format.pval, so the
>     >> >> > user does not have a chance to do that.
>     >> >>
>     >> >> I don't see how this is relevant.  summary.lm doesn't let you pass a new
>     >> >> eps value either.  Adding an "nsmall=2" argument to format.pval wouldn't
>     >> >> help with the display in summary.lm.
>     >> >>
>     >> >> I suppose we could track down every use of format.pval in every function
>     >> >> in every package and add nsmall and eps as arguments to each of them,
>     >> >> but that's just ridiculous.  People should accept the fact that R
>     >> >> doesn't produce publication quality text, it just provides you with ways
>     >> >> to produce that yourself.
>     >> >>
>     >> >> Duncan Murdoch
>     >> >>
>     >> >
>     >> > You are right in terms of my example, which was not applicable, but I
>     >> > think that in general format.pval is used from within other routines rather
>     >> > than directly by the user, so the user may not have a chance to massage its
>     >> > output directly.
>     >> 
>     >> Right, but this means that it is more or less useless to change the
>     >> argument list for format.pval in the way Charles suggested, because all
>     >> of the existing uses of it would ignore the new parameters.
>     >> 
>     >> It would not be so difficult to change the behaviour of format.pval so
>     >> that for example "digits=2" implied the equivalent of "nsmall=2", but I
>     >> don't think that's a universally desirable change.
>     >> 
>     >> The difficulty here is that different people have different tastes for
>     >> presentation-quality text.  Not everyone would agree that the version
>     >> with trailing zeros is preferable to the one without.  R should be
>     >> flexible enough to allow people to customize their displays, but not
>     >> necessarily by having every print method flexible enough to satisfy
>     >> every user:  sometimes users need to construct their own output formats.
>     >> 
>     >> Duncan Murdoch
> 
>     Gabor> One possibility would be to add args to format.pval whose defaults
>     Gabor> can be set through options.  Not beautiful but it would give the user
>     Gabor> who really needed it a way to do it.
> 
> Yes indeed, I had had the same thought (very early in this
> thread).  This doesn't mean that I wouldn't agree with Duncan's
> statement above anyway.
> 
> Although I have a strong opinion on *not* allowing options() to
> influence too many things [it's entirely contrary to the
> principle of functional programming],
> options() have always been used to tweak print()ing, so they
> could be used here as well.
> As the original author of format.pval(), I'm happy to accept patches
> --- if they are done well and also patch 
>     src/library/base/man/format.pval.Rd and ..../man/options.Rd 
> 
> Martin
> 
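As an illustration of Martin's point that options() already influence printing: format.pval()'s default is digits = max(1, getOption("digits") - 2), so changing the global "digits" option changes its output. A minimal R sketch, assuming the default options(digits = 7) at the start of the session:

```r
p <- 0.123456
format.pval(p)              # "0.12346": digits = max(1, 7 - 2) = 5
op <- options(digits = 4)   # save the old setting, then lower "digits"
format.pval(p)              # "0.12": digits = max(1, 4 - 2) = 2
options(op)                 # restore the previous setting
```

An option controlling nsmall, as Gabor suggests, would be a new addition; this sketch only uses the existing "digits" option.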

I have included a patch for 'format.pval' that implements what I think
of as the optimal solution to the problem: add a '...' argument to
format.pval and pass it on to the 'format' function calls.

The patch was created against the r-release branch but also applies
cleanly to the r-devel branch.
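A usage sketch, assuming a format.pval() that has the '...' argument (as in this patch; later R releases also accept it): nsmall is forwarded to format() both for the displayed values and for the '<eps' bound, which gives the output requested in PR#9574.

```r
a <- c(0.1, 0.3, 0.4, 0.5, 0.3, 0.0001)

## Without extra arguments: unchanged behaviour
format.pval(a, eps = 0.01)
## "0.1" "0.3" "0.4" "0.5" "0.3" "<0.01"

## nsmall = 2 is passed through '...' to format()
format.pval(a, eps = 0.01, nsmall = 2)
## "0.10" "0.30" "0.40" "0.50" "0.30" "<0.01"
```

Because '...' is empty by default, existing callers that do not pass extra arguments get exactly the same output as before.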

Index: src/library/base/R/format.R
===================================================================

--- src/library/base/R/format.R (revision 40867)
+++ src/library/base/R/format.R (working copy)
@@ -43,7 +43,7 @@
  }

  format.pval <- function(pv, digits = max(1, getOption("digits")-2),
-                       eps = .Machine$double.eps, na.form = "NA")
+                       eps = .Machine$double.eps, na.form = "NA", ...)
  {
      ## Format  P values; auxiliary for print.summary.[g]lm(.)

@@ -55,8 +55,8 @@
         ## be smart -- differ for fixp. and expon. display:
         expo <- floor(log10(ifelse(pv > 0, pv, 1e-50)))
         fixp <- expo >= -3 | (expo == -4 & digits>1)
-       if(any( fixp)) rr[ fixp] <- format(pv[ fixp], dig=digits)
-       if(any(!fixp)) rr[!fixp] <- format(pv[!fixp], dig=digits)
+       if(any( fixp)) rr[ fixp] <- format(pv[ fixp], dig=digits, ...)
+       if(any(!fixp)) rr[!fixp] <- format(pv[!fixp], dig=digits, ...)
         r[!is0]<- rr
      }
      if(any(is0)) {
@@ -67,7 +67,7 @@
                 digits <- max(1, nc - 7)
             sep <- if(digits==1 && nc <= 6) "" else " "
         } else sep <- if(digits==1) "" else " "
-       r[is0] <- paste("<", format(eps, digits=digits), sep = sep)
+       r[is0] <- paste("<", format(eps, digits=digits, ...), sep = sep)
      }
      if(has.na) { ## rarely
         rok <- r
Index: src/library/base/man/format.pval.Rd
===================================================================
--- src/library/base/man/format.pval.Rd (revision 40867)
+++ src/library/base/man/format.pval.Rd (working copy)
@@ -6,13 +6,14 @@
  \alias{format.pval}
  \usage{
  format.pval(pv, digits = max(1, getOption("digits") - 2),
-            eps = .Machine$double.eps, na.form = "NA")
+            eps = .Machine$double.eps, na.form = "NA", \dots)
  }
  \arguments{
    \item{pv}{a numeric vector.}
    \item{digits}{how many significant digits are to be used.}
    \item{eps}{a numerical tolerance: see Details.}
    \item{na.form}{character representation of \code{NA}s.}
+  \item{\dots}{arguments passed to the \code{\link{format}} function.}
  }
  \value{
    A character vector.


-- 
Charles Dupont	Computer System Analyst		School of Medicine
		Department of Biostatistics	Vanderbilt University