[R] printCoefmat() and zap.ind

Fri Jul 7 18:41:07 CEST 2023

>>>>> Martin Maechler 
>>>>>     on Fri, 7 Jul 2023 18:12:24 +0200 writes:

>>>>> Shu Fai Cheung 
>>>>>     on Thu, 6 Jul 2023 17:14:27 +0800 writes:

    >> Hi All,

    >> I would like to ask two questions about printCoefmat().

    > Good... this function, originally named print.coefmat(),
    > is 25 years old (in R) now:

    > --------------------------------------------------------------------
    > r1902 | maechler | 1998-08-14 19:19:05 +0200 (Fri, 14 Aug 1998) |
    > Changed paths:
    > M R-0-62-patches/CHANGES
    > M R-0-62-patches/src/library/base/R/anova.R
    > M R-0-62-patches/src/library/base/R/glm.R
    > M R-0-62-patches/src/library/base/R/lm.R
    > M R-0-62-patches/src/library/base/R/print.R

    > print.coefmat(.) about ok
    > --------------------------------------------------------------------

    > (yes, at the time, the 'stats' package did not exist yet ..)

    > so it may be a good time to look at it.

    >> First, I found a behavior of printCoefmat() that looks strange to me,
    >> but I am not sure whether this is an intended behavior:

    >> ``` r
    >> set.seed(5689417)
    >> n <- 10000
    >> x1 <- rnorm(n)
    >> x2 <- rnorm(n)
    >> y <- .5 * x1 + .6 * x2 + rnorm(n, -0.0002366, .2)
    >> dat <- data.frame(x1, x2, y)
    >> out <- lm(y ~ x1 + x2, dat)
    >> out_summary <- summary(out)
    >> printCoefmat(out_summary$coefficients)
    >> #>               Estimate Std. Error t value Pr(>|t|)
    >> #> (Intercept) 1.7228e-08 1.9908e-03    0.00        1
    >> #> x1          5.0212e-01 1.9715e-03  254.70   <2e-16 ***
    >> #> x2          6.0016e-01 1.9924e-03  301.23   <2e-16 ***
    >> #> ---
    >> #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    >> printCoefmat(out_summary$coefficients,
    >> zap.ind = 1,
    >> digits = 4)
    >> #>             Estimate Std. Error t value Pr(>|t|)
    >> #> (Intercept) 0.000000   0.001991     0.0        1
    >> #> x1          0.502100   0.001971   254.7   <2e-16 ***
    >> #> x2          0.600200   0.001992   301.2   <2e-16 ***
    >> #> ---
    >> #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    >> ```

    >> With zap.ind = 1, the values in "Estimate" were correctly
    >> zapped using digits = 4. However, by default, "Estimate"
    >> and "Std. Error" are formatted together. Because the
    >> standard errors are small, with digits = 4, zeros were added
    >> to values in "Estimate", resulting in "0.502100" and
    >> "0.600200", which are misleading because, if rounded to
    >> the 6th decimal place, the values to be displayed should
    >> be "0.502122" and "0.600162".
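
The padding described above can be reproduced outside printCoefmat() with a small sketch. It assumes, as a simplification, that formatting the two columns together behaves like a single call to format(); it is not the actual code inside printCoefmat(). The numbers are copied from the output shown above, and `est_zapped`, `se`, `alone` and `joint` are illustrative names only.

```r
# Values copied from the printed output above (illustration only).
est_zapped <- round(0.502122, 4)  # the estimate after zapping to 4 digits
se <- 0.001971                    # a standard error from the same table

# Formatted on its own, the zapped estimate needs only 4 decimals.
alone <- format(est_zapped, digits = 4)

# Formatted together with the much smaller standard error, both values get
# 6 decimals so that the standard error keeps 4 significant digits.
# The zapped estimate is then padded with zeros, not with its true digits.
joint <- format(c(est_zapped, se), digits = 4)

print(alone)
print(joint)
```

The trailing zeros in the jointly formatted estimate come from the rounding that happened before formatting, which is why they do not match the unrounded value.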

    >> Is this behavior of printCoefmat() intended/normal?

    > Yes, this is "normal" in the sense that zapsmall() is used.
    > I'm not even sure anymore whether I was fully aware in 1998 of what
    > exactly the simple zapsmall() function does.
    > It does not do what you want here (and what one actually *typically*
    > wants when formatting numbers for display, plotting, etc.).
    > What you "really want" here and in such situations is

    > zapOnlysmall <- function(x, dig) {
    >    x[abs(x) <= 10^-dig] <- 0
    >    x
    > }

    > and I think I'd replace the use of zapsmall() inside
    > printCoefmat() with something like zapOnlysmall() above.

    > This will indeed nicely solve your problem.
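
To make the difference concrete, here is a minimal comparison of the two functions. It uses zapOnlysmall() exactly as defined above, and the estimates as quoted in the thread (rounded to 6 decimals); `est`, `z1` and `z2` are illustrative names only.

```r
# zapOnlysmall() as defined in the message above.
zapOnlysmall <- function(x, dig) {
  x[abs(x) <= 10^-dig] <- 0
  x
}

# Estimates as quoted in the thread (illustration only).
est <- c(1.7228e-08, 0.502122, 0.600162)

# zapsmall() rounds *every* element, with the number of decimals chosen
# relative to the largest absolute value in the vector.
z1 <- zapsmall(est, digits = 4)

# zapOnlysmall() sets the tiny value to zero and leaves the rest untouched.
z2 <- zapOnlysmall(est, 4)

print(z1, digits = 7)
print(z2, digits = 7)
```

With zapsmall() the non-zero estimates lose their 5th and 6th decimals, whereas zapOnlysmall() only replaces the near-zero intercept.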

well..., now that I have tried to change it "globally" in
printCoefmat() and see how many of the lm() summary or anova()
outputs get slightly changed, and sometimes quite unfavourably,

I think that the "hard" replacement of zapsmall() by
zapOnlysmall() {above}  is too drastic, ... even though it helps
in your case.

... back to the "drawing board" ...

Martin

    >> Second, how can I use zap without this behavior?
    >> In cases like the one above, I need to use zap such that
    >> the intercept will not be displayed in scientific notation.
    >> Disabling scientific notation cannot achieve the desired
    >> goal.

    >> I tried adding cs.ind = 1:

    > well, from the help page   ?printCoefmat  

    > cs.ind is really about the [ind]ices of [c]oefficient + [s]cale or [s]td.err
    > So, for lm() you should not have to set cs.ind but rather keep
    > it at its smart default of cs.ind = 1:2 .

    >> ```r
    >> printCoefmat(out_summary$coefficients,
    >> zap.ind = 1,
    >> digits = 4,
    >> cs.ind = 1)
    >> #>             Estimate Std. Error t value Pr(>|t|)
    >> #> (Intercept)   0.0000   0.001991     0.0        1
    >> #> x1            0.5021   0.001971   254.7   <2e-16 ***
    >> #> x2            0.6002   0.001992   301.2   <2e-16 ***
    >> #> ---
    >> #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    >> ```

    >> However, this solution is not ideal because the numbers
    >> of decimal places of "Estimate" and "Std. Error" are
    >> different. How can I get the output like this one?

    >> ```r
    >> #>             Estimate Std. Error t value Pr(>|t|)
    >> #> (Intercept)   0.0000   0.0020     0.0        1
    >> #> x1            0.5021   0.0020   254.7   <2e-16 ***
    >> #> x2            0.6002   0.0020   301.2   <2e-16 ***
    >> ```
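
One possible workaround, not proposed in the thread itself, is to bypass printCoefmat() for the first two columns and format them with a fixed number of decimals using formatC(). The sketch below is only an illustration under that assumption: the matrix `cm` is typed in by hand from the output shown earlier, and `cm` and `fixed` are hypothetical names.

```r
# Estimates and standard errors copied from the printed output above.
cm <- matrix(
  c(1.7228e-08, 5.0212e-01, 6.0016e-01,
    1.9908e-03, 1.9715e-03, 1.9924e-03),
  ncol = 2,
  dimnames = list(c("(Intercept)", "x1", "x2"),
                  c("Estimate", "Std. Error"))
)

# format = "f" forces fixed notation, digits = 4 gives 4 decimals everywhere;
# formatC() keeps the matrix dimensions and dimnames.
fixed <- formatC(cm, format = "f", digits = 4)

# Print the character matrix without quotes, right-aligned.
print(noquote(fixed), right = TRUE)
```

This gives both columns the same number of decimals, at the price of losing the significance stars and p-value formatting that printCoefmat() adds; those columns would have to be formatted separately.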

    >> Thanks for your attention.

    >> Regards,
    >> Shu Fai Cheung

    > Thank you, Shu Fai,
    > for your careful and thoughtful report!

    > Best regards,
    > Martin
