[R] printCoefmat() and zap.ind
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Fri Jul 7 18:41:07 CEST 2023
>>>>> Martin Maechler
>>>>> on Fri, 7 Jul 2023 18:12:24 +0200 writes:
>>>>> Shu Fai Cheung
>>>>> on Thu, 6 Jul 2023 17:14:27 +0800 writes:
>> Hi All,
>> I would like to ask two questions about printCoefmat().
> Good... this function, originally named print.coefmat(),
> is 25 years old (in R) now:
> --------------------------------------------------------------------
> r1902 | maechler | 1998-08-14 19:19:05 +0200 (Fri, 14 Aug 1998) |
> Changed paths:
> M R-0-62-patches/CHANGES
> M R-0-62-patches/src/library/base/R/anova.R
> M R-0-62-patches/src/library/base/R/glm.R
> M R-0-62-patches/src/library/base/R/lm.R
> M R-0-62-patches/src/library/base/R/print.R
> print.coefmat(.) about ok
> --------------------------------------------------------------------
> (yes, at the time, the 'stats' package did not exist yet ..)
> so it may be a good time to look at it.
>> First, I found a behavior of printCoefmat() that looks strange to me,
>> but I am not sure whether this is an intended behavior:
>> ``` r
>> set.seed(5689417)
>> n <- 10000
>> x1 <- rnorm(n)
>> x2 <- rnorm(n)
>> y <- .5 * x1 + .6 * x2 + rnorm(n, -0.0002366, .2)
>> dat <- data.frame(x1, x2, y)
>> out <- lm(y ~ x1 + x2, dat)
>> out_summary <- summary(out)
>> printCoefmat(out_summary$coefficients)
>> #>               Estimate Std. Error t value Pr(>|t|)
>> #> (Intercept) 1.7228e-08 1.9908e-03    0.00        1
>> #> x1          5.0212e-01 1.9715e-03  254.70   <2e-16 ***
>> #> x2          6.0016e-01 1.9924e-03  301.23   <2e-16 ***
>> #> ---
>> #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>> printCoefmat(out_summary$coefficients,
>>              zap.ind = 1,
>>              digits = 4)
>> #>             Estimate Std. Error t value Pr(>|t|)
>> #> (Intercept) 0.000000   0.001991     0.0        1
>> #> x1          0.502100   0.001971   254.7   <2e-16 ***
>> #> x2          0.600200   0.001992   301.2   <2e-16 ***
>> #> ---
>> #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>> ```
>> With zap.ind = 1, the values in "Estimate" were correctly
>> zapped using digits = 4. However, by default, "Estimate"
>> and "Std. Error" are formatted together. Because the
>> standard errors are small, with digits = 4, zeros were added
>> to values in "Estimate", resulting in "0.502100" and
>> "0.600200", which are misleading because, if rounded to
>> the 6th decimal place, the values to be displayed should
>> be "0.502122" and "0.600162".
>> Is this behavior of printCoefmat() intended/normal?
> Yes, this is "normal" in the sense that zapsmall() is used.
> I'm not even sure anymore whether I was fully aware in 1998 of what
> exactly the simple zapsmall() function does.
> It does not do what you want here (and what one actually *typically*
> wants when formatting numbers for display, plotting, etc.).
> What you "really want" here, and in such situations, is
>     zapOnlysmall <- function(x, dig) {
>         x[abs(x) <= 10^-dig] <- 0
>         x
>     }
> and I think I'd replace the use of zapsmall() inside
> printCoefmat() with something like zapOnlysmall() above.
> This will indeed nicely solve your problem.
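A minimal sketch of the difference between the two functions, using the estimates quoted in this thread as illustrative values. It only shows how zapsmall() and the proposed zapOnlysmall() treat a plain numeric vector; it is not a patch to printCoefmat().

```r
# Illustrative values: the estimates quoted in this thread.
est <- c(1.7228e-08, 0.502122, 0.600162)

# zapOnlysmall() as proposed above: only values at or below 10^-dig become 0.
zapOnlysmall <- function(x, dig) {
    x[abs(x) <= 10^-dig] <- 0
    x
}

# zapsmall() rounds *every* element; the number of decimals is derived
# from `digits` and the largest absolute value in x.
print(zapsmall(est, digits = 4), digits = 6)

# zapOnlysmall() zeroes the tiny intercept and leaves the others untouched.
print(zapOnlysmall(est, 4), digits = 6)
```

With these values, zapsmall() returns all three estimates rounded to four decimals (0, 0.5021, 0.6002), whereas zapOnlysmall() keeps 0.502122 and 0.600162 as they are. Formatting the already rounded values together with the small standard errors is what pads them to "0.502100" and "0.600200".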
well..., now that I have tried to change it "globally" in
printCoefmat() and have seen how many of the lm() summary or anova()
outputs get slightly changed, and sometimes quite unfavourably,
I think that the "hard" replacement of zapsmall() by
zapOnlysmall() {above} is too drastic, ... even though it helps
in your case.
... back to the "drawing board" ...
Martin
>> Second, how can I use zap without this behavior?
>> In cases like the one above, I need to use zap such that
>> the intercept will not be displayed in scientific notation.
>> Disabling scientific notation cannot achieve the desired
>> goal.
>> I tried adding cs.ind = 1:
> well, from the help page ?printCoefmat
> cs.ind is really about the [ind]ices of [c]oefficient + [s]cale or [s]td.err
> So, for lm() you should not have to set cs.ind but rather keep
> it at its smart default of cs.ind = 1:2 .
>> ```r
>> printCoefmat(out_summary$coefficients,
>>              zap.ind = 1,
>>              digits = 4,
>>              cs.ind = 1)
>> #>             Estimate Std. Error t value Pr(>|t|)
>> #> (Intercept)   0.0000   0.001991     0.0        1
>> #> x1            0.5021   0.001971   254.7   <2e-16 ***
>> #> x2            0.6002   0.001992   301.2   <2e-16 ***
>> #> ---
>> #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>> ```
>> However, this solution is not ideal because the numbers
>> of decimal places of "Estimate" and "Std. Error" are
>> different. How can I get the output like this one?
>> ```r
>> #>             Estimate Std. Error t value Pr(>|t|)
>> #> (Intercept)   0.0000     0.0020     0.0        1
>> #> x1            0.5021     0.0020   254.7   <2e-16 ***
>> #> x2            0.6002     0.0020   301.2   <2e-16 ***
>> ```
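One possible workaround for the layout asked for above, offered only as a sketch: bypass printCoefmat() for these two columns and format them in fixed-point notation with formatC(). The matrix below is rebuilt by hand from the values printed in this thread (an assumption made to keep the example self-contained); in practice one would start from out_summary$coefficients[, 1:2]. The significance stars and p-value formatting of printCoefmat() are not reproduced.

```r
# Values copied from the printed output in this thread, not recomputed.
cf <- cbind(
    Estimate     = c(1.7228e-08, 0.502122, 0.600162),
    `Std. Error` = c(1.9908e-03, 1.9715e-03, 1.9924e-03)
)
rownames(cf) <- c("(Intercept)", "x1", "x2")

# Fixed-point formatting with exactly 4 decimals in both columns;
# formatC() keeps the dim and dimnames of the matrix.
cf_chr <- formatC(cf, format = "f", digits = 4)

# Print the character matrix without quotes, right-aligned.
print(noquote(cf_chr), right = TRUE)
```

Both columns then show the same number of decimal places, and the intercept appears as 0.0000 rather than in scientific notation.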
>> Thanks for your attention.
>> Regards,
>> Shu Fai Cheung
> Thank you, Shu Fai,
> for your careful and thoughtful report!
> Best regards,
> Martin