[Rd] as.data.frame() methods for model objects

Fri Jan 17 21:12:10 CET 2025

Thank you very much, Martin.

Below is a patch implementing that.

Two newbie questions:
- Should I add row.names = NULL, optional = FALSE to match the arguments of the generic? (This is not the case for e.g. as.data.frame.table, but I thought it was required: https://cloud.r-project.org/doc/manuals/r-devel/R-exts.html#Generic-functions-and-methods)
- Shouldn't we use match.fun(transFUN) instead of only checking is.function(transFUN)?
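
To make both questions concrete, here is a rough sketch of what such a variant might look like. It is not part of the patch below: as_df_lm is a hypothetical stand-in name for as.data.frame.lm, row.names and optional are accepted only to mirror the generic's signature and are not used, and exp is applied to an lm fit purely for illustration.

```r
## Sketch only, not part of the patch: 'as_df_lm' is a hypothetical name.
## row.names and optional mirror the generic as.data.frame(x, row.names = NULL,
## optional = FALSE, ...) but are deliberately unused here.
as_df_lm <- function(x, row.names = NULL, optional = FALSE, ...,
                     level = 0.95, transFUN = NULL)
{
    cf <- coef(summary(x))
    ci <- confint(x, level = level)
    if(!is.null(transFUN)) {
        ## match.fun() accepts a function or the name of one,
        ## and signals an error for anything else
        transFUN <- match.fun(transFUN)
        cf[, "Estimate"] <- transFUN(cf[, "Estimate"])
        ci <- transFUN(ci)
    }
    df <- data.frame(row.names(cf), cf, ci, row.names = NULL)
    names(df) <- c("term", "estimate", "std.error", "statistic", "p.value",
                   "conf.low", "conf.high")
    df
}

fit <- lm(breaks ~ wool + tension, data = warpbreaks)
d1 <- as_df_lm(fit, transFUN = exp)    # function object
d2 <- as_df_lm(fit, transFUN = "exp")  # function name as a string
identical(d1, d2)
```

With match.fun(), passing the function itself or its name as a character string should give the same result, which is.function() alone would not allow.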

diff --git a/src/library/stats/R/lm.R b/src/library/stats/R/lm.R
index 13a458797b..2ce6b16f6e 100644
--- a/src/library/stats/R/lm.R
+++ b/src/library/stats/R/lm.R
@@ -982,3 +982,18 @@ labels.lm <- function(object, ...)
     asgn <- object$assign[qr.lm(object)$pivot[1L:object$rank]]
     tl[unique(asgn)]
 }
+
+as.data.frame.lm <- function(x, ..., level = 0.95, transFUN = NULL)
+{
+  cf <- x |> summary() |> coef()
+  ci <- confint(x, level = level)
+  if(!is.null(transFUN)) {
+    stopifnot(is.function(transFUN))
+    cf[, "Estimate"] <- transFUN(cf[, "Estimate"])
+    ci <- transFUN(ci)
+  }
+  df <- data.frame(row.names(cf), cf, ci, row.names = NULL)
+  names(df) <- c("term", "estimate", "std.error", "statistic", "p.value",
+                 "conf.low", "conf.high")
+  df
+}
diff --git a/src/library/stats/man/lm.Rd b/src/library/stats/man/lm.Rd
index ff05afabff..b54373dff4 100644
--- a/src/library/stats/man/lm.Rd
+++ b/src/library/stats/man/lm.Rd
@@ -21,6 +21,8 @@ lm(formula, data, subset, weights, na.action,
    singular.ok = TRUE, contrasts = NULL, offset, \dots)
 
 \S3method{print}{lm}(x, digits = max(3L, getOption("digits") - 3L), \dots)
+
+\S3method{as.data.frame}{lm}(x, ..., level = 0.95, transFUN = NULL)
 }
 \arguments{
   \item{formula}{an object of class \code{"\link{formula}"} (or one that
@@ -81,6 +83,10 @@ lm(formula, data, subset, weights, na.action,
   \item{digits}{the number of \emph{significant} digits to be
     passed to \code{\link{format}(\link{coef}(x), .)} when
     \I{\code{\link{print}()}ing}.}
+  %% as.data.frame.lm():
+  \item{level}{the confidence level required.}
+  \item{transFUN}{a function to transform \code{estimate}, \code{conf.low} and
+    \code{conf.high}.}
 }
 \details{
   Models for \code{lm} are specified symbolically.  A typical model has
@@ -168,6 +174,10 @@ lm(formula, data, subset, weights, na.action,
   \code{effects} and (unless not requested) \code{qr} relating to the linear
   fit, for use by extractor functions such as \code{summary} and
   \code{\link{effects}}.
+
+  \code{as.data.frame} returns a data frame with statistics as provided by
+  \code{coef(summary(.))} and confidence intervals for model
+  estimates.
 }
 \section{Using time series}{
   Considerable care is needed when using \code{lm} with time series.




From: Martin Maechler [mailto:maechler using stat.math.ethz.ch] 
Sent: Friday, January 17, 2025 17:04
To: SOEIRO Thomas
Cc: r-devel using r-project.org
Subject: Re: [Rd] as.data.frame() methods for model objects


>>>>> SOEIRO Thomas via R-devel 
>>>>>     on Fri, 17 Jan 2025 14:19:31 +0000 writes:

> Following Duncan Murdoch's off-list comments (thanks again!), here is a more complete/flexible version:
> 
> as.data.frame.lm <- function(x, ..., level = 0.95, exp = FALSE) {
>   cf <- x |> summary() |> stats::coef()
>   ci <- stats::confint(x, level = level)
>   if (exp) {
>     cf[, "Estimate"] <- exp(cf[, "Estimate"])
>     ci <- exp(ci)
>   }
>   df <- data.frame(row.names(cf), cf, ci, row.names = NULL)
>   names(df) <- c("term", "estimate", "std.error", "statistic", "p.value", "conf.low", "conf.high")
>   df
> }

Indeed, using level is much better already.

Instead of  exp = FALSE,
I'd use  transFUN = NULL
and then

    if(!is.null(transFUN)) {
       stopifnot(is.function(transFUN))
       cf[, "Estimate"] <- transFUN(cf[, "Estimate"])
       ci <- transFUN(ci)
    }

Note that I'd want "inverse-logit" (*) in some cases, and other
transformations for other link functions, so a plain
exp = TRUE/FALSE switch is not enough.

Martin

--
*) "inverse-logit"  is simply R's   plogis()  function;  quite a
 few people have been re-inventing it, also in their packages ...
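
As a small illustration of the footnote, the sketch below checks that plogis() agrees with the hand-written inverse logit and applies it to the intercept of a logistic fit. The binomial family and the use of confint.default() (Wald intervals, chosen here only to avoid profiling) are assumptions made for the example, not something taken from the thread.

```r
## plogis() is the inverse logit: plogis(x) == 1 / (1 + exp(-x))
x <- c(-2, 0, 1.5)
all.equal(plogis(x), 1 / (1 + exp(-x)))

## Assumed example: a logistic regression on the warpbreaks data
fit <- glm(breaks < 20 ~ wool + tension, family = binomial, data = warpbreaks)
est <- coef(fit)
ci  <- confint.default(fit, level = 0.95)  # Wald intervals on the logit scale

## Back-transform the intercept to the probability scale
p0    <- plogis(est[["(Intercept)"]])
p0_ci <- plogis(ci["(Intercept)", ])
p0
p0_ci
```

Only the intercept is a log-odds here, so only it maps to a probability under plogis(); the other coefficients are log odds ratios, for which exp() is the usual back-transformation. That is one reason a general transFUN is more flexible than a single exp switch.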



> > lm(breaks ~ wool + tension, warpbreaks) |> as.data.frame()
>          term   estimate std.error statistic      p.value  conf.low  conf.high
> 1 (Intercept)  39.277778  3.161783 12.422667 6.681866e-17  32.92715 45.6284061
> 2       woolB  -5.777778  3.161783 -1.827380 7.361367e-02 -12.12841  0.5728505
> 3    tensionM -10.000000  3.872378 -2.582393 1.278683e-02 -17.77790 -2.2221006
> 4    tensionH -14.722222  3.872378 -3.801856 3.913842e-04 -22.50012 -6.9443228
> 
> > glm(breaks < 20 ~ wool + tension, data = warpbreaks) |> as.data.frame(exp = TRUE)
> Waiting for profiling to be done...
>          term estimate std.error statistic    p.value  conf.low conf.high
> 1 (Intercept) 1.076887 0.1226144 0.6041221 0.54849393 0.8468381  1.369429
> 2       woolB 1.076887 0.1226144 0.6041221 0.54849393 0.8468381  1.369429
> 3    tensionM 1.248849 0.1501714 1.4797909 0.14520270 0.9304302  1.676239
> 4    tensionH 1.395612 0.1501714 2.2196863 0.03100435 1.0397735  1.873229
> 
> Thank you.
> 
> Best regards,
> Thomas
> 
> 
> 
> -----Original Message-----
> From: SOEIRO Thomas 
> Sent: Thursday, January 16, 2025 14:36
> To: r-devel using r-project.org
> Subject: as.data.frame() methods for model objects
> 
> Hello all,
> 
> Would there be any interest in adding as.data.frame() methods for model objects?
> Of course there are packages for this (e.g. broom), but I think providing methods would be more discoverable (and the patch would be small).
> Such methods are really useful for exporting model results or for plotting.
> 
> e.g.:
> 
> as.data.frame.lm <- function(x) { # could get other arguments, e.g. exp = TRUE/FALSE to exponentiate estimate, conf.low, conf.high
>   cf <- x |> summary() |> stats::coef()
>   ci <- stats::confint(x)
>   data.frame(
>     term = row.names(cf),
>     estimate = cf[, "Estimate"],
>     p.value = cf[, 4], # magic number because name changes between lm() and glm(*, family = *)
>     conf.low = ci[, "2.5 %"],
>     conf.high = ci[, "97.5 %"],
>     row.names = NULL
>   )
> }
> 
> > lm(breaks ~ wool + tension, warpbreaks) |> as.data.frame()
>          term   estimate      p.value  conf.low  conf.high
> 1 (Intercept)  39.277778 6.681866e-17  32.92715 45.6284061
> 2       woolB  -5.777778 7.361367e-02 -12.12841  0.5728505
> 3    tensionM -10.000000 1.278683e-02 -17.77790 -2.2221006
> 4    tensionH -14.722222 3.913842e-04 -22.50012 -6.9443228
> 
> > glm(breaks < 20 ~ wool + tension, data = warpbreaks) |> as.data.frame()
> Waiting for profiling to be done...
>          term   estimate    p.value    conf.low conf.high
> 1 (Intercept) 0.07407407 0.54849393 -0.16624575 0.3143939
> 2       woolB 0.07407407 0.54849393 -0.16624575 0.3143939
> 3    tensionM 0.22222222 0.14520270 -0.07210825 0.5165527
> 4    tensionH 0.33333333 0.03100435  0.03900286 0.6276638
> 
> Thank you.
> 
> Best regards,
> Thomas
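
For reference, the "magic number" comment in the first version of the function (p.value = cf[, 4]) can be illustrated with a short sketch. The binomial family is an assumption made for the example; the pattern-based lookup at the end is just one possible alternative to the positional index, not something proposed in the thread.

```r
fit_lm  <- lm(breaks ~ wool + tension, data = warpbreaks)
fit_glm <- glm(breaks < 20 ~ wool + tension, family = binomial,
               data = warpbreaks)

cf_lm  <- coef(summary(fit_lm))
cf_glm <- coef(summary(fit_glm))

## The p-value column is named after the test statistic:
colnames(cf_lm)   # "Estimate" "Std. Error" "t value" "Pr(>|t|)"
colnames(cf_glm)  # "Estimate" "Std. Error" "z value" "Pr(>|z|)"

## Possible alternative to the positional index 4: match on the "Pr(" prefix
p_col <- grep("^Pr\\(", colnames(cf_glm))
identical(cf_glm[, p_col], cf_glm[, 4])
```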