[Rd] vcov and survival
Fox, John
jfox at mcmaster.ca
Thu Nov 2 22:23:17 CET 2017
Dear Martin,
Thank you for taking care of this.
Best,
John
> -----Original Message-----
> From: Martin Maechler [mailto:maechler at stat.math.ethz.ch]
> Sent: Thursday, November 2, 2017 4:59 PM
> To: Fox, John <jfox at mcmaster.ca>
> Cc: Martin Maechler <maechler at stat.math.ethz.ch>; Therneau, Terry M.,
> Ph.D. <therneau at mayo.edu>; r-devel at r-project.org
> Subject: RE: [Rd] vcov and survival
>
> >>>>> Fox, John <jfox at mcmaster.ca>
> >>>>> on Thu, 14 Sep 2017 13:46:44 +0000 writes:
>
> > Dear Martin, I made three points which likely got lost
> > because of the way I presented them:
>
> > (1) Singularity is an unusual situation and should be made
> > more prominent. It typically reflects a problem with the
> > data or the specification of the model. That's not to say
> > that it *never* makes sense to allow singular fits (as in
> > the situations you mentions).
>
> > I'd favour setting singular.ok=FALSE as the default, but
> > in the absence of that a warning or at least a note. A
> > compromise would be to have a singular.ok option() that
> > would be FALSE out of the box.
>
> > Any changes would have to be made very carefully so as not
> > to create chaos.
>
> I for one, am too reluctant to want to change the default there.
>
> > That goes for the points below as well.
>
> > (2) coef() and vcov() behave inconsistently, which can be
> > problematic because one often uses them together in code.
>
> indeed; and I had agreed on that.
> As of today, in R-devel only they now behave compatibly.
> NEWS entry
>
> • The “default” ("lm" etc) methods of vcov() have gained new
> optional argument complete = TRUE which makes the vcov() methods
> more consistent with the coef() methods in the case of singular
> designs. The former behavior is now achieved by vcov(*,
> complete=FALSE).
>
>
> > (3) As you noticed in your second message, lm() has a
> > singular.ok argument and glm() doesn't.
>
> and that has been amended even earlier (a bit more than a month
> ago) in R-devel svn rev 73380 with NEWS entry
>
> • glm() and glm.fit get the same singular.ok=TRUE argument that
> lm() has had forever. As a consequence, in glm(*, method =
> <your_own>), user specified methods need to accept a singular.ok
> argument as well.
>
> > I'll take a look at the code for glm() with an eye towards
> > creating a patch, but I'm a bit reluctant to mess with the
> > code for something as important as glm().
>
> and as a matter of fact you did send me +- the R code part of that change.
>
> My current plan is to also add the 'complete = TRUE' option to the "basic"
> coef() methods, such that you also have consistent coef(*, complete=FALSE)
> and vcov(*, complete=FALSE) behaviors.
>
> Thank you and Terry (and others?) for bringing up the issues and discussing
> them thoroughly!
>
> Best,
> Martin.
>
>
> > Best, John
>
>
>
> >> -----Original Message----- From: Martin Maechler
> >> [mailto:maechler at stat.math.ethz.ch] Sent: Thursday,
> >> September 14, 2017 4:23 AM To: Martin Maechler
> >> <maechler at stat.math.ethz.ch> Cc: Fox, John
> >> <jfox at mcmaster.ca>; Therneau, Terry M., Ph.D.
> >> <therneau at mayo.edu>; r-devel at r-project.org Subject: Re:
> >> [Rd] vcov and survival
> >>
> >> >>>>> Martin Maechler <maechler at stat.math.ethz.ch> >>>>>
> >> on Thu, 14 Sep 2017 10:13:02 +0200 writes:
> >>
> >> >>>>> Fox, John <jfox at mcmaster.ca> >>>>> on Wed, 13 Sep
> >> 2017 22:45:07 +0000 writes:
> >>
> >> >> Dear Terry, >> Even the behaviour of lm() and glm()
> >> isn't entirely consistent. In both cases, singularity
> >> results in NA coefficients by default, and these are
> >> reported in the model summary and coefficient vector, but
> >> not in the coefficient covariance matrix:
> >>
> >> >> ----------------
> >>
> >> >>> mod.lm <- lm(Employed ~ GNP + Population + I(GNP +
> >> Population), >> + data=longley) >>> summary(mod.lm)
> >>
> >> >> Call: >> lm(formula = Employed ~ GNP + Population +
> >> I(GNP + Population), >> data = longley)
> >>
> >> >> Residuals: >> Min 1Q Median 3Q Max >> -0.80899
> >> -0.33282 -0.02329 0.25895 1.08800
> >>
> >> >> Coefficients: (1 not defined because of singularities)
> >> >> Estimate Std. Error t value Pr(>|t|) >> (Intercept)
> >> 88.93880 13.78503 6.452 2.16e-05 *** >> GNP 0.06317
> >> 0.01065 5.933 4.96e-05 *** >> Population -0.40974 0.15214
> >> -2.693 0.0184 * >> I(GNP + Population) NA NA NA NA
> >> >> ---
> >> >> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
> >> 0.1 ' ' 1
> >>
> >> >> Residual standard error: 0.5459 on 13 degrees of
> >> freedom >> Multiple R-squared: 0.9791, Adjusted
> >> R-squared: 0.9758 >> F-statistic: 303.9 on 2 and 13 DF,
> >> p-value: 1.221e-11
> >>
> >> >>> vcov(mod.lm) >> (Intercept) GNP Population >>
> >> (Intercept) 190.0269691 0.1445617813 -2.0954381 >> GNP
> >> 0.1445618 0.0001133631 -0.0016054 >> Population
> >> -2.0954381 -0.0016053999 0.0231456 >>> coef(mod.lm) >>
> >> (Intercept) GNP Population I(GNP + Population) >>
> >> 88.93879831 0.06317244 -0.40974292 NA
> >> >>>
> >> >>> mod.glm <- glm(Employed ~ GNP + Population + I(GNP +
> >> Population), >> + data=longley) >>> summary(mod.glm)
> >>
> >> >> Call: >> glm(formula = Employed ~ GNP + Population +
> >> I(GNP + Population), >> data = longley)
> >>
> >> >> Deviance Residuals: >> Min 1Q Median 3Q Max >>
> >> -0.80899 -0.33282 -0.02329 0.25895 1.08800
> >>
> >> >> Coefficients: (1 not defined because of singularities)
> >> >> Estimate Std. Error t value Pr(>|t|) >> (Intercept)
> >> 88.93880 13.78503 6.452 2.16e-05 *** >> GNP 0.06317
> >> 0.01065 5.933 4.96e-05 *** >> Population -0.40974 0.15214
> >> -2.693 0.0184 * >> I(GNP + Population) NA NA NA NA
> >> >> ---
> >> >> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
> >> 0.1 ' ' 1
> >>
> >> >> (Dispersion parameter for gaussian family taken to be
> >> 0.2980278)
> >>
> >> >> Null deviance: 185.0088 on 15 degrees of freedom >>
> >> Residual deviance: 3.8744 on 13 degrees of freedom >>
> >> AIC: 30.715
> >>
> >> >> Number of Fisher Scoring iterations: 2
> >>
> >> >>> coef(mod.glm) >> (Intercept) GNP Population I(GNP +
> >> Population) >> 88.93879831 0.06317244 -0.40974292 NA >>>
> >> vcov(mod.glm) >> (Intercept) GNP Population >>
> >> (Intercept) 190.0269691 0.1445617813 -2.0954381 >> GNP
> >> 0.1445618 0.0001133631 -0.0016054 >> Population
> >> -2.0954381 -0.0016053999 0.0231456
> >>
> >> >> ----------------
> >>
> >> >> Moreoever, lm() has a singular.ok() argument that
> >> defaults to TRUE, but glm() doesn't have this argument:
> >>
> >> >> ----------------
> >>
> >> >>> mod.lm <- lm(Employed ~ GNP + Population + I(GNP +
> >> Population), >> + data=longley, singular.ok=FALSE) >>
> >> Error in lm.fit(x, y, offset = offset, singular.ok =
> >> singular.ok, ...) : >> singular fit encountered
> >>
> >> >> ----------------
> >>
> >> >> In my opinion, singularity should at least produce a
> >> warning, both in calls to lm() and glm(), and in
> >> summary() output. Even better, again in my opinion, would
> >> be to produce an error by default in this situation, but
> >> doing so would likely break too much existing code.
> >>
> >> > Yes, I would not want to change. Note that this is
> >> from S > already, i.e., long "ingrained". I think there
> >> one argument was > that there are situations with factor
> >> predictors of many levels > and conceptually their 2- or
> >> even 3-way interactions (!) > where it is neat to just
> >> fit the model, (-> get residuals and > fitted values) and
> >> also see implicitly the "necessary rank" of > prediction
> >> space, or rather even more specifically, you see for >
> >> every factor how many levels are "distinguishable"/useful
> >> for > prediction, given the data.
> >>
> >> >> I prefer NA to 0 for the redundant coefficients
> >> because it at least suggests that the decision about what
> >> to exclude is arbitrary, and of course simply excluding
> >> coefficients isn't the only way to proceed.
> >>
> >> > I'm less modest and would say *definitely*, NA's are
> >> highly > prefered in such a situation.
> >>
> >> >> Finally, the differences in behaviour between coef()
> >> and vcov() and between lm() and glm() aren't really
> >> sensible.
> >>
> >> > I really haven't seen any difference between lm() and
> >> glm() in > the example above. Maybe you can point them
> >> out for me.
> >>
> >> .. now I saw it: lm() has a 'singular.ok = TRUE' argument
> >> which you can set to FALSE if you prefer an error to NA
> >> coefficients.
> >>
> >> I also agree with you John that it would be nice if glm()
> >> also got such an argument. Patches are welcome and seem
> >> easy. Nowadays we prefer them as attachments (diff/patch
> >> file!) at R's https://bugs.r-project.org bugzilla against
> >> the svn source, here
> >> https://svn.r-project.org/R/trunk/src/library/stats/R/glm.R
> >> and
> >> https://svn.r-project.org/R/trunk/src/library/stats/man/glm.Rd
> >>
> >> > I do quite agree that vcov() should be compatible with
> >> > coef() [and summary()] for both 'lm' and 'glm' methods,
> >> i.e., > should get NA rows and columns there. This would
> >> require > eliminating these before e.g. using it in
> >> solve(<vcov>, *) etc, > but I think it would be a good
> >> idea that the useR must deal with > these NAs actively.
> >>
> >> > Shall "we" try and see the fallout in CRAN space?
> >>
> >> >> Maybe there's some reason for all this that escapes
> >> me. > (for the first one---"no error"--- I gave a
> >> reason)
> >>
> >> >> Best, >> John
> >>
> >> >> --------------------------------------
> >> >> John Fox, Professor Emeritus >> McMaster University >>
> >> Hamilton, Ontario, Canada >> Web:
> >> socserv.mcmaster.ca/jfox
> >>
> >>
> >>
> >>
> >> >>> -----Original Message----- >>> From: R-devel
> >> [mailto:r-devel-bounces at r-project.org] On Behalf Of >>>
> >> Therneau, Terry M., Ph.D. >>> Sent: Wednesday, September
> >> 13, 2017 6:19 PM >>> To: r-devel at r-project.org >>>
> >> Subject: [Rd] vcov and survival
> >> >>>
> >> >>> I have just noticed a difference in behavior between
> >> coxph and lm/glm: >>> if one or more of the coefficients
> >> from the fit in NA, then lm and glm >>> omit that
> >> row/column from the variance matrix; while coxph retains
> >> it >>> but sets the values to zero.
> >> >>>
> >> >>> Is this something that should be "fixed", i.e., made
> >> to agree? I >>> suspect that doing so will break other
> >> packages, but then NA coefs are >>> rather rare so
> >> perhaps not.
> >> >>>
> >> >>> Terry Therneau
More information about the R-devel
mailing list