[R] Fwd: Potential Issue with lm.influence

Wed Apr 3 00:53:11 CEST 2019

rstudent calls influence, to my knowledge, and all of the results passed by
rstudent are dependent on values returned by influence (other than the
weights, which I can't imagine are NaN), so I believe that influence is the
issue. See the line:
https://github.com/SurajGupta/r-source/blob/a28e609e72ed7c47f6ddfbb86c85279a0750f0b7/src/library/stats/R/lm.influence.R#L135
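
As a rough illustration of that dependence, the sketch below rebuilds studentized residuals by hand from the components that influence() returns. It uses an ordinary lm fit to the built-in cars data, not the glm.nb model from this thread, and the names manual, sigma_bad and manual_bad are made up for the example. For glm fits the exact formula differs, so treat this as an analogy only.

```r
# Illustration on an ordinary lm fit to the built-in `cars` data
# (an assumption for the example; not the glm.nb model from this thread).
fit <- lm(dist ~ speed, data = cars)
infl <- influence(fit)   # list containing hat, sigma and wt.res, among others

# Studentized residuals rebuilt from the pieces influence() returns.
manual <- infl$wt.res / (infl$sigma * sqrt(1 - infl$hat))

# Compare with what rstudent() reports for the same fit.
agree <- isTRUE(all.equal(unname(manual), unname(rstudent(fit))))
print(agree)

# Overwrite one leave-one-out sigma with NaN to mimic the reported symptom:
# the NaN carries straight through to the studentized residual.
sigma_bad <- infl$sigma
sigma_bad[7] <- NaN
manual_bad <- infl$wt.res / (sigma_bad * sqrt(1 - infl$hat))
print(which(is.nan(manual_bad)))
```

If the two vectors agree, any NaN in the sigma component of influence() will show up at the same position in rstudent(), which is consistent with what Jim observed.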

Eric

On Tue, Apr 2, 2019 at 6:36 PM Jim Lemon <drjimlemon using gmail.com> wrote:

> Hi Eric,
> When I run your code (using the MASS library) I find that
> rstudent(fit2) also returns NaN in the seventh position. Perhaps the
> problem is occurring there and not in the "influence" function.
>
> Jim
>
> On Wed, Apr 3, 2019 at 9:12 AM Eric Bridgeford <ericwb95 using gmail.com> wrote:
> >
> > I agree the influence documentation suggests NaNs may result; however,
> > these values can be computed manually and are, indeed, finite (i.e., by
> > fitting n models, each with one of the n points held out, to obtain n
> > leave-one-out influence measures). I therefore don't see why the function
> > SHOULD return NaN. Given that it does return NaN, I think there should be
> > either a) an alternative method (possibly slower) that returns the correct
> > results in the event that lm.influence does not return a good
> > approximation (i.e., an argument such as type="approx" for the
> > approximation strategy currently employed, and an alternative
> > type="direct", or something like that, that computes them manually), or
> > b) a heuristic explaining why NaNs might result from one's particular
> > inputs and what can be done to fix it (if the approximation strategy is
> > the source of the problem), or what issue with the data causes the NaNs.
> > Hence I was looking to start a discussion around the specific strategy
> > employed to compute the elements.
> >
> > Below is the code:
> > moon_data <- structure(list(
> >   Name = structure(c(8L, 13L, 2L, 7L, 1L, 5L, 11L, 12L, 9L, 10L, 4L, 6L, 3L),
> >     .Label = c("Ceres ", "Earth", "Eris ", "Haumea ", "Jupiter ",
> >                "Makemake ", "Mars ", "Mercury ", "Neptune ", "Pluto ",
> >                "Saturn ", "Uranus ", "Venus "),
> >     class = "factor"),
> >   Distance = c(0.39, 0.72, 1, 1.52, 2.75, 5.2, 9.54, 19.22, 30.06, 39.5,
> >                43.35, 45.8, 67.7),
> >   Diameter = c(0.382, 0.949, 1, 0.532, 0.08, 11.209, 9.449, 4.007, 3.883,
> >                0.18, 0.15, 0.12, 0.19),
> >   Mass = c(0.06, 0.82, 1, 0.11, 2e-04, 317.8, 95.2, 14.6, 17.2, 0.0022,
> >            7e-04, 7e-04, 0.0025),
> >   Moons = c(0L, 0L, 1L, 2L, 0L, 64L, 62L, 27L, 13L, 4L, 2L, 0L, 1L),
> >   Volume = c(0.0291869497930152, 0.447504348276571, 0.523598775598299,
> >              0.0788376225681443, 0.000268082573106329, 737.393372232996,
> >              441.729261571372, 33.6865588825666, 30.6549628355953,
> >              0.00305362805928928, 0.00176714586764426,
> >              0.00090477868423386, 0.00359136400182873)),
> >   row.names = c(NA, -13L), class = "data.frame")
> >
> > library(MASS)  # glm.nb is provided by the MASS package
> > fit <- glm.nb(Moons ~ Volume, data = moon_data)
> > rstudent(fit)
> >
> > fit2 <- update(fit, subset = Name != "Jupiter ")
> > rstudent(fit2)
> >
> > influence(fit2)$sigma
> >
> > #        1        2        3        4        5        7
> > # 1.077945 1.077813 1.165025 1.181685 1.077954      NaN
> > #        8        9       10       11       12       13
> > # 1.044454 1.152110 1.187586 1.181696 1.077954 1.165147
> >
> > Sincerely,
> > Eric
> >
>
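
The "direct" strategy described in the quoted message (refit the model n times, leaving out one observation each time) can be sketched as follows. This is a minimal illustration on an ordinary lm fit to the built-in cars data, where the leave-one-out residual standard deviation from refitting should match influence()$sigma up to rounding. The names sigma_direct, sigma_closed and max_abs_diff are invented for the example. For GLMs such as glm.nb, the values from influence() are, as far as the documentation indicates, based on the final weighted least-squares fit rather than on full refits, so the two approaches need not agree there.

```r
# Assumption for the example: a plain lm on the built-in `cars` data,
# standing in for the glm.nb fit discussed in the thread.
fit <- lm(dist ~ speed, data = cars)
n <- nrow(cars)

# "Direct" strategy: refit n times, dropping one observation each time,
# and record the residual standard deviation of each reduced fit.
sigma_direct <- vapply(seq_len(n), function(i) {
  fit_i <- lm(dist ~ speed, data = cars[-i, ])
  summary(fit_i)$sigma
}, numeric(1))

# Values reported by influence() for the full fit, without any refitting.
sigma_closed <- unname(influence(fit)$sigma)

# Largest discrepancy between the two approaches.
max_abs_diff <- max(abs(sigma_direct - sigma_closed))
print(max_abs_diff)
```

The same loop could be pointed at the glm.nb fit (refitting with each row of the subsetted data removed) to see whether a finite value is obtained for the observation that currently yields NaN, at the cost of n full model fits.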

-- 
Eric Bridgeford
ericwb.me
