[Rd] "simulate" does not include variability in parameter estimation

Fri Dec 27 16:31:18 CET 2019

On 2019-12-27 04:34, Duncan Murdoch wrote:
> On 26/12/2019 11:14 p.m., Spencer Graves wrote:
>> Hello, All:
>>
>>
>>         The default "simulate" method for lm and glm seems to ignore the
>> sampling variance of the parameter estimates;  see the trivial lm and
>> glm examples below.  Both these examples estimate a mean with formula =
>> x~1.  In both cases, the variance of the estimated mean is 1.
>
> That's how it's documented to operate.  Nothing in the help page 
> suggests it would try to simulate parameter values.  Indeed, it 
> doesn't have enough information on the distribution to sample from:  
> the appropriate distribution to simulate from if you want to include 
> uncertainty in the parameter estimates is the posterior distribution, 
> but lm and glm take a classical point of view, not a Bayesian point of 
> view, so they have no concept of a posterior.

       Thanks for the reply.  What do you suggest for someone who wants 
confidence, prediction and tolerance intervals for newdata for a general 
fit object?

       For a glm object, one could get confidence intervals starting 
with predicted mean and standard errors
from predict(glm(...), newdata, type='link', se.fit=TRUE), then linkinv 
to get the confidence intervals on scale of expected values of the 
random variables.  From that one could compute tolerance intervals.

       Is there a way to get more standard prediction intervals from a 
glm object, other than the Bayesian approach coded into 
Ecfun:::simulate.glm?  And that still doesn't answer the question re. 
confidence intervals for a more general fit object like BMA::bic.glm.

       Comments?
       Thanks,
       Spencer Graves

>
>>               * In the lm example with x0 = c(-1, 1), var(x0) = 2, and
>> var(unlist(simulate(lm(x0~1), 10000, 1))) is 2.0064.  Shouldn't it be 3
>> = var(mean(x0)) + var(x0) = (2/2) + 2?
>
> That calculation ignores the uncertainty in the estimation of sigma.
>
> Duncan Murdoch
>
>>
>>
>>               * In the glm example with x1=1,
>> var(unlist(simulate(glm(x1~1, poisson), 10000, 1))) = 1.006. Shouldn't
>> it be 2 = var(glm estimate of the mean) + var(simulated Poisson
>> distribution) = 1 + 1?
>>
>>
>>         I'm asking, because I've recently written "simulate" methods for
>> objects of class stats::glm and BMA::bic.glm, where my primary interest
>> was simulating the predicted mean with "newdata".  I'm doing this, so I
>> can get Monte Carlo prediction intervals.  My current code for
>> "simulate.glm" and "simulate.bic.glm" are available in the development
>> version of the "Ecfun" package on GitHub:
>>
>>
>> https://github.com/sbgraves237/Ecfun
>>
>>
>>         Comparing my new code with "stats:::simulate.lm" raises the
>> following questions in my mind regarding "simulate" of a fit object:
>>
>>
>>               1.  Shouldn't "simulate" start by simulating the random
>> variability in the estimated parameters?  I need that for my current
>> application.  If a generic "simulate" function should NOT include this,
>> what should we call something that does include this?  And how does the
>> current stats:::simulate.lm behavior fit with this?
>>
>>
>>               2.  Shouldn't "simulate" of a model fit include an option
>> for "newdata"?  I need that for my application.
>>
>>
>>               3.  By comparing with "predict.glm", I felt I needed an
>> argument 'type = c("link", "response")'.  "predict.glm" has an argument
>> 'type = c("link", "response", "terms")'.  I didn't need "terms", so I
>> didn't take the time to code that.  However, a general "simulate"
>> function should probably include that.
>>
>>
>>               4.  My application involves assumed Poisson counts.  I 
>> need
>> to simulate those as well.  If I combined those with "simulate.glm",
>> what would I call them?  I can't use the word "response", because that's
>> already used with a different meaning. Might "observations" be the
>> appropriate term?
>>
>>
>>         What do you think?
>>         Thanks,
>>         Spencer Graves
>>
>>
>>   > x0 <- c(-1, 1)
>>   > var(x0)
>> [1] 2
>>   > fit0 <- lm(x0~1)
>>   > vcov(fit0)
>>               (Intercept)
>> (Intercept)           1
>>   > sim0 <- simulate(fit0, 10000, 1)
>>   > var(unlist(sim0))
>> [1] 2.006408
>>   > x1 <- 1
>>   > fit1 <- glm(x1~1, poisson)
>>   > coef(fit1)
>>    (Intercept)
>> 4.676016e-11
>>   > exp(coef(fit1))
>> (Intercept)
>>             1
>>   > vcov(fit1)
>>               (Intercept)
>> (Intercept)   0.9999903
>>   > sim1 <- simulate(fit1, 10000, 1)
>>   > var(unlist(sim1))
>> [1] 1.00617
>>   > sessionInfo()
>> R version 3.6.2 (2019-12-12)
>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>> Running under: macOS Catalina 10.15.2
>>
>> Matrix products: default
>> BLAS:
>> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
>>
>> LAPACK:
>> /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib 
>>
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods base
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_3.6.2 tools_3.6.2
>>
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>