[R] mgcv: distribution of dev with Poisson data

Simon Wood s.wood at bath.ac.uk
Wed Feb 5 15:04:08 CET 2014


Hi Greg,

Yes, this sounds right - with quasipoisson gam uses `extended 
quasi-likelihood' (see McCullagh and Nelder's GLM book) to allow 
estimation of the scale parameter along with the smoothing parameters 
via (RE)ML, and it could well be that this gives a biased scale estimate 
with low counts (although the shape of the mean variance relationship is 
unaffected by this).

It might well be more sensible to default to a Pearson estimate of the 
scale parameter for 'gam' ('bam' and 'gamm' already do this), to avoid 
this issue with quasipoisson and low counts. Your email reminded me that 
someone had reported rather too high p-values for quasipoisson with 
these sort of data, and looking back at my list of open issues I see 
that `someone' was you as well! It's possible that moving to a Pearson 
estimator of the scale will solve that problem too.

Thanks for this... very helpful.

best,
Simon


On 05/02/14 12:56, Greg Dropkin wrote:
> thanks Simon
>
> also, it appears at least with ML that the default scale estimate with
> quasipoisson (i.e. using dev) is the scale which minimises the ML value of
> the fitted model. So it is the "best" model but doesn't actually give the
> correct mean-variance relation. Is that right?
>
> thanks again
>
> Greg
>
>> Greg,
>>
>> The deviance being chi^2 distributed on the residual degrees of freedom
>> works in some cases (mostly where the response itself can be reasonably
>> approximated as Gaussian), but rather poorly in others (noteably low
>> count data). This is what you are seeing in your simulations - in the
>> first case the Poisson mean is relatively high - so the  normal
>> approximation to the Poisson is not too bad and the deviance is approx.
>> chi.sq on residual df. In the second case the Poisson mean is low, there
>> are lots of zeroes, and the approximation breaks down. So yes, the
>> Pearson estimator is probably a better bet in the latter case.
>>
>> best,
>> Simom
>>
>> ps. Note that the chi squared approximation for the *difference* in
>> deviance between two nested models does not suffer from this problem.
>>
>>
>> On 04/02/14 09:25, Greg Dropkin wrote:
>>> mgcv: distribution of dev
>>>
>>> hi
>>>
>>> I can't tell if this is a simple error.
>>> I'm puzzled by the distribution of dev when fitting a gam to Poisson
>>> generated data.
>>> I expected dev to be approximately chi-squared on residual d.f., i.e.
>>> about 1000 in each case below.
>>> In particular, the low values in the 3rd and 4th versions would suggest
>>> scale < 1, yet the data is Poisson generated.
>>> The problem isn't caused by insufficient k values in the smoother.
>>> Does this mean that with sparse data the distribution of dev is no
>>> longer
>>> approx chi-sq on residual df?
>>> Does it mean the scale estimate when fitting quasipoisson should be the
>>> Pearson version?
>>>
>>> thanks
>>>
>>> greg
>>>
>>> library(mgcv)
>>> set.seed(1)
>>> x1<-runif(1000)
>>> linp<-2+exp(-2*x1)*sin(8*x1)
>>> sum(exp(linp))
>>> dev1<-dev2<-sums<-rep(0,20)
>>> for (j in 1:20)
>>> {
>>> y<-rpois(1000,exp(linp))
>>> sums[j]<-sum(y)
>>> m1<-gam(y~s(x1),family="poisson")
>>> m2<-gam(y~s(x1,k=20),family="poisson")
>>> dev1[j]=m1$dev
>>> dev2[j]=m2$dev
>>> }
>>> mean(sums)
>>> sd(sums)
>>> mean(dev1)
>>> sd(dev1)
>>> mean(dev2)
>>> sd(dev2)
>>>
>>> #dev slighly higher than expected
>>>
>>> linpa<--1+exp(-2*x1)*sin(8*x1)
>>> sum(exp(linpa))
>>> dev1a<-dev2a<-sumsa<-rep(0,20)
>>> for (j in 1:20)
>>> {
>>> y<-rpois(1000,exp(linpa))
>>> sumsa[j]<-sum(y)
>>> m1<-gam(y~s(x1),family="poisson")
>>> m2<-gam(y~s(x1,k=20),family="poisson")
>>> dev1a[j]=m1$dev
>>> dev2a[j]=m2$dev
>>> }
>>> mean(sumsa)
>>> sd(sumsa)
>>> mean(dev1a)
>>> sd(dev1a)
>>> mean(dev2a)
>>> sd(dev2a)
>>>
>>> #dev a bit lower than expected
>>>
>>> linpb<--1.5+exp(-2*x1)*sin(8*x1)
>>> sum(exp(linpb))
>>> dev1b<-dev2b<-sumsb<-rep(0,20)
>>> for (j in 1:20)
>>> {
>>> y<-rpois(1000,exp(linpb))
>>> sumsb[j]<-sum(y)
>>> m1<-gam(y~s(x1),family="poisson")
>>> m2<-gam(y~s(x1,k=20),family="poisson")
>>> dev1b[j]=m1$dev
>>> dev2b[j]=m2$dev
>>> }
>>> mean(sumsb)
>>> sd(sumsb)
>>> mean(dev1b)
>>> sd(dev1b)
>>> mean(dev2b)
>>> sd(dev2b)
>>>
>>> #dev much lower than expected
>>>
>>> m1q<-gam(y~s(x1),family="quasipoisson",scale=-1)
>>> m1q$scale
>>> m1q$dev/(1000-sum(m1q$edf))
>>>
>>> #scale estimate < 1
>>>
>>> sum((y-fitted(m1q))^2/fitted(m1q))/(1000-sum(m1q$edf))
>>>
>>> #Pearson estimate of scale closer, but also < 1
>>>
>>>
>>> linpc<--2+exp(-2*x1)*sin(8*x1)
>>> sum(exp(linpc))
>>> dev1c<-dev2c<-sumsc<-rep(0,20)
>>> for (j in 1:20)
>>> {
>>> y<-rpois(1000,exp(linpc))
>>> sumsc[j]<-sum(y)
>>> m1<-gam(y~s(x1),family="poisson")
>>> m2<-gam(y~s(x1,k=20),family="poisson")
>>> dev1c[j]=m1$dev
>>> dev2c[j]=m2$dev
>>> }
>>> mean(sumsc)
>>> sd(sumsc)
>>> mean(dev1c)
>>> sd(dev1c)
>>> mean(dev2c)
>>> sd(dev2c)
>>>
>>> #dev much lower than expected
>>>
>>> m1q<-gam(y~s(x1),family="quasipoisson",scale=-1)
>>> m1q$scale
>>> m1q$sig2
>>> m1q$dev/(1000-sum(m1q$edf))
>>>
>>> #scale estimate < 1
>>>
>>> sum((y-fitted(m1q))^2/fitted(m1q))/(1000-sum(m1q$edf))
>>>
>>> #Pearson estimate of scale ok
>>>
>>>
>>
>>
>> --
>> Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
>> +44 (0)1225 386603               http://people.bath.ac.uk/sw283
>>
>>
>
>


-- 
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283




More information about the R-help mailing list