[R] Problem with binomial gam{mgcv}

Simon Wood simon.wood at bath.edu
Tue Oct 13 15:17:10 CEST 2015


Following up on Duncan's answer...

You can see that the model doesn't fit with a little bit of residual 
checking. For example

gam.check(Model)

shows that the binomial assumption is just wrong here: there is massive 
overdispersion, for example. Even the crudest response to this, of using 
quasibinomial, then gives a non-significant  s(x1) in the examples I've 
tried.

best,
Simon




On 10/10/15 12:40, Duncan Murdoch wrote:
> On 09/10/2015 8:00 PM, Erin Conlisk wrote:
>> Hello,
>>
>> I am having trouble testing for the significance using a binomial model in
>> gam{mgcv}.  Have I stumbled on a bug?  I doubt I would be so lucky, so
>> could someone tell me what I am doing wrong?
>>
>> Please see the following code:
>> ________________________________
>>
>> # PROBLEM USING cbind
>>
>> x1 <- runif(500, 0, 100)  # Create 500 random variables to use as my
>> explanatory variable
>>
>> y1 <- floor(runif(500, 0, 100)) # Create 500 random counts to serve as
>> binomial "successes"
>>
>> y2 <- 100-y1 # Create 500 binomial "failures", assuming a total of 100
>> trials and the successes recorded in y1
>>
>> Model <- gam(cbind(y1, y2) ∼ 1 + s(x1), family=binomial)
>> summary(Model)
>> ________________________________
>>
>> The result is that my random variable, x1, is highly significant.  This
>> can't be right...
> The problem is that statistical significance of a test doesn't mean that
> the alternative you have in mind is right, it just means that the null
> is wrong (or you were unlucky, but let's ignore that).
>
> The null hypothesis here is that all of the y1 values are independent
> binomial values with a common n=100 and common unspecifed probability of
> success.
>
> Since in fact y1 comes from a discrete uniform distribution, that's
> false, and the p-value is not anywhere near being a random Unif(0,1) value.
>
> If you wanted the null hypothesis to be true, you'd need to choose a
> success probability p somehow, then set y1 <- rbinom(500, size=100, prob
> = p).  The random uniform has a far larger variance, and that leads to a
> larger deviance in gam, hence significance.
>
> Duncan Murdoch
>
>> So what happens when I change the observations from a "batch" of 100 trials
>> to individual successes and failures?
>> ________________________________
>>
>> # NOW MAKE ALL THESE DATA 0 and 1
>>
>> r01<-rep(0,500)
>> data01<-cbind(x1, y1, y2, r01)
>> rownames(data01)<-seq(1,500, 1)
>> colnames(data01)<-c('x1', 'y1', 'y2', 'r01')
>> data01<-data.frame(data01)
>>
>> expanded0 <- data01[rep(row.names(data01), data01$y1), 1:4]  # Creates a
>> replicate of the      #  explanatory variables for each individual "success"
>>
>> r01<-rep(1,500)
>> data01<-cbind(x1, y1, y2, r01)
>> rownames(data01)<-seq(1,500, 1)
>> colnames(data01)<-c('x1', 'y1', 'y2', 'r01')
>> data01<-data.frame(data01)
>>
>> expanded1 <- data01[rep(row.names(data01), data01$y2), 1:4]  # Creates a
>> replicate of the      #  explanatory variables for each individual "failure"
>>
>> data01<-rbind(expanded0,expanded1)
>>
>> Model2 <- gam(r01 ∼ 1 + s(x1), family=binomial)
>> summary(Model2)
>> ___________________________________
>>
>> The result is what I expect.  Now my random variable, x1, is NOT
>> significant.
>>
>> What is going on here?
>>
>> I should say that I didn't just make this up.  My question arose playing
>> with my real data, where I was getting high significance, but a terrible
>> proportion of deviance explained.
>>
>> My apologies if this is explained elsewhere, but I have spent hours
>> searching for an answer online.
>>
>> Thank you kindly,
>> Erin Conlisk
>>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283



More information about the R-help mailing list