Problems with predict and lines in plotting binomial glm
Dear R-helpers
I have found quite a lot of tips on how to work with glm through this mailing list, but still have a problem that I can't solve.
I have a data set in which the x-variable is count data and the y-variable is proportion data, and I want to know what the relationship between the two variables is.
The data are overdispersed (the residual deviance is much larger than the residual degrees of freedom), so I am using the quasibinomial family, with the y-variable given as a two-column matrix of successes and failures (20 trials for every sample, so each row of y sums to 20).
x <- c(1200, 1200, 1200, 1200, 1200, 1200, 1200, 1200, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 2400, 2400, 2400, 2400, 2400, 2400, 2400, 3000, 3000, 3600, 3600, 3600, 3600, 4200, 4200, 4800, 4800, 5400, 6600, 6600, 7200, 7800, 7800, 8400, 8400, 8400, 9000, 9600, 10200, 13200, 18000, 20400, 24000, 25200, 36600)
successes <- c(6, 16, 11, 14, 11, 16, 13, 13, 14, 16, 12, 12, 11, 15, 12, 9, 7, 7, 17, 15, 13, 9, 9, 12, 14, 8, 9, 16, 7, 9, 14, 11, 8, 8, 13, 6, 16, 11, 9, 7, 9, 8, 4, 14, 7, 3, 3, 9, 12, 8, 4, 6)
failures <- c(14, 4, 9, 6, 9, 4, 7, 7, 6, 4, 8, 8, 9, 5, 8, 11, 13, 13, 3, 5, 7, 11, 11, 8, 6, 12, 11, 4, 13, 11, 6, 9, 12, 12, 7, 14, 4, 9, 11, 13, 11, 12, 16, 6, 13, 17, 17, 11, 8, 12, 16, 14)
y <- cbind(successes, failures)
data <- data.frame(y, x)
glm1 <- glm(y ~ x, family= quasibinomial, data= data)
glm2 <- glm(y ~ log(x), family=quasibinomial, data= data) # residual deviance is lower with log transformed x-value
plot(x, successes)
lines(x, predict(glm1, type= "response"), lwd=2)
Firstly, because of the skewed distribution of the x variable, I am not sure whether it should be log transformed or not. When I do log transform it, both the residual deviance and the p-value for the slope are lower.
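A minimal sketch of how the two fits might be compared side by side, assuming the data posted above. The names `d`, `fit_lin` and `fit_log` are hypothetical and are not part of the original code; `failures` is derived as 20 minus `successes`, which matches the posted values. The dispersion reported by `summary()` for a quasibinomial fit is the Pearson chi-square divided by the residual degrees of freedom, so values well above 1 are in line with the overdispersion described.

```r
# Data as posted (x written compactly with rep(); 52 samples, 20 trials each)
x <- rep(c(1200, 1800, 2400, 3000, 3600, 4200, 4800, 5400, 6600, 7200, 7800,
           8400, 9000, 9600, 10200, 13200, 18000, 20400, 24000, 25200, 36600),
         times = c(8, 9, 7, 2, 4, 2, 2, 1, 2, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1))
successes <- c(6, 16, 11, 14, 11, 16, 13, 13, 14, 16, 12, 12, 11, 15, 12, 9, 7,
               7, 17, 15, 13, 9, 9, 12, 14, 8, 9, 16, 7, 9, 14, 11, 8, 8, 13,
               6, 16, 11, 9, 7, 9, 8, 4, 14, 7, 3, 3, 9, 12, 8, 4, 6)
failures <- 20 - successes

# Hypothetical data frame holding all three variables under plain column names
d <- data.frame(x = x, successes = successes, failures = failures)

# Same two models as in the post, fitted from the data frame columns
fit_lin <- glm(cbind(successes, failures) ~ x,      family = quasibinomial, data = d)
fit_log <- glm(cbind(successes, failures) ~ log(x), family = quasibinomial, data = d)

# Residual deviance, residual df and estimated dispersion for each model
comparison <- data.frame(
  model      = c("x", "log(x)"),
  deviance   = c(deviance(fit_lin), deviance(fit_log)),
  df         = c(df.residual(fit_lin), df.residual(fit_log)),
  dispersion = c(summary(fit_lin)$dispersion, summary(fit_log)$dispersion)
)
print(comparison)
```

Both models have the same number of parameters, so the residual deviances are directly comparable as a rough guide, although this does not by itself settle whether the log transformation is appropriate.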
Either way, the lines command does not plot any line, nor does it give any error message. On some of my other data it plots a line far below all the data points. From what I can gather, the predict function as it is now uses the fitted values because no newdata argument is specified. I want the line to be predicted from the same x-values. I tried two ways of adding the newdata argument:
## a data.frame using the original x-values
lines(x, predict(glm2, type= "response", newdata= as.data.frame(x)))
## or a data.frame with values (the same length as y) from the range of x values
newdf <- data.frame(seq(min(x), max(x), length=52))
lines(x, predict(glm2, type="response", newdata= newdf))
Only the second option plotted a line once, but then I could never get it to do the same again on a new plot even though I used the same variables and same code.
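For reference, a hedged sketch of one way the fitted curve could be drawn, based on two observations about the code above rather than on a confirmed diagnosis. First, `predict(..., type = "response")` returns proportions between 0 and 1, while `plot(x, successes)` uses the count scale (0 to 20), so a line drawn on that plot would sit at the very bottom or be hard to see. Second, `predict()` matches `newdata` columns by name, and `data.frame(seq(...))` does not produce a column called `x`. The names `d`, `fit` and `grid` below are hypothetical.

```r
# Data as posted (x written compactly with rep(); 52 samples, 20 trials each)
x <- rep(c(1200, 1800, 2400, 3000, 3600, 4200, 4800, 5400, 6600, 7200, 7800,
           8400, 9000, 9600, 10200, 13200, 18000, 20400, 24000, 25200, 36600),
         times = c(8, 9, 7, 2, 4, 2, 2, 1, 2, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1))
successes <- c(6, 16, 11, 14, 11, 16, 13, 13, 14, 16, 12, 12, 11, 15, 12, 9, 7,
               7, 17, 15, 13, 9, 9, 12, 14, 8, 9, 16, 7, 9, 14, 11, 8, 8, 13,
               6, 16, 11, 9, 7, 9, 8, 4, 14, 7, 3, 3, 9, 12, 8, 4, 6)
d <- data.frame(x = x, successes = successes, failures = 20 - successes)

# Log-transformed model from the post, fitted from the data frame columns
fit <- glm(cbind(successes, failures) ~ log(x), family = quasibinomial, data = d)

# Prediction grid: the column must be named x so predict() can find it
grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 200))
grid$p <- predict(fit, newdata = grid, type = "response")

# Plot observed proportions (not counts) so points and line share a scale
plot(d$x, d$successes / 20, ylim = c(0, 1),
     xlab = "x", ylab = "Proportion of successes")

# Draw the line against the grid's own x values, which are already sorted
lines(grid$x, grid$p, lwd = 2)
```

An alternative under the same assumptions would be to keep the count scale in the plot and multiply the predicted proportions by 20 before calling `lines()`.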
Thank you very much for your time and patience.
Anina Heystek
BSc Honours student
Department of Botany and Zoology
University of Stellenbosch, Stellenbosch, South Africa
15418693@sun.ac.za