[R] Problem with predict and lines in plotting binomial glm

Eik Vettorazzi E.Vettorazzi at uke.de
Wed Sep 21 11:17:37 CEST 2011


Hi Anina,
predict.glm returns predicted probabilities, when used with
type="response", so you have to either scale the probs to the number of
trials for any x or you plot probs from start:

par(mfcol=c(1,2))
plot(x, successes)
lines(x, (successes+failures)*predict(glm1, type= "response"), lwd=2)
plot(x, successes/(successes+failures))
lines(x, predict(glm1, type= "response"), lwd=2)

If you use predict with new data, please name the rows accordingly, when
you actually want to use them:

newdf <- data.frame(seq(min(x), max(x), length=20))
#you would expect 20 predictions, but *surprise* you get
length(predict(glm2, type="response", newdata= newdf)) #52!
newdf <- data.frame(x=seq(min(x), max(x), length=20))
length(predict(glm2, type="response", newdata= newdf)) #ok

And you should use the appropriate x in "lines" as well:

plot(x,successes/(successes+failures))
lines(newdf$x, predict(glm2, type="response", newdata= newdf),col="red")

Cheers

Am 21.09.2011 09:55, schrieb Heystek, A, Me <15418693 at sun.ac.za>:
> Problems with predict and lines in plotting binomial glm
> Dear R-helpers
> 
> I have found quite a lot of tips on how to work with glm through this mailing list, but still have a problem that I can't solve.
> I have got a data set of which the x-variable is count data and the y-variable is proportional data, and I want to know what the relationship between the variables are.
> The data was overdispersed (the residual deviance is much larger than the residual degrees of freedom) therefore I am using the quasibinomial family, thus the y-variable is a matrix of successes and failures (20 trials for every sample, thus each y-variable row counts up to 20).
> 
> x <- c(1200, 1200, 1200, 1200, 1200, 1200, 1200, 1200, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 2400, 2400, 2400, 2400, 2400, 2400, 2400, 3000, 3000, 3600, 3600, 3600, 3600, 4200, 4200, 4800, 4800, 5400, 6600, 6600, 7200, 7800, 7800, 8400, 8400, 8400, 9000, 9600, 10200, 13200, 18000, 20400, 24000, 25200, 36600)
> successes <- c(6, 16, 11, 14, 11, 16, 13, 13, 14, 16, 12, 12, 11, 15, 12, 9, 7, 7, 17, 15, 13, 9, 9, 12, 14, 8, 9, 16, 7, 9, 14, 11, 8, 8, 13, 6, 16, 11, 9, 7, 9, 8, 4, 14, 7, 3, 3, 9, 12, 8, 4, 6)
> failures <- c(14, 4, 9, 6, 9, 4, 7, 7, 6, 4, 8, 8, 9, 5, 8, 11, 13, 13, 3, 5, 7, 11, 11, 8, 6, 12, 11, 4, 13, 11, 6, 9, 12, 12, 7, 14, 4, 9, 11, 13, 11, 12, 16, 6, 13, 17, 17, 11, 8, 12, 16, 14)
> y <- cbind(successes, failures)
> data <- data.frame(y, x)
> 
> glm1 <- glm(y ~ x, family= quasibinomial, data= data)
> glm2 <- glm(y ~ log(x), family=quasibinomial, data= data)     # residual deviance is lower with log transformed x-value
> plot(x, successes)
> lines(x, predict(glm1, type= "response"), lwd=2)
> 
> 
> 
> 
> Firstly, because of the skewed distribution of the x variable I am not sure whether it should be log transformed or not.  When I do log transform it, the residual deviance and the p-value for the slope is lower.
> Either way, the lines command does not plot any line and neither does it give any error messages.  On some of my other data it plots a line way below all the data points.  From what I can gather, the predict function as it is now uses the fitted values because no newdata argument is specified. I want the line to be predicted from the same x-values.  I tried two ways of adding the newdata argument:
> 
> ## a data.frame using the original x-values
> lines(x, predict(glm2, type= "response", newdata= as.data.frame(x)))
> ##  or a data.frame with values (the same length as y) from the range of x values
> newdf <- data.frame(seq(min(x), max(x), length=52))
> lines(x, predict(glm2, type="response", newdata= newdf))
> 
> Only the second option plotted a line once, but then I could never get it to do the same again on a new plot even though I used the same variables and same code.
> 
> 
> Thank you very much for your time and patient
> Anina Heystek
> 
> BSc Honours student
> Department of Botany and Zoology
> University of Stellenbosch, Stellenbosch, South Africa
> 15418693 at sun.ac.za
> 
> 
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
Eik Vettorazzi

Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790

--
Pflichtangaben gemäß Gesetz über elektronische Handelsregister und Genossenschaftsregister sowie das Unternehmensregister (EHUG):

Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts; Gerichtsstand: Hamburg

Vorstandsmitglieder: Prof. Dr. Jörg F. Debatin (Vorsitzender), Dr. Alexander Kirstein, Joachim Prölß, Prof. Dr. Dr. Uwe Koch-Gromus 



More information about the R-help mailing list