[R] Problem with predict and lines in plotting binomial glm
Eik Vettorazzi
E.Vettorazzi at uke.de
Wed Sep 21 11:17:37 CEST 2011
Hi Anina,
predict.glm returns predicted probabilities, when used with
type="response", so you have to either scale the probs to the number of
trials for any x or you plot probs from start:
par(mfcol=c(1,2))
plot(x, successes)
lines(x, (successes+failures)*predict(glm1, type= "response"), lwd=2)
plot(x, successes/(successes+failures))
lines(x, predict(glm1, type= "response"), lwd=2)
If you use predict with new data, please name the rows accordingly, when
you actually want to use them:
newdf <- data.frame(seq(min(x), max(x), length=20))
#you would expect 20 predictions, but *surprise* you get
length(predict(glm2, type="response", newdata= newdf)) #52!
newdf <- data.frame(x=seq(min(x), max(x), length=20))
length(predict(glm2, type="response", newdata= newdf)) #ok
And you should use the appropriate x in "lines" as well:
plot(x,successes/(successes+failures))
lines(newdf$x, predict(glm2, type="response", newdata= newdf),col="red")
Cheers
Am 21.09.2011 09:55, schrieb Heystek, A, Me <15418693 at sun.ac.za>:
> Problems with predict and lines in plotting binomial glm
> Dear R-helpers
>
> I have found quite a lot of tips on how to work with glm through this mailing list, but still have a problem that I can't solve.
> I have got a data set of which the x-variable is count data and the y-variable is proportional data, and I want to know what the relationship between the variables are.
> The data was overdispersed (the residual deviance is much larger than the residual degrees of freedom) therefore I am using the quasibinomial family, thus the y-variable is a matrix of successes and failures (20 trials for every sample, thus each y-variable row counts up to 20).
>
> x <- c(1200, 1200, 1200, 1200, 1200, 1200, 1200, 1200, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 2400, 2400, 2400, 2400, 2400, 2400, 2400, 3000, 3000, 3600, 3600, 3600, 3600, 4200, 4200, 4800, 4800, 5400, 6600, 6600, 7200, 7800, 7800, 8400, 8400, 8400, 9000, 9600, 10200, 13200, 18000, 20400, 24000, 25200, 36600)
> successes <- c(6, 16, 11, 14, 11, 16, 13, 13, 14, 16, 12, 12, 11, 15, 12, 9, 7, 7, 17, 15, 13, 9, 9, 12, 14, 8, 9, 16, 7, 9, 14, 11, 8, 8, 13, 6, 16, 11, 9, 7, 9, 8, 4, 14, 7, 3, 3, 9, 12, 8, 4, 6)
> failures <- c(14, 4, 9, 6, 9, 4, 7, 7, 6, 4, 8, 8, 9, 5, 8, 11, 13, 13, 3, 5, 7, 11, 11, 8, 6, 12, 11, 4, 13, 11, 6, 9, 12, 12, 7, 14, 4, 9, 11, 13, 11, 12, 16, 6, 13, 17, 17, 11, 8, 12, 16, 14)
> y <- cbind(successes, failures)
> data <- data.frame(y, x)
>
> glm1 <- glm(y ~ x, family= quasibinomial, data= data)
> glm2 <- glm(y ~ log(x), family=quasibinomial, data= data) # residual deviance is lower with log transformed x-value
> plot(x, successes)
> lines(x, predict(glm1, type= "response"), lwd=2)
>
>
>
>
> Firstly, because of the skewed distribution of the x variable I am not sure whether it should be log transformed or not. When I do log transform it, the residual deviance and the p-value for the slope is lower.
> Either way, the lines command does not plot any line and neither does it give any error messages. On some of my other data it plots a line way below all the data points. From what I can gather, the predict function as it is now uses the fitted values because no newdata argument is specified. I want the line to be predicted from the same x-values. I tried two ways of adding the newdata argument:
>
> ## a data.frame using the original x-values
> lines(x, predict(glm2, type= "response", newdata= as.data.frame(x)))
> ## or a data.frame with values (the same length as y) from the range of x values
> newdf <- data.frame(seq(min(x), max(x), length=52))
> lines(x, predict(glm2, type="response", newdata= newdf))
>
> Only the second option plotted a line once, but then I could never get it to do the same again on a new plot even though I used the same variables and same code.
>
>
> Thank you very much for your time and patient
> Anina Heystek
>
> BSc Honours student
> Department of Botany and Zoology
> University of Stellenbosch, Stellenbosch, South Africa
> 15418693 at sun.ac.za
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Eik Vettorazzi
Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
T ++49/40/7410-58243
F ++49/40/7410-57790
--
Pflichtangaben gemäß Gesetz über elektronische Handelsregister und Genossenschaftsregister sowie das Unternehmensregister (EHUG):
Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts; Gerichtsstand: Hamburg
Vorstandsmitglieder: Prof. Dr. Jörg F. Debatin (Vorsitzender), Dr. Alexander Kirstein, Joachim Prölß, Prof. Dr. Dr. Uwe Koch-Gromus
More information about the R-help
mailing list