[R] Using predict() After Adding a Factor to a glm.nb() Model
David L Carlson
dcarlson at tamu.edu
Sat Sep 8 23:21:17 CEST 2012
Actually the first one did not work. You left out the warning message:
> py<-data.frame(seq(from=min(data$year), to=max(data$year), by=1))
> p1<-predict(model_a, newdata=py, se.fit=TRUE, type='response')
Warning message:
'newdata' had 20 rows but variable(s) found have 22 rows
Since py did not contain the explanatory variable "year," predict() threw a
warning and then just used the values in data giving you the same results
that you already have stored in model_a. Compare
> model_a$fitted.values
1 2 3 4 5 6 7
8
61.444110 55.163093 49.524141 15.124074 12.190050 61.444110 44.461622
55.163093
9 10 11 12 13 14 15
16
15.124074 12.190050 7.919156 61.444110 55.163093 49.524141 44.461622
39.916610
17 18 19 20 21 22
35.836204 32.172911 28.884091 25.931465 23.280666 20.900841
> p1$fit
1 2 3 4 5 6 7
8
61.444110 55.163093 49.524141 15.124074 12.190050 61.444110 44.461622
55.163093
9 10 11 12 13 14 15
16
15.124074 12.190050 7.919156 61.444110 55.163093 49.524141 44.461622
39.916610
17 18 19 20 21 22
35.836204 32.172911 28.884091 25.931465 23.280666 20.900841
Set the variable name to year in py and then use predict():
> py<-data.frame(year=seq(from=min(data$year), to=max(data$year), by=1))
> p1<-predict(model_a, newdata=py, se.fit=TRUE, type='response')
Now run your plotting commands. The plot may not look much different, but
now it is based on the 20 evenly spaced values in py instead of the 22
original data values in data.
Now you have the right idea for adding site to py but instead of using the
irregularly spaced data, use evenly spaced values. Then you will have to
plot separate lines for each site:
> model_b<-glm.nb(count~year+site, data=data)
> py2 <- expand.grid(site=letters[1:4], year=1980:1999)
> p2<-predict(model_b, newdata=py2, se.fit=TRUE, type='response')
> py2 <- data.frame(py2, p2)
> plot(count~year, data=data, pch=as.character(site))
> x <- unstack(py2, py2$year~py2$site)
> y <- unstack(py2, py2$fit~py2$site)
> matlines(x, y)
> legend("topright", letters[1:4], col=1:4, lty=1:4)
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of 077315q at acadiau.ca
> Sent: Saturday, September 08, 2012 2:47 PM
> To: r-help at r-project.org
> Subject: [R] Using predict() After Adding a Factor to a glm.nb() Model
>
> # Hello,
>
> # I have a data set that looks something like the following:
>
> site<-c(rep('a',5),rep('b',2),rep('c',4),rep('d',11))
> year<-c(1980, 1981, 1982, 1993, 1995, 1980, 1983, 1981, 1993, 1995,
> 1999, c(1980:1990))
> count<-
> c(60,35,36,12,8,112,98,20,13,15,15,65,43,49,51,34,33,33,33,40,11,0)
>
> data<-data.frame(site, year, count)
>
> # > site year count
> # 1 a 1980 60
> # 2 a 1981 35
> # 3 a 1982 36
> # 4 a 1993 12
> # .
> # .
> # .
>
> # I ran a negative binomial glm using:
>
> library(MASS)
> model_a<-glm.nb(count~year, data=data)
>
> # then extracted predicted values using:
>
> py<-data.frame(seq(from=min(data$year), to=max(data$year), by=1))
> p1<-predict(model_a, newdata=py, se.fit=TRUE, type='response')
> f1<-p1$fit
>
> plot(count~year, data=data)
> lines(py$year, f1)
>
> # Works perfectly.
>
> # Now, I want to add site as a factor:
> model_b<-glm.nb(count~year+as.factor(site), data=data)
>
> # I have tried a couple different ways of predicting the values based
> on model_b, but am having trouble.
>
> py<-data.frame(seq(from=min(data$year), to=max(data$year), by=1))
> p1<-predict(model_b, newdata=py, se.fit=TRUE, type='response')
>
> # gives:
> # >Error in model.frame.default(Terms, newdata, na.action = na.action,
> xlev = object$xlevels) :
> # variable lengths differ (found for 'as.factor(site)')
> # In addition: Warning message:
> # 'newdata' had 20 rows but variable(s) found have 22 rows
>
> py<-expand.grid(data$site, data$year)
> names(py)<-c('site','year')
> p1<-predict(model_b, newdata=py)
>
> # This works, but results in 484 values, and I can't plot a line over
> my points.
>
> # There is probably a simple solution, but I'm having trouble wrapping
> my mind around it. Mind you, this is also a last
> # minute change to my thesis, and I haven't slept in about three days.
>
> # Any suggestions? I would be extremely grateful...
>
> # Banging my head against the wall,
> # A stressed out grad student
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list