[R] predict nbinomial glm

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Aug 16 16:13:50 CEST 2005


This seems to be an unstated repeat of much of an earlier and 
unanswered post

 	https://stat.ethz.ch/pipermail/r-help/2005-August/075914.html

entitled

 	[R] error in predict glm (new levels cause problems)

It is nothing to do with `nbinomial glm' (sic): all model fitting 
functions including lm and glm do this.  The reason you did not get at 
least one reply to your first post is that you seemed not to have done 
your homework.  (One thing the posting guide does ask is for you to try 
the current version of R, and yours is three versions old.)

The code is protecting you from an attempt at statistical nonsense. 
(Indeed, the check was added to catch such misuses.)  Your email address 
seems to be that of a student, so please seek the help of your advisor. 
You seem surprised that you are not allowed to make predictions about 
levels for which you have supplied no relevant data.
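
For readers of the archive, here is a minimal sketch of what is going on, 
under some stated assumptions. It uses glm with a Poisson family rather 
than glm.nb, since the check lives in the model-frame code shared by the 
model fitting functions; the names dat, sub, fit, lev_before and msg are 
illustrative only, and the seed is arbitrary. subset() keeps "q" as a level 
of d even though no remaining row uses it. Re-creating the factor drops the 
unused level (more recent versions of R also offer droplevels() for this). 
A prediction for a level that is truly absent from the fitted data is still 
refused, as it should be.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

dat <- data.frame(a = rnbinom(200, 1, 0.5),
                  b = 1:200,
                  d = factor(rep(c("q", "r", "s", "t"), each = 50)))

# Same rows as 'b > 80 & c < 190' in the original post, where c = b + 29
sub <- subset(dat, b > 80 & b < 161)

lev_before <- levels(sub$d)   # still lists "q", although no row uses it
table(sub$d)                  # the count for "q" is zero

# Re-create the factor so that only the levels actually present remain
sub$d <- factor(sub$d)

fit  <- glm(a ~ b + d, family = poisson, data = sub)
pred <- predict(fit, newdata = sub, type = "response")

# Asking about a level the model never saw is still an error;
# capture the message instead of stopping the script
msg <- tryCatch(predict(fit, newdata = data.frame(b = 100, d = "q")),
                error = function(e) conditionMessage(e))
```

Whether dropping the level is appropriate depends on the analysis: it only 
removes an empty category from the factor, it does not make predictions for 
"q" meaningful.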


On Tue, 16 Aug 2005, K. Steinmann wrote:

> Dear R-helpers,
>
> let us assume, that I have the following dataset:
>
> a <- rnbinom(200, 1, 0.5)
> b <- (1:200)
> c <- (30:229)
> d <- rep(c("q", "r", "s", "t"), rep(50,4))
> data_frame <- data.frame(a,b,c,d)
>
> In a first step I run a glm.nb (full code is given at the end of this mail) and
> want to predict my response variable a.
> In a second step, I would like to run a glm.nb based on a subset of the
> data_frame. As soon as I want to predict the response variable a, I get the
> following error message:
> "Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
> object$xlevels) :
>        factor d has new level(s) q"
>
> Does anybody have a solution to this problem?
>
> Thank you in advance,
> K. Steinmann (working with R 2.0.0)
>
>
> Code:
>
> library(MASS)
>
> a <- rnbinom(200, 1, 0.5)
> b <- (1:200)
> c <- (30:229)
> d <- rep(c("q", "r", "s", "t"), rep(50,4))
>
> data_frame <- data.frame(a,b,c,d)
>
> model_1 = glm.nb(a ~ b + d , data = data_frame)
>
> pred_model_1 = predict(model_1, newdata = data_frame, type = "response", se.fit
> = FALSE, dispersion = NULL, terms = NULL)
>
> subset_of_dataframe = subset(data_frame, (b > 80 & c < 190 ))
>
> model_2 = glm.nb(a ~ b + d , data = subset_of_dataframe)
> pred_model_2 = predict(model_2, newdata = subset_of_dataframe, type =
> "response", se.fit = FALSE, dispersion = NULL, terms = NULL)

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



