[R] predict nbinomial glm
Sundar Dorai-Raj
sundar.dorai-raj at pdf.com
Tue Aug 16 16:33:47 CEST 2005
Katharina,
I agree with Prof. Ripley's assessment. But, perhaps one thing you may
have overlooked is that subset.data.frame does not remove unused levels. So,
> subset_of_dataframe = subset(data_frame, (b > 80 & c < 190))
> levels(subset_of_dataframe$d)
[1] "q" "r" "s" "t"
> table(subset_of_dataframe$d)
 q  r  s  t
 0 20 50 10
Even though the level "q" does not appear in the subset, it is still a level of "d".
Perhaps you need to do the following after the subset:
subset_of_dataframe[] <- lapply(subset_of_dataframe, "[", drop = TRUE)
which drops all unused levels from factors.
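For readers on a newer version of R than the one used in this thread, the same effect can be had with droplevels(), which is part of base R in later releases. The following is only a sketch under that assumption. It rebuilds the example data with d made a factor explicitly, since recent versions of R no longer convert strings to factors in data.frame() by default. The name levels_before is introduced here purely for illustration.

```r
# Rebuild the example data from the original post.
# d is converted with factor() explicitly, because R >= 4.0.0
# no longer turns character columns into factors by default.
b <- 1:200
c <- 30:229
d <- factor(rep(c("q", "r", "s", "t"), rep(50, 4)))
data_frame <- data.frame(b, c, d)

# subset() keeps the full set of factor levels, used or not.
subset_of_dataframe <- subset(data_frame, b > 80 & c < 190)
levels_before <- levels(subset_of_dataframe$d)  # hypothetical helper name
print(levels_before)

# droplevels() removes unused levels from every factor column.
subset_of_dataframe <- droplevels(subset_of_dataframe)
print(levels(subset_of_dataframe$d))
print(table(subset_of_dataframe$d))
```

The lapply() line above remains the approach for older installations where droplevels() is not available.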
I'm not sure if your problem is statistical in nature or simply a
misunderstanding of the software. I'm only attempting to answer the
latter. As Prof. Ripley suggests, discuss any statistical problem (i.e.
predicting on missing levels) with your advisor.
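To see the check Prof. Ripley describes in isolation, the sketch below fits a model on the subset and then asks for predictions on rows whose level of d was never seen in the fit. It makes some assumptions that are not in the original post: glm() with a Poisson family stands in for glm.nb() so that no extra package is needed (Prof. Ripley notes that all model fitting functions behave this way), set.seed() is added for reproducibility, and the names train, fit, new_rows and result are illustrative only.

```r
# Example data as in the original post, with d as an explicit factor.
set.seed(1)  # assumption: added only so the sketch is reproducible
a <- rnbinom(200, 1, 0.5)
b <- 1:200
c <- 30:229
d <- factor(rep(c("q", "r", "s", "t"), rep(50, 4)))
data_frame <- data.frame(a, b, c, d)

# Fit on the subset, with unused levels dropped first.
train <- droplevels(subset(data_frame, b > 80 & c < 190))
fit <- glm(a ~ b + d, family = poisson, data = train)

# Levels of d that the fitted model knows about.
print(fit$xlevels$d)

# Rows 1 to 3 have d == "q", a level with no data in the fit.
new_rows <- data_frame[1:3, ]

# predict() refuses to extrapolate to an unseen level; capture the
# error message instead of stopping the script.
result <- tryCatch(
  predict(fit, newdata = new_rows, type = "response"),
  error = function(e) conditionMessage(e)
)
print(result)
```

Here the error is the intended behaviour: the model has no coefficient for "q", so there is nothing sensible to predict from.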
HTH,
--sundar
P.S. Also, update R. It's free.
Prof Brian Ripley wrote:
> This seems to be an unstated repeat of much of an earlier and
> unanswered post
>
> https://stat.ethz.ch/pipermail/r-help/2005-August/075914.html
>
> entitled
>
> [R] error in predict glm (new levels cause problems)
>
> It is nothing to do with `nbinomial glm' (sic): all model fitting
> functions including lm and glm do this. The reason you did not get at
> least one reply from your first post is that you seemed not to have done
> your homework. (One thing the posting guide does ask is for you to try
> the current version of R, and yours is three versions old.)
>
> The code is protecting you from an attempt at statistical nonsense.
> (Indeed, the check was added to catch such misuses.) Your email address
> seems to be that of a student, so please seek the help of your advisor.
> You seem surprised that you are not allowed to make predictions about
> levels for which you have supplied no relevant data.
>
>
> On Tue, 16 Aug 2005, K. Steinmann wrote:
>
>
>>Dear R-helpers,
>>
>>Let us assume that I have the following dataset:
>>
>>a <- rnbinom(200, 1, 0.5)
>>b <- (1:200)
>>c <- (30:229)
>>d <- rep(c("q", "r", "s", "t"), rep(50,4))
>>data_frame <- data.frame(a,b,c,d)
>>
>>In a first step I run a glm.nb (full code is given at the end of this mail) and
>>want to predict my response variable a.
>>In a second step, I would like to run a glm.nb based on a subset of the
>>data_frame. As soon as I want to predict the response variable a, I get the
>>following error message:
>>"Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
>>object$xlevels) :
>> factor d has new level(s) q"
>>
>>Does anybody have a solution to this problem?
>>
>>Thank you in advance,
>>K. Steinmann (working with R 2.0.0)
>>
>>
>>Code:
>>
>>library(MASS)
>>
>>a <- rnbinom(200, 1, 0.5)
>>b <- (1:200)
>>c <- (30:229)
>>d <- rep(c("q", "r", "s", "t"), rep(50,4))
>>
>>data_frame <- data.frame(a,b,c,d)
>>
>>model_1 = glm.nb(a ~ b + d , data = data_frame)
>>
>>pred_model_1 = predict(model_1, newdata = data_frame, type = "response", se.fit
>>= FALSE, dispersion = NULL, terms = NULL)
>>
>>subset_of_dataframe = subset(data_frame, (b > 80 & c < 190 ))
>>
>>model_2 = glm.nb(a ~ b + d , data = subset_of_dataframe)
>>pred_model_2 = predict(model_2, newdata = subset_of_dataframe, type =
>>"response", se.fit = FALSE, dispersion = NULL, terms = NULL)
>
>