[R] GLM: What is a good way for dealing with new factor levels in the test set?

thuksu toby at huksu.com
Thu Apr 30 17:02:26 CEST 2015


Hi, thanks for the reply!

I tried this:

# res is a data frame
levels(res$mytypeid.f) <- c(levels(res$mytypeid.f), "mynewlevel")
logreg <- glm(yesno ~ mytypeid.f + amount, data = res, family = "binomial")
exp(coef(logreg))
# This result shows that the new level is not included in the regression:
# glm() drops unused factor levels when it builds the model frame.
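Below is a small self-contained illustration of that behaviour on made-up data. The column names follow the post, but the values are invented. It shows the factor carrying the extra level while the fitted model has no coefficient for it. The levels the model actually used are stored in `logreg$xlevels`:

```r
# Toy stand-in for `res`: column names follow the post, values are made up.
set.seed(1)
res <- data.frame(
  yesno      = rbinom(60, 1, 0.5),
  mytypeid.f = factor(rep(c("a", "b", "c"), length.out = 60)),
  amount     = runif(60, 0, 100)
)

# Append a level that no row actually uses.
levels(res$mytypeid.f) <- c(levels(res$mytypeid.f), "mynewlevel")
nlevels(res$mytypeid.f)          # 4: the factor itself carries the new level

logreg <- glm(yesno ~ mytypeid.f + amount, data = res, family = "binomial")

# The empty level gets no dummy column and hence no coefficient.
names(coef(logreg))              # "(Intercept)" "mytypeid.fb" "mytypeid.fc" "amount"
logreg$xlevels$mytypeid.f        # "a" "b" "c": the levels predict() will accept
```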


I think what I want to do is identify levels in the test set that are not
in the training set and prune them from the test set. Then those cases would
fall back on the default (baseline) category of the dummy coding, which I
think is the "average", from reading this:
http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm

The problem is, I'm not sure how to do that.
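Below is a minimal sketch of the pruning idea on invented data. It assumes a separate test data frame. The names `train`, `test`, and the helper `recode_unseen()` are hypothetical and do not come from the post. The sketch reads the known levels from `fit$xlevels`, finds the unseen ones, and then shows three common options:

* drop those rows,
* predict NA for them, or
* score them at the reference level.

One caveat on the "average" idea: under R's default treatment contrasts (`contr.treatment`), the baseline is the first level, not an average. A grand-mean comparison comes from sum-to-zero (deviation) coding such as `contr.sum`.

```r
# Hypothetical train/test split; `train` plays the role of `res` in the post.
set.seed(42)
train <- data.frame(
  yesno      = rbinom(60, 1, 0.5),
  mytypeid.f = factor(rep(c("a", "b", "c"), length.out = 60)),
  amount     = runif(60, 0, 100)
)
test <- data.frame(
  mytypeid.f = factor(c("a", "c", "d", "b", "d")),  # "d" never occurs in train
  amount     = c(10, 50, 20, 80, 30)
)

fit <- glm(yesno ~ mytypeid.f + amount, data = train, family = "binomial")

# Levels the fitted model knows about (stored by glm() at fit time).
known  <- fit$xlevels$mytypeid.f
new_lv <- setdiff(levels(droplevels(test$mytypeid.f)), known)
print(new_lv)                    # "d"

# Predicting on the raw test set fails because of the unseen level.
err <- tryCatch(predict(fit, newdata = test), error = conditionMessage)

# Hypothetical helper: replace unseen levels with `replacement`,
# then rebuild the factor on the training levels only.
recode_unseen <- function(f, known, replacement) {
  x <- as.character(f)
  x[!(x %in% known)] <- replacement
  factor(x, levels = known)
}

# Option A: drop the rows that carry an unseen level.
test_kept <- test[test$mytypeid.f %in% known, ]
test_kept$mytypeid.f <- factor(as.character(test_kept$mytypeid.f), levels = known)
p_kept <- predict(fit, newdata = test_kept, type = "response")

# Option B: keep the rows; predict() returns NA where the factor is NA.
test_na <- test
test_na$mytypeid.f <- recode_unseen(test$mytypeid.f, known, NA)
p_na <- predict(fit, newdata = test_na, type = "response")

# Option C: score unseen levels as the reference (first) level,
# i.e. the baseline of the default treatment contrasts.
test_ref <- test
test_ref$mytypeid.f <- recode_unseen(test$mytypeid.f, known, known[1])
p_ref <- predict(fit, newdata = test_ref, type = "response")
```

Option C is a modeling assumption: it treats every unseen category as if it were the baseline category. Options A and B avoid that assumption at the cost of missing predictions.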



--
Sent from the R help mailing list archive at Nabble.com.


