[R] GLM: What is a good way for dealing with new factor levels in the test set?
thuksu
toby at huksu.com
Thu Apr 30 17:02:26 CEST 2015
Hi, Thanks for the reply!
I did try this...
# res is a data frame
levels(res$mytypeid.f) <- c(levels(res$mytypeid.f),"mynewlevel")
logreg <- glm(yesno ~ mytypeid.f + amount, data=res, family="binomial")
exp(coef(logreg))
# This result shows that the new level is not included in the regression;
# it is probably dropped automatically because no rows use it.
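
A small check of that behaviour, as a sketch only: the toy data frame below is made up for illustration (only the column names follow the code above). It adds an unused level to the factor, fits the model, and confirms that no coefficient is estimated for the unused level.

```r
# Hypothetical toy data; only the column names come from the post.
res <- data.frame(
  mytypeid.f = factor(rep(c("a", "b", "c"), each = 4)),
  amount     = rep(c(10, 20, 10, 20), times = 3),
  yesno      = c(0, 1, 1, 0,  1, 1, 0, 1,  0, 0, 1, 0)
)

# Add a level that no row actually uses
levels(res$mytypeid.f) <- c(levels(res$mytypeid.f), "mynewlevel")

logreg <- glm(yesno ~ mytypeid.f + amount, data = res, family = "binomial")

# The factor still carries the extra level ...
has_level <- "mynewlevel" %in% levels(res$mytypeid.f)

# ... but the fitted model has no coefficient for it
coef_names <- names(coef(logreg))
has_coef <- "mytypeid.fmynewlevel" %in% coef_names

print(table(res$mytypeid.f))  # count per level, including the unused one
print(coef_names)
```

So adding the level to the factor by itself does not teach the model anything about it; there are no observations to estimate an effect from.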
I think what I want to do is identify new levels that do not occur in the
training set, and prune those from the test set. Then I would be using the
default dummy-variable coding, which I think is the "average", judging from
this:
http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm
Problem is, I'm not sure how to do that...
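
One possible way to do the pruning, as a minimal sketch rather than a definitive answer: the `train` and `test` data frames below are hypothetical, and "prune" is taken to mean dropping the test rows whose level was never seen in training. It uses setdiff() to list the unseen levels and %in% to filter the rows; without this step predict() stops with an error about new factor levels.

```r
# Hypothetical training and test data; only the column names come from the post.
train <- data.frame(
  mytypeid.f = factor(rep(c("a", "b", "c"), each = 4)),
  amount     = rep(c(10, 20, 10, 20), times = 3),
  yesno      = c(0, 1, 1, 0,  1, 1, 0, 1,  0, 0, 1, 0)
)
test <- data.frame(
  mytypeid.f = factor(c("a", "d", "c", "e")),  # "d" and "e" are unseen
  amount     = c(11, 22, 33, 44)
)

logreg <- glm(yesno ~ mytypeid.f + amount, data = train, family = "binomial")

# Levels that appear in the test set but not in the training set
new_levels <- setdiff(levels(test$mytypeid.f), levels(train$mytypeid.f))

# Keep only the test rows whose level was seen during training
seen <- test$mytypeid.f %in% levels(train$mytypeid.f)
test_pruned <- test[seen, , drop = FALSE]

# Rebuild the factor with exactly the training levels
test_pruned$mytypeid.f <- factor(test_pruned$mytypeid.f,
                                 levels = levels(train$mytypeid.f))

# Predicted probabilities for the remaining rows
pred <- predict(logreg, newdata = test_pruned, type = "response")
print(new_levels)
print(pred)
```

Note that this drops the affected rows instead of predicting for them. If a prediction is still needed for those rows, they would have to be mapped to some existing level, and which level is appropriate is a modelling decision this sketch does not make.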