Uwe Ligges
ligges at statistik.tu-dortmund.de
Wed Mar 11 10:44:35 CET 2009
Mehmet U Ayvaci wrote:
> Hi,
>
> I have a database of 2211 rows with 31 entries each and I manually split my
> data into 10 folds for cross validation. I build a logistic regression model
> as:
>
>> model <- glm(qual ~ AgGr + FaHx + PrHx + PrSr + PaLp + SvD + IndExam +
>     Rad + BrDn + BRDS + PrinFin + SkRtr + NpRtr + SkThck + TrThkc +
>     SkLes + AxAdnp + ArcDst + MaDen + CaDt + MaMG +
>     MaMrp + MaSh + SCTub + SCFoc + MaSz,
>     family = binomial(link = logit));
>
> where the variables are taken from the trainSet of size 1989x31. The test
> set has size 222x31. When I try to predict on the test set, I get the
> error:
>
>> predict.glm(model, testSet, type = "response")
>
> "Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) :
>   subscript out of bounds"
>
> It works fine on the trainSet, so it is something about the testSet. On the
> other hand, I realized that some independent variables, say "MaSz", take 3
> different values in the trainSet vs. 4 different ones in the testSet. I am
> not sure if this is the cause. If so, what would be the remedy?
>
> Since I can retrieve the coefficients of the logistic regression, I could
> manually calculate the response for each entry in the testSet. This could
> solve my problem, although it would be burdensome. But I don't know an easy
> way of doing it, as my logistic regression has 80+ coefficients.
Well, if "MaSz takes 3 different values in the trainset vs. 4 different
ones in the testSet", then you won't even be able to calculate it by
hand, because you have no coefficient for the 4th level of that factor.
Either the data you estimate from contain that level, so that a
coefficient can be estimated for it, or you cannot predict for it.
Uwe Ligges
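
To illustrate the point above, here is a minimal sketch of how one might
find the factor levels that occur in the test set but not in the training
set, and predict only for the rows the model can handle. The data below are
made up for illustration (a single factor MaSz with a fourth level "d"
that appears only in testSet); the names unseen, ok and pred are
hypothetical helpers and not part of the original post.

```r
# Toy stand-ins for the poster's trainSet/testSet (hypothetical values).
set.seed(1)
trainSet <- data.frame(
  qual = rbinom(60, 1, 0.5),
  MaSz = factor(rep(c("a", "b", "c"), each = 20))
)
testSet <- data.frame(
  qual = rbinom(8, 1, 0.5),
  MaSz = factor(c("a", "b", "c", "d", "a", "d", "b", "c"))
)

# Fit with data = so the model is tied to the training data frame.
model <- glm(qual ~ MaSz, data = trainSet, family = binomial(link = "logit"))

# model$xlevels stores the training levels of every factor in the formula.
# Collect, per factor, the levels that occur only in the test set.
unseen <- lapply(names(model$xlevels), function(v) {
  setdiff(unique(as.character(testSet[[v]])), model$xlevels[[v]])
})
names(unseen) <- names(model$xlevels)
print(unseen)

# A row is predictable only if all of its factor values were seen in training.
ok <- rep(TRUE, nrow(testSet))
for (v in names(unseen)) {
  ok <- ok & !(as.character(testSet[[v]]) %in% unseen[[v]])
}

# Predict for the usable rows; leave NA where no coefficient exists.
pred <- rep(NA_real_, nrow(testSet))
pred[ok] <- predict(model,
                    newdata = droplevels(testSet[ok, , drop = FALSE]),
                    type = "response")
print(pred)
```

Rows with an unseen level stay NA here, since no coefficient was estimated
for them. Whether to drop such rows, merge rare levels, or build the folds
so that every level occurs in each training set is a modelling decision
this sketch does not make.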
>
> Could somebody advise?
>
> Thanks,
>
> M
