[R-sig-eco] (no subject)
nikko at hailmail.net
Mon Jul 5 20:45:13 CEST 2010
I am guessing the problem is that, because you have categorical
predictors, you are getting empty cells in your cross-validation sets,
and hence the NA accuracy estimates.
Unfortunately, you are now in a very tricky situation: to get at the
generalization error of your model you need a sampling scheme that
approximates the distribution of your predictors. One way to get at this
might be to use Bayesian logistic regression, with very diffuse priors
on the coefficients. This will serve to moderate the problem of zero
cells in your resampling scheme, and will probably increase your
prediction error, which in this case may be a good thing.
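As a rough sketch of what I mean, the snippet below counts empty cells and checks whether any fold holds out factor levels that the training rows never see. Everything in it is an assumption for illustration: the data are simulated (I do not have your reglog data), and the names dat, f1, f2, f3 and fold are made up. The last step fits a Bayesian logistic regression with bayesglm() from the arm package, if it is installed, using that function's default prior; the prior.scale argument is where you would make the prior more diffuse.

```r
# Hypothetical stand-in for the poster's data: a binary response and
# three factors with few observations per cell.
set.seed(1)
n <- 60
dat <- data.frame(
  y  = rbinom(n, 1, 0.5),
  f1 = factor(sample(letters[1:4], n, replace = TRUE), levels = letters[1:4]),
  f2 = factor(sample(LETTERS[1:5], n, replace = TRUE), levels = LETTERS[1:5]),
  f3 = factor(sample(c("lo", "mid", "hi"), n, replace = TRUE),
              levels = c("lo", "mid", "hi"))
)

# Count empty cells in the full cross-classification of the predictors.
cells <- table(dat$f1, dat$f2, dat$f3)
n_empty <- sum(cells == 0)
cat("empty cells:", n_empty, "of", length(cells), "\n")

# Assign each row to one of k folds, as a k-fold cross-validation would.
k <- 10
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold, count factor levels present in the held-out rows
# but absent from the corresponding training rows.
unseen <- sapply(seq_len(k), function(i) {
  train <- dat[fold != i, ]
  test  <- dat[fold == i, ]
  sum(sapply(c("f1", "f2", "f3"), function(v) {
    length(setdiff(unique(as.character(test[[v]])),
                   unique(as.character(train[[v]]))))
  }))
})
print(unseen)

# Optional: Bayesian logistic regression, only if 'arm' is installed.
# The default prior is used here; this is a sketch, not a tuned model.
if (requireNamespace("arm", quietly = TRUE)) {
  fit <- arm::bayesglm(y ~ f1 + f2 + f3, family = binomial, data = dat)
  print(round(coef(fit), 2))
}
```

With your own data you would replace dat and the factor names with reglog and its columns; a large share of empty cells, or nonzero entries in unseen, would support the guess above.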
Hope this helps
> To: r-sig-ecology at r-project.org
> Subject: [R-sig-eco] Modeling when all variables are categorical
> Message-ID: <4C30DED9.2030007 at gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Dear list members,
> I am modeling a binary response variable and 6 explanatory factors (all
> my variables, response and explanatory, are categorical).
> I fitted a logistic regression but when I tried to use the CVbinary
> (DAAG package) function to measure the predictive accuracy of the
> regression model with a binary response I got the following result:
> > mod1 = glm(condicion ~ ., family=binomial, data=reglog)
> > CVbinary(mod1)
> Fold: 2 1 7 9 6 4 10 5 8 3
> Internal estimate of accuracy = NA
> Cross-validation estimate of accuracy = NA
> Am I getting this result because I am working with a saturated model?
> How should I model this type of data (1 categorical response
> variable and 6 explanatory factors)?
> I also used classification trees for the data but the error is bigger
> after the first split.
> Manuel Spínola, Ph.D.
> Instituto Internacional en Conservación y Manejo de Vida Silvestre
> Universidad Nacional
> Apartado 1350-3000
> COSTA RICA
> mspinola at una.ac.cr
> mspinola10 at gmail.com
> Teléfono: (506) 2277-3598
> Fax: (506) 2237-7036