[R] vglm: warnings and errors

Tue Aug 2 15:40:43 CEST 2011

Hello,

I am using multinomial logit regression for the first time, and I am trying to understand the warnings and errors I get.

Each data set consists of 200 to 600 samples with ~25 predictors (these are principal components). The response has three categories.

I use the function "vglm" from the package VGAM, called as follows:

   fit1 <- vglm(fmla, family = multinomial, data = tr, weights = regwt, maxit = 500)

"regwt" are Epanechnikov weights.

In general, the regression works, but:

- often, one of the categories has posterior probability zero, but the remaining two probabilities are non-zero (although very small)

- I receive many warnings of the following type:

   "in checkwz(wz, m = M , trace = trace, wzeps = control$wzepsilo): n elements replaced by 1.819e-12"

   " in tfun(mu = mu, y = y, w =w, res = FALSE, eta = eta, ...: fitted values close to 0 or 1"

   ... if I understand it correctly, these have to do with the variance of the predictions being too small?

- In some cases, I get an error: "Error in devmu[smallmu] = smy * log(smu): NAs are not allowed in subscripted arguments". Sometimes this error goes away when I decrease the size of the training set.

I would like to know if this is expected behavior for some types of data sets. The manual to VGAM states that "multinomial is prone to numerical difficulties if the groups are separable and/or fitted probabilities are close to 0 or 1", but does not explain why. The latter could be my case.
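
To illustrate the kind of situation the manual describes, here is a minimal sketch using base R's glm on a two-class toy data set that is perfectly separable. The data (x, y) are made up for illustration, and the two-class binomial model is only a stand-in for the multinomial case; I am assuming, not claiming, that something similar happens in my fits.

```r
# Toy data (hypothetical): the two classes are perfectly separated at x = 5.5.
x <- 1:10
y <- rep(c(0, 1), each = 5)

# Collect the warnings instead of printing them as they occur.
warn_msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    warn_msgs <<- c(warn_msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

p <- fitted(fit)

print(warn_msgs)               # warnings raised during fitting
print(coef(fit))               # coefficients grow large under separation
print(max(pmin(p, 1 - p)))     # every fitted probability sits close to 0 or 1
print(max(p * (1 - p)))        # binomial variance term p(1 - p) is close to 0
```

With separable data the likelihood keeps increasing as the coefficients grow, so the fitted probabilities are pushed towards 0 or 1 and the variance term p(1 - p) used in the iterative fitting becomes tiny, which would be consistent with the warnings quoted above.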

I have to run the regression on tens of thousands of data sets, so I would like to find a setting in which things go smoothly (i.e. without errors).
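
For the batch runs, one option is to wrap each fit so that a single failure does not stop the whole loop and the warnings are recorded per data set. Below is a minimal sketch in base R; "safe_fit" is a hypothetical helper name, not something provided by VGAM, and the toy expressions at the end only stand in for the real vglm call.

```r
# Hypothetical helper: evaluate one fitting expression, record its warnings,
# and return NULL for the fit (plus the error message) if it fails.
safe_fit <- function(expr) {
  warns <- character(0)
  err <- NA_character_
  fit <- tryCatch(
    withCallingHandlers(
      expr,                                   # lazily evaluated here
      warning = function(w) {
        warns <<- c(warns, conditionMessage(w))
        invokeRestart("muffleWarning")        # continue after the warning
      }
    ),
    error = function(e) {
      err <<- conditionMessage(e)
      NULL
    }
  )
  list(fit = fit, warnings = warns, error = err, failed = !is.na(err))
}

# Stand-ins for real fits: one with a warning, one with an error, one clean.
r1 <- safe_fit(log(-1))
r2 <- safe_fit(stop("boom"))
r3 <- safe_fit(1 + 1)
str(list(r1 = r1, r2 = r2, r3 = r3))

# Intended use with my call (not run here, needs VGAM and my data):
# res <- safe_fit(vglm(fmla, family = multinomial, data = tr,
#                      weights = regwt, maxit = 500))
```

The "failed" flag and the stored messages would let me count afterwards how many of the data sets hit each warning or error.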

I realize that this is probably more of a methodological than a technical question, but maybe you can give some rules of thumb about a suitable number of samples/predictors or point me to some literature that would help me understand my problems.

Thanks

Anna