[R] "Error in contrasts" in step wise regression

Prof Brian D Ripley ripley at stats.ox.ac.uk
Mon Jun 27 20:01:37 CEST 2005


On Mon, 27 Jun 2005, Young Cho wrote:

> Thanks for the reply. I created a new data frame and ran step on it, but it still does not work.
>
> > detach(dat)
> > attach(ds)
> > dat <- ds[,sapply(ds,nlevels)>=2]
> > dat$Y <- Response
> > detach(ds)
> > attach(dat)
> > fmla <- as.formula(paste(" ~ ",paste(collist1[sapply(ds,nlevels)>=2],collapse="+")))
> > fit.s <- step(fit.1, direction="forward",scope=list(upper= fmla,lower= ~1))
> Start:  AIC= -1651.18
>  Y ~ 1
> Error in "contrasts<-"(`*tmp*`, value = "contr.treatment") :
>         contrasts can be applied only to factors with 2 or more levels
> >

R does have debugging tools: please use them.
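
As a minimal sketch of what that could look like (the toy data frame and its column names are hypothetical, not the poster's data): calling traceback() straight after the error shows which function raised it, and fitting each candidate term on its own shows which variable is responsible.

```r
# Hypothetical toy data standing in for the poster's `dat`
dat <- data.frame(
  Y  = c(1.2, 0.7, 3.1, 2.4, 1.9, 2.8),
  f1 = factor(c("a", "b", "a", "b", "a", "b")),
  f2 = factor(c("u", "u", "u", "u", "u", "u")),  # a factor with one level
  x  = c(0.1, 0.4, 0.3, 0.9, 0.5, 0.7)
)
candidates <- setdiff(names(dat), "Y")

# Fit Y ~ <term> for each candidate and keep the error message, if any
failures <- sapply(candidates, function(v) {
  fit <- try(lm(reformulate(v, response = "Y"), data = dat), silent = TRUE)
  if (inherits(fit, "try-error")) {
    conditionMessage(attr(fit, "condition"))
  } else {
    NA_character_
  }
})

# Names of the terms whose single-term fit failed
failures[!is.na(failures)]
```

Any term listed at the end is one that lm() cannot fit by itself, and so is a candidate for the term that stops step().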

> Also, I was wondering if you know why the following behave differently
> from the above:

Yes, as I have read the help page for step().  Have you?  It is discussed
there.

> > fit.s <- step(lm(Y~1),scope=list(upper=~.,lower=~1),)
> Start:  AIC= -1651.18
>  Y ~ 1
> > fit.s <- step(fit.1,scope=list(upper=~.,lower=~1),)
> Start:  AIC= -1651.18
>  Y ~ 1
>
> I thought "~." uses "all other variables in the data frame" according to
> "Introduction to R."

In contexts where there is a data frame and there is no more specific
documentation, it means `all remaining variables separated by +'.
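
A small sketch of the difference, on the assumption (worth checking against ?step) that step() expands its scope formulae with update.formula() against the current model, so that `.` there stands for the terms already in the model. The toy data frame is hypothetical.

```r
# Hypothetical toy data frame
d <- data.frame(Y = c(1, 3, 2, 5), a = c(0, 1, 0, 1), b = c(2, 1, 4, 3))

# With a data frame supplied, `.` expands to the remaining columns
in_data <- attr(terms(Y ~ ., data = d), "term.labels")
in_data

# When updating an intercept-only model formula, `.` stands for
# what is already on the right-hand side, which here is nothing
upd <- update.formula(Y ~ 1, ~ .)
in_update <- attr(terms(upd), "term.labels")
in_update
```

If that assumption holds, starting from Y ~ 1 with upper = ~ . leaves step() nothing to add, which would be consistent with the output quoted above.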

>
> -Young.
>
>
> Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> On Fri, 24 Jun 2005, Young Cho wrote:
>
> > Hi,
> >
> > I have a problem getting the step function to work.
>
> This is not coming from step(), but (AFAIK) from model.matrix() called by
> lm(). One way to debug it is to try fitting the models directly.
>
> > I am getting the following error:
> >
> >> fit1 <- lm(Response~1)
> >> fmla <- as.formula(paste(" ~ ",paste(colnames,collapse="+")))
> >> sfit <- step(fit1,scope=list(upper= fmla,lower= ~1),k=log(nrow(dat)))
> > Start: AIC= -1646.66
> > Response ~ 1
> > Error in "contrasts<-"(`*tmp*`, value = "contr.treatment") :
> > contrasts can be applied only to factors with 2 or more levels
> >
> > But if I count the unique values in each column by
> >
> > A <- NULL
> > for (ii in 1:length(colnames)){
> > A[ii] <- length(unique( eval(parse(text=paste('dat$',colnames[ii])))))
> > }
> >
> > I do not see any column with only 1 value. Is there some other possible
> > reason why I am getting the error? Thanks a lot!
>
> It says `levels', not values. So try
>
> sapply(dat, nlevels)
>
> The values can include NA, which is not a level (usually). E.g.
>
> > x <- factor(c(1, NA))
> > nlevels(x)
> [1] 1
> > length(unique(x))
> [1] 2
>
> (Incidentally, you are assuming variables are found in dat, and you should
> use
>
> lm(Response ~ 1, data=dat)
>
> to ensure that. And your calculation can be done more legibly as
>
> sapply(dat, function(x) length(unique(x)))
>
> .)
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595