[R] Help on variable ranking
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Wed Jan 17 13:50:27 CET 2007
Rupendra Chulyadyo wrote:
> Hello all,
>
> I want to assign relative score to the predictor variables on the basis of
> its influence on the dependent variable. But I could not find any standard
> statistical approach appropriate for this purpose.
> Please suggest the possible approaches.
>
> Thanks in advance,
>
> Rupendra Chulyadyo
> Institute of Engineering,
> Tribhuvan University,
> Nepal

You might consider using the bootstrap to get confidence intervals of
the rank of each predictor's partial chi-square or partial R-square.
The following takes into account all terms that might be associated with
a variable (nonlinear and interaction terms, dummy variables). The code
is taken from an example in the anova.Design help file in the Design
package. Unless the dataset is huge and there is little collinearity,
you will be surprised how difficult it is to pick winners and losers
from the predictors [try ranking gene expressions from gene microarray
data for even more of a shock]. Note that Bayesian ranking procedures
are more accurate, but this quick and dirty approach isn't bad.

set.seed(9) # set before any random draws so the example can be reproduced
mydata <- data.frame(x1=runif(200), x2=runif(200),
                     sex=factor(sample(c('female','male'),200,TRUE)))
mydata$y <- ifelse(runif(200) <= plogis(mydata$x1-.5 + .5*(mydata$x2-.5) +
                                        .5*(mydata$sex=='male')), 1, 0)
library(Design)
library(boot)
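# For each bootstrap resample (rows i), refit the model and rank the
# predictors; rank 1 goes to the largest statistic
b <- boot(mydata, function(data, i, ...) rank(-plot(anova(
            lrm(y ~ rcs(x1,4)+pol(x2,2)+sex, data, subset=i)),
            sort='none', pl=FALSE)),
          R=25) # should really do R=500 but will take a while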
Rank <- b$t0
lim <- t(apply(b$t, 2, quantile, probs=c(.025,.975)))
# Use the Hmisc Dotplot function to display ranks and their confidence
# intervals. Order the predictors by descending adjusted chi-square
# (chi-square minus d.f.) computed on the original data
original.chisq <- plot(anova(lrm(y ~ rcs(x1,4)+pol(x2,2)+sex, data=mydata)),
                       sort='none', pl=FALSE)
predictor <- as.factor(names(original.chisq))
predictor <- reorder.factor(predictor, -original.chisq)
Dotplot(predictor ~ Cbind(Rank, lim), pch=3, xlab='Rank',
        main=expression(paste(
          'Ranks and 0.95 Confidence Limits for ',chi^2,' - d.f.')))
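
If Design is not at hand, the same idea can be sketched roughly in base R. This is only an illustration under simplifying assumptions, not the anova.Design method: it fits linear terms only with glm(), uses the likelihood-ratio chi-square from drop1() minus its d.f. as the importance measure, and resamples rows with sample() instead of boot(). The helper names adj.chisq and rank.once, and the data frame name dat, are made up for this sketch.

```r
# Base-R sketch: bootstrap percentile intervals for predictor ranks
set.seed(9)
n <- 200
dat <- data.frame(x1 = runif(n), x2 = runif(n),
                  sex = factor(sample(c('female', 'male'), n, TRUE)))
dat$y <- ifelse(runif(n) <= plogis(dat$x1 - .5 + .5*(dat$x2 - .5) +
                                   .5*(dat$sex == 'male')), 1, 0)

# Hypothetical helper: LR chi-square minus d.f. for each model term
adj.chisq <- function(d) {
  fit <- glm(y ~ x1 + x2 + sex, family = binomial, data = d)
  tab <- drop1(fit, test = 'Chisq')[-1, ]   # drop the '<none>' row
  setNames(tab$LRT - tab$Df, rownames(tab))
}

# Hypothetical helper: rank 1 = largest adjusted chi-square
rank.once <- function(d) rank(-adj.chisq(d))

R <- 25   # far too few for real use; kept small so the sketch runs quickly
boot.ranks <- t(replicate(R, rank.once(dat[sample(n, n, replace = TRUE), ])))

Rank <- rank.once(dat)                     # ranks on the original data
lim  <- t(apply(boot.ranks, 2, quantile, probs = c(.025, .975)))
print(cbind(Rank, lim))
```

The printed matrix has one row per predictor: its rank on the original data and the 0.025 and 0.975 bootstrap quantiles of that rank. Wide intervals are the point of the exercise: they show how unstable the ordering is.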
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University