[R] need help for superpc package

Rossana Dell'Anna dellanna at itc.it
Wed Apr 19 16:17:26 CEST 2006


Hi,

I am using the superpc package.
With
superpc.train(data, type="regression")
I calculated the standardized regression coefficients, which measure the
univariate effect of each feature on a continuous response y.

With
superpc.cv(compute.fullcv=TRUE, compute.preval=FALSE, n.components=3,
n.fold=10)
I used cross-validation to estimate the optimal feature threshold, so
that only those features whose univariate coefficient exceeds the
threshold are kept.

To choose the best threshold I plotted the cross-validation curves with
superpc.plotcv(cv.type="full").

I work with 55 features and 43 samples.
I noticed that when I repeat the call to superpc.cv (with the same
argument values) and then plot the curves with superpc.plotcv, the
curves change, and sometimes the best threshold changes value as well.
Please note that I chose compute.fullcv=TRUE, so full cross-validation
is done.
Moreover, with n.fold=10 the result is not significant for any threshold
value (all three curves lie below the three horizontal lines of the
likelihood ratio test). However, with n.fold<10 there are significant
results.
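
A minimal base-R sketch of why repeated calls might differ, assuming the
folds are drawn at random on each call when none are supplied (I have not
checked this against the package source). The helper make_folds is
hypothetical and is not part of superpc; it only illustrates that a random
partition of 43 samples changes between calls unless the seed is fixed.

```r
# Hypothetical helper (not a superpc function): randomly assign
# n samples to k cross-validation folds of nearly equal size.
make_folds <- function(n, k) {
  split(sample(seq_len(n)), rep(seq_len(k), length.out = n))
}

n_samples <- 43  # number of samples in my data
n_fold <- 10

# Without a fixed seed, two calls will generally give different partitions.
folds_a <- make_folds(n_samples, n_fold)
folds_b <- make_folds(n_samples, n_fold)

# Fixing the seed before each call makes the partition reproducible.
set.seed(1)
folds_c <- make_folds(n_samples, n_fold)
set.seed(1)
folds_d <- make_folds(n_samples, n_fold)

identical(folds_c, folds_d)  # TRUE

# With 43 samples and 10 folds, each held-out fold has only 4 or 5 samples.
sapply(folds_c, length)
```

If the assumption holds, calling set.seed() immediately before superpc.cv
should give the same curves on every run, although the choice of seed
would still influence which threshold looks best.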

I also tested the example given in the user manual to illustrate the
superpc.plotcv routine. In this case, repeating the call to superpc.cv
(with the same argument values) changes the three curves, but they
always remain below the three horizontal lines. So it seems that the
result is never significant (likelihood ratio test).
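
For reference, this is how I read the horizontal lines. The sketch below
assumes they are the 95% points of chi-square distributions with 1, 2 and
3 degrees of freedom, one per number of components; that is my assumption
and not something I have confirmed in the superpc code. The names
n_components, cutoffs and is_significant are mine.

```r
# Assumed cutoffs: 95% chi-square quantiles, one per number of components.
n_components <- 3
cutoffs <- qchisq(0.95, df = seq_len(n_components))
round(cutoffs, 2)  # 3.84 5.99 7.81

# Hypothetical check: is a likelihood ratio statistic from a model with
# `df` components above the corresponding cutoff?
is_significant <- function(lr_stat, df, level = 0.95) {
  lr_stat > qchisq(level, df = df)
}

is_significant(5.0, df = 1)  # TRUE:  5.0 > 3.84
is_significant(5.0, df = 2)  # FALSE: 5.0 < 5.99
```

Under this reading, a curve for k components is significant at a given
threshold only where it rises above the k-th line.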

Is my interpretation of the likelihood ratio test correct? 
Since the curves change, does anyone have an explanation, or a way to
choose the best threshold value reliably?

I also noticed that when I follow the instructions of the superpc
tutorial (http://www-stat.stanford.edu/~tibs/superpc/tutorial.html), the
cross-validation plot I obtain does not look like the one shown in the
PDF version of the tutorial. I use R version 2.2.1. Is it possible that
the rnorm() function (seed=4648) did not generate the same set of numbers
as those used by R. Tibshirani?
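
A small base-R check of what I would expect within a single installation:
the same seed should reproduce the same rnorm() draws, and RNGkind() shows
which generators are in use. Whether another R version or another
generator setting yields the same numbers is exactly what I cannot verify
from here.

```r
# Show the random number generator settings currently in use.
RNGkind()

# The same seed should give the same draws within one R installation.
set.seed(4648)
draws_a <- rnorm(5)
set.seed(4648)
draws_b <- rnorm(5)

identical(draws_a, draws_b)  # TRUE
```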

Thank you in advance,

Rossana

-----------------------------------------------
Rossana Dell'Anna, PhD
ITC-irst Centre for Scientific and Technological Research 
Via Sommarive, 18 - 38050 Trento, ITALY
ph:  +39 0461 314 486
e-mail:  dellanna at itc.it



