[R] [R-pkgs] Subselect package - Version 0.7.1

Jorge Cadima jcadima at isa.utl.pt
Thu Mar 11 13:11:04 CET 2004


A new version (0.7.1) of package 'subselect' has been uploaded to CRAN.

Package 'subselect' provides functions which assess the quality of
variable subsets as surrogates for a full data set, in an exploratory
data analysis, and search for subsets which are optimal under various
criteria. 

As of version 0.7 a new function, 'leaps', has been added. 'leaps'
performs a branch and bound search for the best variable subsets
according to a specified criterion. It implements Duarte Silva's
adaptation (Reference 3) of Furnival and Wilson's Leaps and Bounds
algorithm for variable selection in regression analysis. It is practical
for identifying optimal subsets in data sets with a moderate number
of variables (up to about 30-35), and is very fast for small data sets
(up to about 20-25 variables).
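
To illustrate the idea behind a branch and bound search, here is a
minimal sketch in Python. It is not the package's code and not Duarte
Silva's algorithm as published: the function names, the toy weights and
the toy criterion are all assumptions made for this example. The only
property it relies on is that the criterion is monotone, i.e. dropping
a variable never increases its value, so a whole branch can be skipped
once its root is no better than the best k-subset found so far.

```python
from itertools import combinations


def branch_and_bound(p, k, criterion):
    """Return (best k-subset, its value, number of criterion evaluations).

    Assumes 1 <= k <= p and a monotone criterion: removing a variable
    from a subset never increases the criterion value.
    """
    best_value = float("-inf")
    best_subset = ()
    evaluated = 0

    def visit(current, first_pos):
        nonlocal best_value, best_subset, evaluated
        value = criterion(current)
        evaluated += 1
        # Bound: every subset of `current` scores at most `value`,
        # so this branch cannot beat the incumbent.
        if value <= best_value:
            return
        if len(current) == k:
            best_value, best_subset = value, current
            return
        # Branch: drop one more variable. Dropping in increasing position
        # order visits each subset at most once; positions above k would
        # leave too few removable variables to reach size k.
        for pos in range(first_pos, k + 1):
            visit(current[:pos] + current[pos + 1:], pos)

    visit(tuple(range(p)), 0)
    return best_subset, best_value, evaluated


# Toy monotone criterion (an assumption for illustration only):
# the sum of non-negative weights of the selected variables.
weights = [5, 20, 15, 1, 30, 7]


def toy_criterion(subset):
    return sum(weights[i] for i in subset)


subset, value, evaluated = branch_and_bound(len(weights), 3, toy_criterion)
brute_force = max(combinations(range(len(weights)), 3), key=toy_criterion)
print(subset, value, evaluated, brute_force)
```

The search is exact, so it returns the same subset as the brute-force
comparison; the 'evaluated' count shows how many subsets were actually
scored.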

In package 'subselect', the quality of a given k-subset of variables is
assessed under three criteria (Reference 2).

Three additional functions, 'anneal', 'genetic' and 'improve', search for
optimal k-variable subsets under those criteria, using three different
algorithms: a simulated annealing algorithm, a genetic algorithm and a
restricted local improvement algorithm (Reference 1). Among the
options, the user can control the number of iterations, the initial
temperature, the cooling factors and the cooling frequency in simulated
annealing, and the number of generations, the population size, the
admissibility of clones and the presence and frequency of mutations in
the genetic algorithm.
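
As a rough illustration of how the simulated annealing options interact,
here is a minimal Python sketch of annealing over k-variable subsets.
It is not the package's implementation: the function name, the argument
names (niter, temp, cooling, coolfreq, seed), the swap move and the toy
criterion are assumptions made for this example.

```python
import math
import random


def anneal_subset(p, k, criterion, niter=1000, temp=1.0,
                  cooling=0.05, coolfreq=1, seed=0):
    """Search for a k-subset of range(p) with a high criterion value.

    Assumes 1 <= k < p. Returns (sorted best subset, best value).
    """
    rng = random.Random(seed)
    current = set(rng.sample(range(p), k))
    current_value = criterion(current)
    best, best_value = set(current), current_value

    for it in range(1, niter + 1):
        # Move: swap one selected variable for one unselected variable.
        out_var = rng.choice(sorted(current))
        in_var = rng.choice(sorted(set(range(p)) - current))
        candidate = (current - {out_var}) | {in_var}
        cand_value = criterion(candidate)
        delta = cand_value - current_value

        # Always accept improvements; accept a worse subset with
        # probability exp(delta / temp), which shrinks as temp falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_value = candidate, cand_value
            if current_value > best_value:
                best, best_value = set(current), current_value

        # Cool every `coolfreq` iterations; the floor avoids dividing by 0.
        if it % coolfreq == 0:
            temp = max(temp * (1.0 - cooling), 1e-12)

    return sorted(best), best_value


# Toy criterion (an assumption for illustration only).
weights = [5, 20, 15, 1, 30, 7]


def toy_criterion(subset):
    return sum(weights[i] for i in subset)


subset, value = anneal_subset(len(weights), 3, toy_criterion,
                              niter=200, seed=1)
print(subset, value)
```

A higher initial temperature or a smaller cooling factor makes the
search accept more downhill moves early on, at the cost of slower
convergence. The result is a heuristic solution and is not guaranteed
to be optimal.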

For all algorithms, it is possible to specify the number of solutions
required for one or more subset cardinalities, and to force the
solutions to include and/or to exclude given subsets of variables.
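
The effect of these options can be sketched with a small exhaustive
search in Python. This is only an illustration under stated
assumptions: the function name, the argument names (kmin, kmax, nsol,
include, exclude) and the toy criterion are hypothetical and are not
taken from the package.

```python
import heapq
from itertools import combinations


def best_subsets(p, kmin, kmax, criterion, nsol=1, include=(), exclude=()):
    """For each cardinality k in kmin..kmax, return the nsol best
    (value, subset) pairs among subsets of range(p) that contain every
    variable in `include` and none of the variables in `exclude`.
    """
    forced = tuple(sorted(set(include)))
    banned = set(exclude)
    if banned & set(forced):
        raise ValueError("a variable cannot be both included and excluded")

    # Variables the search is still free to add.
    free = [v for v in range(p) if v not in banned and v not in forced]

    results = {}
    for k in range(kmin, kmax + 1):
        extra = k - len(forced)
        if extra < 0 or extra > len(free):
            results[k] = []  # no admissible subset of this size
            continue
        candidates = (tuple(sorted(forced + c))
                      for c in combinations(free, extra))
        results[k] = heapq.nlargest(
            nsol, ((criterion(s), s) for s in candidates))
    return results


# Toy criterion (an assumption for illustration only).
weights = [5, 20, 15, 1, 30, 7]


def toy_criterion(subset):
    return sum(weights[i] for i in subset)


# Two best subsets of sizes 2 and 3 that contain variable 3
# and never contain variable 4.
solutions = best_subsets(len(weights), 2, 3, toy_criterion,
                         nsol=2, include=[3], exclude=[4])
print(solutions)
```

Forcing variables in or out shrinks the search space, since only the
remaining free variables are combined.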


Here is the DESCRIPTION file for the package:

Package: subselect
Version: 0.7.1
Date: 2004/03/10
Title: Selecting variable subsets.
Author: Jorge Orestes Cerdeira <orestes at isa.utl.pt> Pedro Duarte Silva
        <psilva at porto.ucp.pt> Jorge Cadima <jcadima at isa.utl.pt> Manuel
        Minhoto <minhoto at uevora.pt>
Maintainer: Jorge Cadima <jcadima at isa.utl.pt>
Description: A collection of functions which assess the quality of
        variable subsets as surrogates for a full data set, in an
        exploratory data analysis, and search for subsets which are
        optimal under various criteria.
License: GPL


There is a CHANGELOG file in subdirectory 'inst' documenting changes
since Version 0.1.



BIBLIOGRAPHY:

1) Cadima, J., Cerdeira, J. Orestes and Minhoto, M. (2004).
Computational aspects of algorithms for variable selection in the
context of principal components. To appear in
_Computational Statistics & Data Analysis_ (Special Issue on
Applications of Optimization Heuristics to Estimation and Modelling Problems).

2) Cadima, J. and Jolliffe, I.T. (2001). Variable Selection and
the Interpretation of Principal Subspaces, _Journal of
Agricultural, Biological and Environmental Statistics_, Vol. 6, 62-79.

3) Duarte Silva, A.P. (2002). Discarding Variables in a Principal
Component Analysis: Algorithms for All-Subsets Comparisons,
_Computational Statistics_, Vol. 17, 251-271.

_______________________________________________
R-packages mailing list
R-packages at stat.math.ethz.ch
https://www.stat.math.ethz.ch/mailman/listinfo/r-packages



