[R] Packages for Learning Algorithm Independent Branch and Bound for Feature Selection

Enrico Schumann es at enricoschumann.net
Mon Jul 3 09:49:01 CEST 2017


Quoting Alex Byrley <anbyrley at buffalo.edu>:

> See, I have built my own genetic algorithm already and tested it on this
> problem. I have a solution, but due to the heuristic nature of GA, I cannot
> guarantee that it is the optimal subset.
>
> If I was simply doing this for a company project, you are spot on with the
> type of algorithm I would use, but I am doing this for a scientific paper.
> I need to be able to find the optimal subset over my dataset, and I know
> branch and bound will find it without resorting to exhaustive search. If I
> can't claim that my subset is optimal, it is going to open my paper up to
> serious enough criticism that it will get rejected, regardless of whether
> my new method outperforms the state-of-the-art or not. (And regardless if
> my dataset is representative enough to make such performance claims)

Just a comment:

A heuristic such as a Genetic Algorithm does indeed not guarantee the
optimum [*], but will only provide a stochastic approximation. This
approximation will get better if the algorithm gets more computing time;
but still, the solution will remain random.

However, people who read scientific papers can deal with such randomness,
as long as it is properly explained and reported. (Otherwise, no results
that relied on stochastic methods, such as MCMC, would ever have
been published.)

Some years ago, we wrote a short article on this topic (though on a
financial application):
https://link.springer.com/article/10.1007/s10732-010-9138-y

Kind regards (and good luck)
     Enrico Schumann


[*] Which does not mean it has not found it.


> I will take a look at that page, thanks! Hopefully there is an R
> implementation of generic B&B as I described out there somewhere...
>
> Alex Byrley
> Graduate Student
> Department of Electrical Engineering
> 235 Davis Hall
> (716) 341-1802
>
> 2017-07-01 3:53 GMT-04:00 Enrico Schumann <es at enricoschumann.net>:
>
>> On Thu, 29 Jun 2017, Alex Byrley writes:
>>
>> > I am looking for packages that can run a branch-and-bound algorithm to
>> > maximize a distance measure (such as Bhattacharyya or Mahalanobis) on a
>> set
>> > of features.
>> >
>> > I would like this to be learning algorithm independent, so that the
>> method
>> > just looks at the features, and selects the subset of a user-defined size
>> > that maximizes a distance criteria such as those stated above.
>> >
>> > Can anyone give some suggestions?
>> >
>> > Alex Byrley
>> > Graduate Student
>> > Department of Electrical Engineering
>> > 235 Davis Hall
>> > (716) 341-1802
>> >
>>
>> It seems you are looking for a generic optimisation
>> algorithm; so perhaps start at the task view:
>> https://cran.r-project.org/web/views/Optimization.html
>>
>> What you describe is a combinatorial problem: select k
>> from N features, with k (much) smaller than N. So I'd
>> suggest to also look at heuristic algorithms that can
>> deal with such problems (e.g. genetic algorithms).
>>
>>
>> --
>> Enrico Schumann
>> Lucerne, Switzerland
>> http://enricoschumann.net
>>



-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net



More information about the R-help mailing list