[R] Optimization problem: selecting independent rows to maximize the mean

Wed Mar 1 22:32:30 CET 2006

Package lpSolve might help.
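
As a rough sketch of how that could look (not something tested on the full 500,000 rows): with the subset size fixed, maximizing the mean of "Variable" is the same as maximizing its sum, so the selection can be written as a 0/1 program with one constraint per overlapping pair. The toy values, the `pairs` matrix and the subset size `k` below are made up for illustration, and the stdev criterion is not linear, so it is ignored here.

```r
library(lpSolve)

# Toy data (hypothetical): 6 polygons and their "Variable" values
value <- c(10, 9, 8, 7, 6, 5)
# Each row is one overlapping pair of row indices (hypothetical)
pairs <- rbind(c(1, 2), c(1, 3), c(4, 5))
k <- 3  # size of the subset to select (500 in the real problem)

n <- length(value)
# One constraint per overlapping pair: x[i] + x[j] <= 1
overlap_con <- matrix(0, nrow = nrow(pairs), ncol = n)
overlap_con[cbind(seq_len(nrow(pairs)), pairs[, 1])] <- 1
overlap_con[cbind(seq_len(nrow(pairs)), pairs[, 2])] <- 1

# Last constraint: exactly k rows are selected
const_mat <- rbind(overlap_con, rep(1, n))
const_dir <- c(rep("<=", nrow(pairs)), "=")
const_rhs <- c(rep(1, nrow(pairs)), k)

# all.bin = TRUE makes every decision variable 0 or 1
fit <- lp("max", value, const_mat, const_dir, const_rhs, all.bin = TRUE)
selected <- which(fit$solution > 0.5)
selected
mean(value[selected])
```

For the real data a dense constraint matrix of this shape would be far too large to hold in memory. lp() has a dense.const argument for supplying the constraints in sparse form, but whether the solver copes with a problem of that size is something you would have to try.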

On 3/1/06, Mark <mtb954 at gmail.com> wrote:
> Dear R community,
>
> I have a data frame with 500,000 rows and 102 columns. The rows
> represent spatial polygons, some of which overlap others (i.e., not
> all rows are independent of each other).
>
> Each row's first column contains a unique "RowID" and its second
> column contains the "Variable" of interest. The remaining 100 columns
> ("Overlap1" ... "Overlap100") each contain the RowID of a row that
> overlaps this row (if this row overlaps fewer than 100 other rows,
> the unused overlap columns contain NA).
>
> Here's the problem: I need to select the subset of 500 independent
> rows that maximizes the mean and minimizes the stdev of "Variable".
>
> Clearly this requires iterative selection and comparison of rows,
> because each newly-selected row must be compared to rows already
> selected to ensure it does not overlap them. At each step, a row
> already selected might be removed from the subset if it can be
> replaced with another that increases the mean and/or reduces the
> stdev.
>
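
A simple greedy pass is one way to get a feel for this kind of iterative selection. The sketch below is only a heuristic: it does not do the swap step described above, it ignores the stdev, and it can miss the best subset. The data frame `d` with two overlap columns is a made-up stand-in for the real one, and the code assumes the overlap lists are symmetric (if row A lists B, then B lists A). Rows are visited in decreasing order of "Variable" and a row is kept only if none of the rows already kept appears among its overlaps.

```r
# Toy stand-in for the real data frame (hypothetical values)
d <- data.frame(
  RowID    = 1:6,
  Variable = c(10, 9, 8, 7, 6, 5),
  Overlap1 = c(2, 1, 1, 5, 4, NA),
  Overlap2 = c(3, NA, NA, NA, NA, NA)
)
k <- 3  # size of the subset to select (500 in the real problem)

overlap_cols <- grep("^Overlap", names(d))
selected <- integer(0)  # RowIDs kept so far

# Visit rows from the largest to the smallest value of Variable
for (i in order(d$Variable, decreasing = TRUE)) {
  if (length(selected) == k) break
  neighbours <- unlist(d[i, overlap_cols], use.names = FALSE)
  neighbours <- neighbours[!is.na(neighbours)]
  # Keep the row only if it overlaps nothing already selected
  if (!any(neighbours %in% selected)) {
    selected <- c(selected, d$RowID[i])
  }
}
selected
mean(d$Variable[d$RowID %in% selected])
```

Indexing a data frame row by row like this will be slow on 500,000 rows; converting the overlap columns to a matrix first would likely help.
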
> The above description is a simplification of my problem, but it's a start.
>
> As I am new to R (and programming in general) I'm not sure how to
> start thinking about this, or even where to look. I'd appreciate any
> ideas that might help.
>
> Thank you, Mark