[R] select rows by criteria
Rui Barradas
rui1174 at sapo.pt
Thu Mar 1 19:07:08 CET 2012
Hello, again.
Petr Savicky wrote
>
> On Thu, Mar 01, 2012 at 05:42:48PM +0100, Petr Savicky wrote:
>> On Thu, Mar 01, 2012 at 04:27:45AM -0800, syrvn wrote:
>> > Hello,
>> >
>> > I am stuck with selecting the right rows from a data frame. I think the
>> > problem is rather how to select them
>> > than how to implement the R code.
>> >
>> > Consider the following data frame:
>> >
>> > df <- data.frame(ID = c(1,2,3,4,5,6,7,8,9,10), value =
>> > c(34,12,23,25,34,42,48,29,30,27))
>> >
>> > What I want to achieve is to select 7 rows (values) so that the mean value
>> > of those rows is closest
>> > to the value of 35 and the mean of the remaining 3 rows (values) is closest to 45.
>> > However, each value is only
>> > allowed to be sampled once!
>>
>> Hi.
>>
>> If some 3 rows have mean close to 45, then they have sum close
>> to 3*45, so the remaining 7 rows have sum close to
>>
>> sum(df$value) - 3*45 # [1] 169
>>
>> and they have mean close to 169/7 = 24.14286. In other words,
>> the two criteria cannot be optimized together.
>>
>> For this reason, let me choose the criterion on 3 rows.
>> The closest solution may be found as follows.
>>
>> # generate all triples and compute their means
>> tripleMeans <- colMeans(combn(df$value, 3))
>>
>> # select the index of the triple with mean closest to 35
>> indClosest <- which.min(abs(tripleMeans - 35))
>
> I am sorry. There should be 45 and not 35.
>
> indClosest <- which.min(abs(tripleMeans - 45))
>
> # generate the indices, which form the closest triple in df$value
> tripleInd <- combn(1:length(df$value), 3)[, indClosest]
> tripleInd # [1] 1 6 7
>
> # check the mean of the triple
> mean(df$value[tripleInd]) # [1] 41.33333
>
> Petr Savicky.
>
There are two solutions for the 3-row criterion; 'which.min' finds only
one, the first in the order given by 'combn'.
(I corrected my first post, but the correction still contained an error.)
# Forgot to change the index matrix
meansDist2 <- apply(inxmat2, 2, function(jnx) f(jnx, DF$value, 45))
# Two solutions
(i2 <- which(meansDist2 == min(meansDist2)))
inxmat2[, i2]
mean(DF$value[inxmat2[, i2][, 1]])
[1] 41.33333
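The lines above depend on 'DF', 'inxmat2' and 'f' from the earlier post, which is not reproduced in this message. A minimal self-contained sketch follows; the definition of 'f' (absolute distance between the mean of the selected values and the target) and the construction of 'inxmat2' with 'combn' are assumptions, not the original code. The names 'sol' and 'solMeans' are introduced here for illustration only.

```r
# Data from the original question
DF <- data.frame(ID = 1:10,
                 value = c(34, 12, 23, 25, 34, 42, 48, 29, 30, 27))

# ASSUMED definition of f: distance between the mean of the
# selected values and the target
f <- function(jnx, x, target) abs(mean(x[jnx]) - target)

# ASSUMED index matrix: one column per 3-element subset of the row indices
inxmat2 <- combn(nrow(DF), 3)

meansDist2 <- apply(inxmat2, 2, function(jnx) f(jnx, DF$value, 45))

# All columns attaining the minimum distance, not just the first
i2 <- which(meansDist2 == min(meansDist2))
sol <- inxmat2[, i2, drop = FALSE]   # one optimal triple per column

# Mean of the values in each optimal triple
solMeans <- colMeans(matrix(DF$value[sol], nrow = 3))
```

With this data, 'sol' should have two columns, because the value 34 occurs in rows 1 and 5 and either row can complete the triple with rows 6 and 7.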
Petr's solution and mine give the same mean value.
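A short sketch of the difference between 'which.min' and comparing against 'min', and of what the optimal triples leave for the other 7 rows. The variable names ('triples', 'd', 'first', 'allMin', 'restMeans') are made up for this illustration.

```r
value <- c(34, 12, 23, 25, 34, 42, 48, 29, 30, 27)

# All 3-element index subsets, one per column
triples <- combn(length(value), 3)

# Distance of each triple's mean from the target 45
d <- abs(colMeans(matrix(value[triples], nrow = 3)) - 45)

first  <- which.min(d)         # a single index: the first minimum only
allMin <- which(d == min(d))   # every index attaining the minimum

# Mean of the 7 rows left over for each optimal triple
restMeans <- apply(triples[, allMin, drop = FALSE], 2,
                   function(jnx) mean(value[-jnx]))
```

Here 'first' is the first element of 'allMin', and 'restMeans' is 180/7 (about 25.7) for both triples, which is far from 35. This agrees with Petr's remark that the two criteria cannot be optimized together.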
But use either approach for small values of (n, k) only, since 'combn'
generates all choose(n, k) combinations.
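To get a feeling for how quickly the number of columns produced by 'combn' grows, 'choose' gives the count without generating anything. The (n, k) pairs below are arbitrary examples, and 'nComb' is a name chosen for this sketch.

```r
# Number of k-element subsets of n rows for a few (n, k) pairs
nComb <- c(choose(10, 3),    # the case in this thread: 120
           choose(20, 10),   # 184756
           choose(30, 15))   # 155117520
nComb
```

An exhaustive search that is instant for 10 rows therefore becomes impractical well before n reaches a few dozen.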
Rui Barradas
--
View this message in context: http://r.789695.n4.nabble.com/select-rows-by-criteria-tp4434812p4435760.html