[R] select duplicate identifier with higher mean across sample columns

Mon Nov 5 16:47:48 CET 2012

Thanks a lot for the help.
-Adrian

On Sun, Nov 4, 2012 at 2:39 PM, jim holtman <jholtman at gmail.com> wrote:
> Is this what you want:
>
>> mdf <- read.table(text = "  id samp1 samp2 samp2a
> + 1  A   100   110    110
> + 2  A   120   130    150
> + 3  C   101   131    151
> + 4  D   110   150    130
> + 5  E   132   122    122
> + 6  F   123   143    143", header = TRUE)
>> result <- do.call(rbind, lapply(split(mdf, mdf$id), function(.id){
> +     maxIndx <- which.max(rowMeans(.id[, -1L]))
> +     .id[maxIndx, ]
> + }))
>>
>> result
>   id samp1 samp2 samp2a
> A  A   120   130    150
> C  C   101   131    151
> D  D   110   150    130
> E  E   132   122    122
> F  F   123   143    143
>
>
> On Sun, Nov 4, 2012 at 2:25 PM, Adrian Johnson
> <oriolebaltimore at gmail.com> wrote:
>> Hi Group:
>> I searched the R groups before posting this question, but I could not
>> find an appropriate answer and I do not have a clear understanding of
>> how to do this in R.
>>
>> I have a data frame with duplicated row identifiers but with different
>> values across columns. For each duplicated identifier, I want to keep
>> the row with the higher inter-quartile range or mean.
>>
>>
>>  id <- c("A", "A", "C", "D", "E", "F")
>>  year <- c(2000, 2001, 2001, 2002, 2003, 2004)
>>  samp1 <- c(100, 120, 101, 110, 132,123)
>>  samp2 <- c(110, 130, 131, 150, 122,143)
>>  samp2a <- c(110, 150, 151, 130, 122, 143)
>>  mdf <- data.frame(id,samp1,samp2,samp2a)
>>
>>
>>> mdf
>>   id samp1 samp2 samp2a
>> 1  A   100   110    110
>> 2  A   120   130    150
>> 3  C   101   131    151
>> 4  D   110   150    130
>> 5  E   132   122    122
>> 6  F   123   143    143
>>
>>
>> There are two A ids in this df. I want to select the row with higher mean.
>>
>> How can I do this?
>> Thanks
>> Adrian
>>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
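
The original question also mentioned the inter-quartile range as a possible
criterion. The following is only a sketch of one way to make the scoring
function swappable; it is not the approach posted in the thread. The helper
name pick_by_score and its arguments are hypothetical, and the sketch assumes
that the first column is the identifier and all remaining columns are numeric
samples.

```r
# Data as printed in the thread.
mdf <- data.frame(
  id     = c("A", "A", "C", "D", "E", "F"),
  samp1  = c(100, 120, 101, 110, 132, 123),
  samp2  = c(110, 130, 131, 150, 122, 143),
  samp2a = c(110, 150, 151, 130, 122, 143)
)

# Hypothetical helper: keep, for each id, the row whose sample columns give
# the largest value of score_fun (for example mean or IQR).
# Assumes column 1 is the id and every other column is numeric.
pick_by_score <- function(df, score_fun) {
  # One score per row, computed across the sample columns.
  score <- apply(as.matrix(df[, -1L]), 1L, score_fun)
  # Sort by id, then by descending score, so the best row of each id is first.
  sorted <- df[order(df$id, -score), ]
  # Keep only the first (highest-scoring) row for each id.
  sorted[!duplicated(sorted$id), ]
}

by_mean <- pick_by_score(mdf, mean)
by_iqr  <- pick_by_score(mdf, IQR)
```

For the example data, both criteria keep the second A row (120, 130, 150).
If two rows of the same id have equal scores, the one that comes first in the
input is kept, which matches the behaviour of which.max.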