[R] select duplicate identifier with higher mean across sample columns

Sun Nov 4 20:39:12 CET 2012

Is this what you want?

> mdf <- read.table(text = "  id samp1 samp2 samp2a
+ 1  A   100   110    110
+ 2  A   120   130    150
+ 3  C   101   131    151
+ 4  D   110   150    130
+ 5  E   132   122    122
+ 6  F   123   143    143", header = TRUE)
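> # Split by id, then keep the row with the highest mean in each group
> result <- do.call(rbind, lapply(split(mdf, mdf$id), function(.id){
+     # Row means over every column except id (column 1)
+     maxIndx <- which.max(rowMeans(.id[, -1L]))
+     .id[maxIndx, ]
+ }))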
>
> result
  id samp1 samp2 samp2a
A  A   120   130    150
C  C   101   131    151
D  D   110   150    130
E  E   132   122    122
F  F   123   143    143
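
The original question also mentioned inter-quartile range as a possible
criterion. A minimal sketch of one way to make the scoring function
swappable is given here. The helper name pick_best_row and its arguments
id_col and score_fun are hypothetical and not part of the original reply.
It assumes every column other than the id column is a numeric sample
column.

```r
# Hypothetical helper (not from the original post): for each id, keep the
# row whose sample columns score highest under `score_fun`.
pick_best_row <- function(df, id_col = "id", score_fun = mean) {
  # Assumption: every column except the id column is a numeric sample column
  sample_cols <- setdiff(names(df), id_col)
  # One score per row, computed across the sample columns only
  scores <- apply(as.matrix(df[, sample_cols, drop = FALSE]), 1, score_fun)
  # Sort rows by descending score, then keep the first row seen for each id
  ranked <- df[order(-scores), , drop = FALSE]
  best <- ranked[!duplicated(ranked[[id_col]]), , drop = FALSE]
  # Return the rows in id order with fresh row names
  best <- best[order(best[[id_col]]), , drop = FALSE]
  rownames(best) <- NULL
  best
}

# Same data as in the question
mdf <- data.frame(
  id     = c("A", "A", "C", "D", "E", "F"),
  samp1  = c(100, 120, 101, 110, 132, 123),
  samp2  = c(110, 130, 131, 150, 122, 143),
  samp2a = c(110, 150, 151, 130, 122, 143)
)

by_mean <- pick_best_row(mdf)                   # highest row mean per id
by_iqr  <- pick_best_row(mdf, score_fun = IQR)  # highest row IQR per id
```

Sorting once and dropping later duplicates with duplicated() avoids the
split/rbind round trip, which may matter on larger data frames. For this
data both criteria keep the second A row.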

On Sun, Nov 4, 2012 at 2:25 PM, Adrian Johnson
<oriolebaltimore at gmail.com> wrote:
> Hi Group:
> I searched the R groups before posting this question, but I could not
> find an appropriate answer and I do not have a clear understanding of
> how to do this in R.
>
> I have a data frame with duplicated row identifiers but with different
> values across columns. For each duplicated identifier, I want to keep
> the row with the higher inter-quartile range or mean.
>
>
>  id <- c("A", "A", "C", "D", "E", "F")
>  samp1 <- c(100, 120, 101, 110, 132, 123)
>  samp2 <- c(110, 130, 131, 150, 122, 143)
>  samp2a <- c(110, 150, 151, 130, 122, 143)
>  mdf <- data.frame(id, samp1, samp2, samp2a)
>
>
>> mdf
>   id samp1 samp2 samp2a
> 1  A   100   110    110
> 2  A   120   130    150
> 3  C   101   131    151
> 4  D   110   150    130
> 5  E   132   122    122
> 6  F   123   143    143
>
>
> There are two rows with id A in this data frame. I want to select the
> one with the higher mean.
>
> How can I do this?
> Thanks
> Adrian
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.