[R] select duplicate identifier with higher mean across sample columns
jim holtman
jholtman at gmail.com
Sun Nov 4 20:39:12 CET 2012
Is this what you want?
> mdf <- read.table(text = " id samp1 samp2 samp2a
+ 1 A 100 110 110
+ 2 A 120 130 150
+ 3 C 101 131 151
+ 4 D 110 150 130
+ 5 E 132 122 122
+ 6 F 123 143 143", header = TRUE)
> result <- do.call(rbind, lapply(split(mdf, mdf$id), function(.id){
+     maxIndx <- which.max(rowMeans(.id[, -1L]))
+     .id[maxIndx, ]
+ }))
>
> result
  id samp1 samp2 samp2a
A  A   120   130    150
C  C   101   131    151
D  D   110   150    130
E  E   132   122    122
F  F   123   143    143
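
Since the question also mentions inter-quartile range, here is a rough sketch of the same idea with the scoring function made swappable. The helper name pick_best_row and its arguments id_col and score are made up for illustration and are not part of any package; it assumes every column other than the id column is a numeric sample column.

```r
# Hypothetical helper (illustration only): for each id, keep the row whose
# sample columns give the largest value of `score` (e.g. mean or IQR).
pick_best_row <- function(df, id_col = "id", score = mean) {
  # assumption: every column except the id column holds numeric samples
  sample_cols <- setdiff(names(df), id_col)
  # one score per row, computed across the sample columns
  row_scores <- apply(as.matrix(df[, sample_cols, drop = FALSE]), 1, score)
  # within each id, take the first row index that attains the group maximum
  best <- unlist(lapply(split(seq_len(nrow(df)), df[[id_col]]), function(idx) {
    idx[which.max(row_scores[idx])]
  }))
  # return the selected rows in their original order
  df[sort(best), , drop = FALSE]
}

mdf <- read.table(text = "id samp1 samp2 samp2a
A 100 110 110
A 120 130 150
C 101 131 151
D 110 150 130
E 132 122 122
F 123 143 143", header = TRUE)

by_mean <- pick_best_row(mdf, score = mean)  # same criterion as above
by_iqr  <- pick_best_row(mdf, score = IQR)   # inter-quartile range instead
```

On this example data both criteria keep the second A row. Ties within an id go to the first row, because which.max returns the first maximum.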
On Sun, Nov 4, 2012 at 2:25 PM, Adrian Johnson
<oriolebaltimore at gmail.com> wrote:
> Hi Group:
> I searched R groups before posting this question. I could not find the
> appropriate answer and I do not have clear understanding how to do
> this in R.
>
> I have a data frame with duplicated row identifiers but with different
> values across columns. I want to select the identifier with higher
> inter-quartile range or mean.
>
>
> id <- c("A", "A", "C", "D", "E", "F")
> year <- c(2000, 2001, 2001, 2002, 2003, 2004)
> samp1 <- c(100, 120, 101, 110, 132, 123)
> samp2 <- c(110, 130, 131, 150, 122, 143)
> samp2a <- c(110, 150, 151, 130, 122, 143)
> mdf <- data.frame(id, samp1, samp2, samp2a)
>
>
>> mdf
>   id samp1 samp2 samp2a
> 1  A   100   110    110
> 2  A   120   130    150
> 3  C   101   131    151
> 4  D   110   150    130
> 5  E   132   122    122
> 6  F   123   143    143
>
>
> There are two A ids in this df. I want to select the row with higher mean.
>
> How can I do this?
> Thanks
> Adrian
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.