[R] highest and second highest value in row for each combination
Phil Spector
spector at stat.berkeley.edu
Thu Feb 10 18:55:03 CET 2011
Alain -
Here's a reproducible data set:
set.seed(19)
area<-c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10))
type<-c(rep(1:10,5))
a<-rnorm(50)
b<-rnorm(50)
c<-rnorm(50)
d<-rnorm(50)
df<-cbind(area,type,a,b,c,d)
First I'll make a helper function to operate on one
row of df (which is actually a matrix, since it was built with cbind):
get2 = function(x){
   y = x[-c(1,2)]                   # drop area and type, keep a,b,c,d
   oy = order(y,decreasing=TRUE)    # positions, largest value first
   nms = colnames(df)[-c(1,2)]      # names of the value columns
   data.frame(area=rep(x[1],2),type=rep(x[2],2),
              max=y[oy[1:2]],colname=nms[oy[1:2]])
}
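To see what the order() step does on a single row, here is a small sketch using the a, b, c, d values from the first row of the matrix you posted (the vector y is typed in by hand here, just for illustration):

```r
# First row of Alain's posted matrix (columns a, b, c, d only)
y <- c(a = 0.45608192, b = 0.240378547, c = 2.05208079, d = -1.18827462)

# order() returns positions, here sorted from largest value to smallest
oy <- order(y, decreasing = TRUE)

oy[1:2]            # positions of the two largest values: 3 1
names(y)[oy[1:2]]  # their column names: "c" "a"
y[oy[1:2]]         # the values themselves: 2.05208079 and 0.45608192
```

That agrees with the "2.05 c" and "0.46 a" you listed for combination 11.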
Now I can use apply, do.call and rbind to get the answer:
> answer = do.call(rbind,apply(df,1,get2))
> head(answer)
   area type        max colname
b     1    1  1.7036697       b
c     1    1  0.7910130       c
c1    1    2  2.4576579       c
a     1    2  0.3885812       a
c2    1    3  1.2363598       c
a1    1    3 -0.3443333       a
(My numbers differ from yours because you didn't specify
a seed for the random number generator)
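In case the do.call/rbind idiom is unfamiliar: when the function given to apply returns a data frame, apply hands back a list with one data frame per row, and do.call(rbind, ...) stacks them into one. A small sketch, where the two-row matrix m and the simplified helper top1 are made up purely for illustration:

```r
# Made-up two-row matrix with the same layout as df (area, type, values)
m <- cbind(area = c(1, 1), type = c(1, 2),
           a = c(0.46, -0.12), b = c(0.24, -0.03))

# Hypothetical simplified helper: one-row data frame with the row maximum
top1 <- function(x) {
  data.frame(area = unname(x[1]), type = unname(x[2]),
             max = max(x[-c(1, 2)]))
}

pieces <- apply(m, 1, top1)        # a list, one data frame per row of m
stacked <- do.call(rbind, pieces)  # same as rbind(pieces[[1]], pieces[[2]])
stacked
```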
I'm not exactly sure how to form your column "combination", though.
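Judging from the "11" and "12" in your example, one guess is that combination is simply area followed by type. A minimal sketch under that assumption (the small data frame below is a made-up stand-in with the same columns as answer, not real output):

```r
# Made-up stand-in with the same columns as `answer`
answer <- data.frame(area = c(1, 1, 1, 1),
                     type = c(1, 1, 10, 10),
                     max = c(2.05, 0.46, 1.13, 0.54),
                     colname = c("c", "a", "d", "a"))

# Assumption: "combination" is area pasted in front of type, 1 and 1 -> "11"
answer$combination <- paste(answer$area, answer$type, sep = "")
answer$combination
```

Note that this gives character values, and that area 1 with type 10 becomes "110". If that could ever be confused with another area/type pair, a separator such as sep="." would keep the two parts distinct.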
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
On Thu, 10 Feb 2011, Alain D. wrote:
> Dear R-List,
>
> I have a dataframe
>
> area<-c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10))
> type<-c(rep(1:10,5))
> a<-rnorm(50)
> b<-rnorm(50)
> c<-rnorm(50)
> d<-rnorm(50)
> df<-cbind(area,type,a,b,c,d)
>
>
> df
>       area type           a            b           c           d
>  [1,]    1    1  0.45608192  0.240378547  2.05208079 -1.18827462
>  [2,]    1    2 -0.12119506 -0.028078577 -2.64323695 -0.83923441
>  [3,]    1    3  0.09066133 -1.134069619  1.53344812 -0.15670239
>  [4,]    1    4 -1.34505241  1.919941172 -1.02090099  0.75664358
>  [5,]    1    5 -0.29279617 -0.314955019 -0.88809266  2.22282022
>  [6,]    1    6 -0.59697893 -0.652937746  1.05132400 -0.02469151
>  [7,]    1    7 -1.18199400  0.728165962 -1.51419348  0.65640976
>  [8,]    1    8 -0.72925659  0.303514237  0.79758488  0.93444350
>  [9,]    1    9 -1.60080508 -0.187562633  0.51288428 -0.55692877
> [10,]    1   10  0.54373268 -0.494994392  0.52902381  1.12938122
> [11,]    2    1 -1.29675664 -0.644990784 -2.44067511 -0.18489544
> [12,]    2    2  0.86330699  1.458038882  1.17514710  1.32896878
> [13,]    2    3  0.30069402  1.361211939  0.84757211  1.14502761
> ...
>
> Now I want to have for each combination of area and type the name and
> corresponding value of the two columns with the highest and second highest
> value a,b,c,d.
> In the above example it should be something like
>
> combination   max colname
>          11  2.05       c
>          11  0.46       a
>          12 -0.03       b
>          12 -0.12       a
> ...
>
> (It might be arranged differently, though)
>
> Can anyone help?
>
> Thank you in advance!
>
> Alain
>