[R] row selection based on median in data frame

Nick Ellis Nick.Ellis at csiro.au
Fri Apr 2 02:57:11 CEST 2004


> tmp
  row.labels        a b  c 
1          1 deadlift 7 13
2          2    squat 7 24
3          3    clean 7 10
4          4 deadlift 8  8
5          5    squat 8 20
6          6    clean 8  2
7          7 deadlift 9  5
8          8    squat 9 32
9          9    clean 9 19
> tapply(tmp$c,tmp$a,median)
 clean deadlift squat 
    10        8    24
> tmp[tapply(1:nrow(tmp),tmp$a,function(i,x) {x <- x[i]; i[x==median(x)]}, x=tmp$c),]
  row.labels        a b  c 
3          3    clean 7 10
4          4 deadlift 8  8
2          2    squat 7 24
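One caveat with the call above: the test x==median(x) picks exactly one row only when the group has an odd number of distinct values. With an even-sized group the median can fall between two values and match no row, and with ties it can match several. A minimal sketch of a variant that always returns one row per group, assuming "the row closest to the median" is an acceptable substitute (the helper name nearest_to_median is made up for illustration):

```r
# Rebuild the example data frame shown in the session above.
tmp <- data.frame(
  row.labels = 1:9,
  a = rep(c("deadlift", "squat", "clean"), times = 3),
  b = rep(7:9, each = 3),
  c = c(13, 24, 10, 8, 20, 2, 5, 32, 19)
)

# Hypothetical helper: given the row indices i of one group, return the
# single index whose c value is nearest to the group median.
# which.min() breaks ties by taking the first match.
nearest_to_median <- function(i, x) {
  x <- x[i]
  i[which.min(abs(x - median(x)))]
}

# split() gives one vector of row indices per level of a;
# lapply() + unlist() yields a plain integer vector of row numbers.
idx <- unlist(lapply(split(seq_len(nrow(tmp)), tmp$a),
                     nearest_to_median, x = tmp$c))
result <- tmp[idx, ]
print(result)
```

On this data every group has three distinct values, so the selection is the same rows 3, 4 and 2 as above.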

If you have multiple grouping variables gp1, gp2, gp3, you simply include them in the 2nd argument (INDEX) of tapply:

tmp[tapply(1:nrow(tmp),tmp[c("gp1","gp2","gp3")],function(i,x) {x <- x[i]; i[x==median(x)]}, x=tmp$c),]
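With several grouping columns, not every combination of levels need occur in the data, and as far as I can tell tapply() fills those empty cells with NA, which then shows up as all-NA rows in the subset. A hedged sketch of one way around that, using split() with drop = TRUE so that only observed combinations are visited (the data frame d and its columns gp1, gp2 are invented for illustration):

```r
# Toy data with two grouping columns; only the combinations
# (x, p) and (y, q) actually occur.
d <- data.frame(
  gp1 = c("x", "x", "x", "y", "y", "y"),
  gp2 = c("p", "p", "p", "q", "q", "q"),
  c   = c(3, 1, 2, 10, 30, 20)
)

# split() accepts a list of factors and, with drop = TRUE,
# discards level combinations that have no rows.
by_group <- split(seq_len(nrow(d)), d[c("gp1", "gp2")], drop = TRUE)

# Same median test as in the tapply() version, applied per group.
idx <- unlist(lapply(by_group,
                     function(i) i[d$c[i] == median(d$c[i])]),
              use.names = FALSE)
picked <- d[idx, ]
print(picked)
```

The same pattern should extend to eight grouping columns by listing them all in the second argument to split().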

Nick Ellis
CSIRO Marine Research	mailto:Nick.Ellis at csiro.au
PO Box 120			ph    +61 (07) 3826 7260
Cleveland QLD 4163    	fax   +61 (07) 3826 7222
Australia			http://www.marine.csiro.au
  
> Message: 75
> Date: Wed, 31 Mar 2004 22:22:22 -0500
> From: Ed L Cashin <ecashin at uga.edu>
> Subject: [R] row selection based on median in data frame
> To: r-help at stat.math.ethz.ch
> Message-ID: <873c7otma9.fsf at uga.edu>
> Content-Type: text/plain; charset=us-ascii
> 
> Hi.  I am having trouble thinking of an easy way to grab rows out of a
> data frame.  I want to select the rows with a median value when the
> rows are similar.
> 
> A simple example is this table, which I could read into a data frame.
> I would like to find a new data frame with only the rows with a median
> value for the "c" column given a certain "a" value.
> 
> For example, the c values for deadlift rows are 13, 8, and 5, so the
> row with a c value of 8 should show up in the output.
> 
>          a          b          c
>      1   deadlift   7          13
>      2   squat      7          24
>      3   clean      7          10
>      4   deadlift   8           8
>      5   squat      8          20
>      6   clean      8           2
>      7   deadlift   9           5
>      8   squat      9          32
>      9   clean      9          19
> 
> Result:
> 
>          a          b          c
>      4   deadlift   8           8
>      5   squat      8          20
>      3   clean      7          10
> 
> It's more complicated in my case, because I have not just one "a"
> column, but about eight columns that have to be the same.  I can do
> this with clumsy loops, but I wonder whether there's a better way.
> 
> -- 
> --Ed L Cashin            |   PGP public key:
>   ecashin at uga.edu        |   http://noserose.net/e/pgp/
>



