[R] Odp: Fwd: duplicates

Thu Jul 29 16:54:23 CEST 2010

Hi

A rather complicated one-liner, assuming your data frame is named test:

do.call(rbind,lapply(split(test,test$var1), function(x) 
x[which.max(x[,"var2"]),]))

Here it is in 3 lines

test.s <- split(test, test$var1)  # split the data frame by var1
# within each group, select the row with the maximum value of var2
result <- lapply(test.s, function(x) x[which.max(x[, "var2"]), ])
do.call(rbind, result)  # put everything into one data frame again
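
As a rough check, the same three steps can be run on the sample rows from
the quoted question. The data frame below is typed in from that table, and
the name out is only introduced here for illustration.

```r
# Sample data copied from the table in the quoted question
test <- data.frame(
  var1 = c(1, 1, 1, 2, 2, 2, 3, 3, 4),
  var2 = c(4, 3, 8, 2, 6, 9, 1, 2, 8),
  var3 = c(500, 200, 125, 120, 22, 400, 100, 200, 20),
  var4 = c(1, 2, 1, 2, 1, 1, 2, 5, 1),
  var5 = c(2, 5, 9, 52, 20, 22, 8, 40, 60)
)

test.s <- split(test, test$var1)                                   # one piece per var1
result <- lapply(test.s, function(x) x[which.max(x[, "var2"]), ])  # best row per piece
out <- do.call(rbind, result)                                      # `out` is an illustrative name
out  # one row per var1, holding that group's largest var2
```

Note that which.max() returns the position of the first maximum, so if two
rows in a group tie on var2, the earlier row is the one kept.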

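There could be issues if you have NA values in var1 or var2: split() drops
rows whose var1 is NA, and which.max() ignores NA values in var2, so a group
whose var2 values are all NA silently disappears from the result.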
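
A small sketch of what can go wrong with NA values. The data frame test_na
and the name picked are made up for this illustration and are not part of
the original question.

```r
# Hypothetical data: one row has NA in var1, and group 2 has only NA in var2
test_na <- data.frame(
  var1 = c(1, 1, 2, NA),
  var2 = c(5, NA, NA, 7)
)

picked <- do.call(rbind, lapply(split(test_na, test_na$var1),
                                function(x) x[which.max(x[, "var2"]), ]))
picked        # only the row with var1 == 1 and var2 == 5 is left
nrow(picked)  # rows with NA in var1 and the all-NA group are gone
```

No warning is given in either case, so it may be worth checking
sum(is.na(test$var1)) and sum(is.na(test$var2)) before relying on the result.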

Regards
Petr


r-help-bounces at r-project.org wrote on 29.07.2010 16:31:06:

> 
> 
> -- Original message --
> From: Dévaványai Agamemnón <devavanyai at citromail.hu>
> To: r-hel at r-project.org, r-hel at r-project.org
> Sent: 29 July 2010, 16:29
> Subject: duplicates
> 
>  Sorry!
> I'll try again.
> 
> Dear R Users!
> 
> 
> I have a data frame with duplicate cases: values of var1 are repeated,
> each time with a different value of var2.
> 
> 
> 
> var1  var2  var3  var4  var5
> 1     4     500   1     2
> 1     3     200   2     5
> 1     8     125   1     9
> 2     2     120   2     52
> 2     6     22    1     20
> 2     9     400   1     22
> 3     1     100   2     8
> 3     2     200   5     40
> 4     8     20    1     60
> 
> I want to delete the duplicates in var1 that have a low rank in var2 and
> keep the case that has the highest rank in var2. I would like to keep the
> whole row (with the other variables):
> 
> var1  var2  var3  var4  var5
> 1     8     125   1     9
> 2     9     400   1     22
> 3     2     200   5     40
> 4     8     20    1     60
> 
>  Thanks Ag