[R] Extracting Unique Rows Based on 2 Columns

Boris Steipe boris.steipe at utoronto.ca
Mon Nov 30 05:36:19 CET 2015


A logical expression applied to a vector (such as a data frame column) gives you a logical vector that you can use for selection. You can combine several of these with the & (AND) and | (OR) operators. In your case, you apparently want to match any of several possible values; use the %in% operator for that.

Consider, e.g.:

orderlist$i == 2
orderlist$i == 2 & orderlist$j < 3
orderlist$i %in% c(5, 7)
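
A minimal sketch of how such logical vectors select rows, using a toy data frame built from the first ten rows shown in the question. The name `df` is made up for illustration, and the last step with duplicated() is an assumption on my part about what "distinct rows" should mean (keep the first row for each Measure_id / i pair); it is just one more way to produce a logical vector.

```r
# Toy data: first ten rows from the question (columns Measure_id, i, j only).
# `df` is a hypothetical name used for illustration.
df <- data.frame(
  Measure_id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  i          = c(2, 5, 2, 5, 7, 2, 7, 5, 2, 2),
  j          = c(3, 1, 1, 2, 3, 4, 5, 2, 1, 4)
)

# A comparison on a column returns one TRUE/FALSE value per row.
sel_eq  <- df$i == 2
# Conditions combine element-wise with & (AND) and | (OR).
sel_and <- df$i == 2 & df$j < 3
# %in% tests membership in a set of values.
sel_in  <- df$i %in% c(5, 7)

# Put the logical vector in the row position of [ , ] to select rows.
df[sel_and, ]
df[df$Measure_id %in% c(1, 2) & sel_in, ]

# Assumption: "distinct" means the first row of each (Measure_id, i) pair.
# duplicated() flags repeats, so its negation is again a logical selector.
first <- !duplicated(df[, c("Measure_id", "i")])
df[first, ]
```

On these ten rows the last line keeps rows 1, 2, 5, 8 and 9, which matches the result listed in the question.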

Cheers,
B.


On Nov 29, 2015, at 10:55 PM, Ragia Ibrahim <ragia11 at hotmail.com> wrote:

> Dear group,
> kindly,  I have a data frame, as follows:
> 
> 
>    Measure_id i j value      rank
> 1           1 2 3   2.0 1.0000000
> 2           1 5 1   2.0 1.0000000
> 3           1 2 1   1.5 0.7500000
> 4           1 5 2   1.5 0.7500000
> 5           1 7 3   1.5 1.0000000
> 6           1 2 4   1.0 0.5000000
> 7           1 7 5   1.0 0.6666667
> 8           2 5 2   2.5 1.0000000
> 9           2 2 1   2.0 1.0000000
> 10          2 2 4   2.0 1.0000000
> ..        ... . .   ...       ...
> 
> I want to select distinct rows based on two columns (Measure_id and i).
> 
> For example, for Measure_id = 1, 2 the result would be:
> 1           1 2 3   2.0 1.0000000
> 2           1 5 1   2.0 1.0000000
> 5           1 7 3   1.5 1.0000000
> 8           2 5 2   2.5 1.0000000
> 9           2 2 1   2.0 1.0000000
> 
> 
> Kindly, how could I do this?
> 
> An example of the data frame follows, produced with dput.
> 
> dput(orderlist)
> 
> structure(list(Measure_id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 
> 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 
> 5, 5, 5), i = c(2, 5, 2, 5, 7, 2, 7, 5, 2, 2, 7, 2, 5, 7, 2, 
> 2, 2, 5, 5, 7, 7, 2, 5, 2, 2, 5, 7, 7, 2, 2, 5, 2, 5, 7, 7), 
>     j = c(3, 1, 1, 2, 3, 4, 5, 2, 1, 4, 5, 3, 1, 3, 1, 3, 4, 
>     1, 2, 3, 5, 4, 2, 1, 3, 1, 3, 5, 1, 4, 2, 3, 1, 3, 5), value = c(2, 
>     2, 1.5, 1.5, 1.5, 1, 1, 2.5, 2, 2, 2, 1.5, 1.5, 1, 1, 0, 
>     0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 1, 1, 1, 1), 
>     rank = c(1, 1, 0.75, 0.75, 1, 0.5, 0.666666666666667, 1, 
>     1, 1, 1, 0.75, 0.6, 0.5, 1, 0, 0, NaN, NaN, NaN, NaN, 1, 
>     1, 0, 0, 0, NaN, NaN, 1, 1, 1, 0.5, 0.5, 1, 1)), class = c("grouped_df", 
> "tbl_df", "tbl", "data.frame"), row.names = c(NA, -35L), .Names = c("Measure_id", 
> "i", "j", "value", "rank"), vars = list(Measure_id), indices = list(
>     0:6, 7:13, 14:20, 21:27, 28:34), group_sizes = c(7L, 7L, 
> 7L, 7L, 7L), biggest_group_size = 7L, labels = structure(list(
>     Measure_id = c(1, 2, 3, 4, 5)), class = "data.frame", row.names = c(NA, 
> -5L), .Names = "Measure_id", vars = list(Measure_id)))
> 
> 
> 
> 
> thanks in advance
> Ragia 		 	   		  


