[R] Ordering Duplicates for Selection
jim holtman
jholtman at gmail.com
Tue Oct 5 17:58:47 CEST 2010
Here is a way of adding an "order" column (the attempt number within each student/course pair) to your data:
> x
  V1       V2     V3 V4         V5
1  1 12345678 Soc101 34 02-04-2003
2  2 12345678 Soc101 62 31-11-2004
3  3 12345678 Psy104 63 03-05-2003
4  4 23456789 Soc101 73 02-04-2003
5  5 23456789 Psy104 76 25-02-2004
> x$order <- ave(x$V1, x$V2, x$V3, FUN=seq_along)
> x
  V1       V2     V3 V4         V5 order
1  1 12345678 Soc101 34 02-04-2003     1
2  2 12345678 Soc101 62 31-11-2004     2
3  3 12345678 Psy104 63 03-05-2003     1
4  4 23456789 Soc101 73 02-04-2003     1
5  5 23456789 Psy104 76 25-02-2004     1
>
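For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the five rows printed above as a data frame (the V1 to V5 names follow the printout; reading V2 as student number, V3 as course and V4 as final mark is an assumption based on the question), adds the attempt number with the same ave() call, and then goes one step beyond the reply by comparing marks for first and second attempts. ave() splits its first argument by the grouping variables, applies FUN to each piece and writes the results back into the original row positions, so seq_along() yields 1, 2, ... within each student/course pair in the current row order. That numbering only means "attempt number" if rows are already in chronological order within each pair; if not, sort them by a parsed date column first. November has only 30 days, so the sample date 31-11-2004 would need checking before any date parsing.

```r
# Rebuild the example rows from the printout above.
# Assumed column meanings: V2 = student number, V3 = course, V4 = final mark.
x <- data.frame(
  V1 = 1:5,
  V2 = c(12345678, 12345678, 12345678, 23456789, 23456789),
  V3 = c("Soc101", "Soc101", "Psy104", "Soc101", "Psy104"),
  V4 = c(34, 62, 63, 73, 76),
  V5 = c("02-04-2003", "31-11-2004", "03-05-2003", "02-04-2003", "25-02-2004"),
  stringsAsFactors = FALSE
)

# Number the rows 1, 2, ... within each student/course pair,
# in the order the rows currently appear.
x$order <- ave(x$V1, x$V2, x$V3, FUN = seq_along)

# Group the final marks by attempt number and compare them.
by_attempt <- split(x$V4, x$order)
print(by_attempt)
print(tapply(x$V4, x$order, mean))
print(tapply(x$V4, x$order, summary))
```

With this data the order column comes out as 1 2 1 1 1, so the first-attempt marks are 34, 63, 73 and 76 (mean 61.5) and the only second-attempt mark is 62.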
On Tue, Oct 5, 2010 at 11:42 AM, C C <psdcc at hotmail.com> wrote:
>
> Hi all,
>
> I've found a lot of helpful info regarding identifying and deleting duplicates, but I'd like to do something a little different: identify the duplicate values and, instead of deleting them, label them with a value.
>
> I am working with historical data regarding school courses:
>
>    Student Number  Course  Final Mark  Completed Date
> 1        12345678  Soc101          34      02-04-2003
> 2        12345678  Soc101          62      31-11-2004
> 3        12345678  Psy104          63      03-05-2003
> 4        23456789  Soc101          73      02-04-2003
> 5        23456789  Psy104          76      25-02-2004
>
> In this data frame, records 1 and 2 contain data for the same student taking the same course. In record 1, the student failed (Final Mark), took the course again (Completed Date) and finally passed (Final Mark) in record 2.
>
> I'd like to be able to work with the data so that I could summarize the achievement distribution for the first-attempt records and then compare it to the achievement distribution for the second-attempt records. In Excel I'd use something like COUNTIF($A$2:A2,A2) in a new column and then summarize the "1" values and "2" values.
>
>    Order  Student Number  Course  Final Mark  Completed Date
> 1      1        12345678  Soc101          34      02-04-2003
> 2      2        12345678  Soc101          62      31-11-2004
> 3      1        12345678  Psy104          63      03-05-2003
> 4      1        23456789  Soc101          73      02-04-2003
> 5      1        23456789  Psy104          76      25-02-2004
>
> I suspect the answer is in the list discussions on "deleting duplicate records" but I'm still familiarizing myself with R and I'm not at a point to be able to see how it could be modified. Any thoughts?
>
> Cheers,
> Chris
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
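The Excel formula mentioned in the question, COUNTIF($A$2:A2,A2) filled down a column, is a running count: for each row it counts how many rows from the top down to that row share its key. A literal R translation is sketched below for comparison. It assumes a "duplicate" means the same student number and course, so the two columns are first combined with paste() into a single key (the names student, course, key, running_count and via_ave are illustrative). The loop is quadratic in the number of rows, so it is only meant to show the correspondence; the grouped ave() approach from the reply avoids that cost.

```r
# Illustrative vectors matching the example data (student number, course).
student <- c(12345678, 12345678, 12345678, 23456789, 23456789)
course  <- c("Soc101", "Soc101", "Psy104", "Soc101", "Psy104")

# Assumption: a duplicate is defined by student AND course together,
# so combine them into one key.
key <- paste(student, course, sep = "_")

# Running count, analogous to COUNTIF($A$2:A2, A2) filled down:
# for row i, count how many of rows 1..i have the same key as row i.
running_count <- vapply(seq_along(key),
                        function(i) sum(key[seq_len(i)] == key[i]),
                        integer(1))

# The grouped ave() approach gives the same numbering.
via_ave <- ave(seq_along(key), key, FUN = seq_along)

print(running_count)
stopifnot(all(running_count == via_ave))
```

Both vectors come out as 1 2 1 1 1 for this data.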
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?