# [R] matching observations and ranking

arun smartpink111 at yahoo.com
Wed Apr 24 16:09:08 CEST 2013

```
Hi,
Maybe this helps:
Since you only want to compare rows 3 onwards against row 2 (P2), the derived values for rows 1 and 2 are set to NA.
dat1<-read.table(text="
S.No AB001A AB0002A AB362
P1   -/-        C/C   A/A
P2   C/C        C/C   A/A
3   C/C        C/C   A/A
4   C/C        C/C   A/A
5   C/C        C/C   A/A
6   C/C        C/C   A/A
7   C/C        C/C   A/A
8   -/-        -/-   -/-
9   C/C        C/C   A/A
10  C/C        C/C   A/A
11  -/-        C/C   A/A
12  C/C        C/C   A/A
13  C/C        C/C   A/A
14  C/C        C/C   A/A
15  C/C        -/-   A/A
16   -/-        C/C   A/A
17   A/A        A/C   A/A
18  C/A        A/A   A/A
",sep="",header=TRUE,stringsAsFactors=FALSE)
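# Compare every marker column with its value in row 2 (P2): 1 = match, 0 = no match
dat2<-cbind(dat1,(1*mapply("==",dat1[,-1],dat1[2,-1])))
# The indicator columns duplicate the marker names, so suffix them with "_1"
names(dat2)[duplicated(names(dat2))]<- paste0(names(dat2)[duplicated(names(dat2))],"_1")
library(plyr)
# SUM = number of matching markers per row; MATCH = percentage of the 3 markers
dat3<-mutate(dat2,SUM=rowSums(cbind(AB001A_1,AB0002A_1,AB362_1)), MATCH=(SUM/3)*100)
# Rows 1 and 2 (P1, P2) are not compared, so blank out their derived columns
dat3[1:2,5:9]<-NA
# Rank by match percentage, ascending; tied rows share the lowest rank
res<-mutate(dat3,RANK=rank(MATCH,ties.method="min"))
head(res)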
#  S.No AB001A AB0002A AB362 AB001A_1 AB0002A_1 AB362_1 SUM MATCH RANK
#1   P1    -/-     C/C   A/A       NA        NA      NA  NA    NA   17
#2   P2    C/C     C/C   A/A       NA        NA      NA  NA    NA   18
#3    3    C/C     C/C   A/A        1         1       1   3   100    7
#4    4    C/C     C/C   A/A        1         1       1   3   100    7
#5    5    C/C     C/C   A/A        1         1       1   3   100    7
#6    6    C/C     C/C   A/A        1         1       1   3   100    7
A.K.

>Hi Arun,
>Thank you very much for your help in solving my problem,
>S.No  AB001A  AB0002A  AB362  AB001A  AB0002A  AB362  SUM  %Match  Rank
>P1    -/-     C/C      A/A
>P2    C/C     C/C      A/A
>3     C/C     C/C      A/A
>4     C/C     C/C      A/A
>5     C/C     C/C      A/A
>6     C/C     C/C      A/A
>7     C/C     C/C      A/A
>8     -/-     -/-      -/-
>9     C/C     C/C      A/A
>10    C/C     C/C      A/A
>11    -/-     C/C      A/A
>12    C/C     C/C      A/A
>13    C/C     C/C      A/A
>14    C/C     C/C      A/A
>16    C/C     -/-      A/A
>Actually, I want to match observations 3 to 16 against the values in P2
>(i.e. 3 with P2, 4 with P2, 5 with P2, etc.). Where they match, I would like
>to assign the value 1 and store it in the corresponding dummy variable
>(e.g. AB001A), and do the same for the remaining variables, storing the
>results in their dummy variables. Finally, I want to sum all the matches
>(i.e. the 1 scores) in each row, calculate the percentage of matches, and
>then rank the rows. This is what I want; sorry for not expressing my
>problem in an understandable way.

>Hi to all bloggers,
>my data looks like this,
>
>S.No  AB001A  AB0002A  AB362  VAR1  VAR2  VAR3  SUM  %Match  Rank
>1     -/-     C/C      A/A
>2     C/C     C/C      A/A
>3     C/C     C/C      A/A
>4     C/C     C/C      A/A
>5     C/C     C/C      A/A
>6     C/C     C/C      A/A
>7     C/C     C/C      A/A
>8     -/-     -/-      -/-
>9     C/C     C/C      A/A
>10    C/C     C/C      A/A
>11    -/-     C/C      A/A
>12    C/C     C/C      A/A
>13    C/C     C/C      A/A
>14    C/C     C/C      A/A
>16    C/C     -/-      A/A
>17    -/-     C/C      A/A
>18    C/C     C/C      A/A
>19    C/C     C/C      A/A
>I want to match obs 3 with obs 2: if it matches exactly, the score
>is 1, otherwise 0. The score is stored in VAR1 for AB001A, in VAR2 for
>AB0002A and in VAR3 for AB362. I then want to calculate the sum of all the
>1's, the match percentage of each observation, and their rank (top ten
>matchers). I did this successfully in Excel, but it took a lot of time: I
>used an IF condition like =IF(A3=A$2,1,0), dragged it across all
>obs, and then computed the sum, the %match and the rank. My question is: how
>can I do this in R? Can I use the match package for this, or will other
>packages help? My data is big, with 5,15,567 obs. Can anyone guide me
>on how to do this in SAS? I want to reduce the time it takes to analyze my
>data. Thanking you. Regards,

```
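For readers who would rather avoid the plyr dependency, the same scoring can be sketched in base R. This is only an illustration under a few assumptions that are not in the thread: the reference profile is taken to be row 2 (`ref_row`, a name introduced here), the helper names `markers`, `hits` and `top10` are hypothetical, and the ranking is reversed so that the best match gets rank 1, which seems closer to the "top ten matchers" the original question asks for. Note that `-/-` is compared as an ordinary string, exactly as in the reply above.

```r
# Example data copied from the reply in this thread (S.No is an identifier column).
dat1 <- read.table(text = "
S.No AB001A AB0002A AB362
P1   -/-    C/C     A/A
P2   C/C    C/C     A/A
3    C/C    C/C     A/A
4    C/C    C/C     A/A
5    C/C    C/C     A/A
6    C/C    C/C     A/A
7    C/C    C/C     A/A
8    -/-    -/-     -/-
9    C/C    C/C     A/A
10   C/C    C/C     A/A
11   -/-    C/C     A/A
12   C/C    C/C     A/A
13   C/C    C/C     A/A
14   C/C    C/C     A/A
15   C/C    -/-     A/A
16   -/-    C/C     A/A
17   A/A    A/C     A/A
18   C/A    A/A     A/A
", header = TRUE, stringsAsFactors = FALSE)

ref_row <- 2                  # assumption: the reference profile (P2) sits in row 2
markers <- names(dat1)[-1]    # every column except the identifier

# Compare each marker column with its value in the reference row.
# The result is a logical matrix: one row per observation, one column per marker.
hits <- sapply(markers, function(m) dat1[[m]] == dat1[[m]][ref_row])

res <- dat1
res$SUM   <- rowSums(hits)                     # number of matching markers
res$MATCH <- 100 * res$SUM / length(markers)   # percentage of markers that match

# P1 and P2 are not compared with the reference, so blank out their scores.
res[seq_len(ref_row), c("SUM", "MATCH")] <- NA

# Rank so that the best match gets rank 1; unscored rows keep an NA rank.
res$RANK <- rank(-res$MATCH, na.last = "keep", ties.method = "min")

# The ten best-ranked rows (ties at the cut-off are broken by row order).
top10 <- head(res[order(res$RANK), ], 10)
print(top10)
```

Because each comparison is a single vectorised operation on a whole column, this approach should carry over to a much larger table without an explicit loop over rows, although that has not been measured here.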