[R] matching observations and ranking

Wed Apr 24 16:09:08 CEST 2013

Hi,
Maybe this helps.
Since you only want to compare rows 3 onwards against row 2 (P2), the computed values for row 1 and row 2 are set to NA.
dat1<- read.table(text="
  S.No AB001A AB0002A AB362
   P1   -/-        C/C   A/A                       
    P2   C/C        C/C   A/A                       
    3   C/C        C/C   A/A                       
    4   C/C        C/C   A/A                       
    5   C/C        C/C   A/A                       
    6   C/C        C/C   A/A                       
    7   C/C        C/C   A/A                       
    8   -/-        -/-   -/-                       
    9   C/C        C/C   A/A                       
    10  C/C        C/C   A/A                       
    11  -/-        C/C   A/A                       
    12  C/C        C/C   A/A                       
    13  C/C        C/C   A/A                       
    14  C/C        C/C   A/A                       
    15  C/C        -/-   A/A                       
    16   -/-        C/C   A/A                       
    17   A/A        A/C   A/A                       
    18  C/A        A/A   A/A
",sep="",header=TRUE,stringsAsFactors=FALSE)
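# Compare every row with row 2 (P2), column by column; 1 = match, 0 = no match
dat2 <- cbind(dat1, 1 * mapply("==", dat1[, -1], dat1[2, -1]))
# The indicator columns repeat the marker names, so suffix them with "_1"
names(dat2)[duplicated(names(dat2))] <- paste0(names(dat2)[duplicated(names(dat2))], "_1")
library(plyr)
# SUM = number of matching markers, MATCH = percentage of the 3 markers
dat3 <- mutate(dat2, SUM = rowSums(cbind(AB001A_1, AB0002A_1, AB362_1)), MATCH = (SUM / 3) * 100)
# Rows 1 and 2 (P1, P2) are not compared, so blank their computed columns
dat3[1:2, 5:9] <- NA
# rank() is ascending: the lowest MATCH gets rank 1
res <- mutate(dat3, RANK = rank(MATCH, ties.method = "min"))
head(res)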
#  S.No AB001A AB0002A AB362 AB001A_1 AB0002A_1 AB362_1 SUM MATCH RANK
#1   P1    -/-     C/C   A/A       NA        NA      NA  NA    NA   17
#2   P2    C/C     C/C   A/A       NA        NA      NA  NA    NA   18
#3    3    C/C     C/C   A/A        1         1       1   3   100    7
#4    4    C/C     C/C   A/A        1         1       1   3   100    7
#5    5    C/C     C/C   A/A        1         1       1   3   100    7
#6    6    C/C     C/C   A/A        1         1       1   3   100    7
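Note that rank() ranks in ascending order, which is why the 100% matches end up with rank 7 above. If you would rather have the best matches ranked first, something along these lines might do. It is only a sketch: match_pct is a stand-in vector holding the MATCH values from the example data, and na.last is set explicitly so that the rows that were not compared keep NA as their rank.

```r
# MATCH values for rows P1, P2, 3..18 of the example (NA = not compared)
match_pct <- c(NA, NA, 100, 100, 100, 100, 100, 0, 100, 100,
               200/3, 100, 100, 100, 200/3, 200/3, 100/3, 100/3)

# Negate so that the highest percentage gets rank 1; NA stays NA
rank_desc <- rank(-match_pct, ties.method = "min", na.last = "keep")

# Row positions of the ten best matches; na.last = NA drops the NA rows
top10 <- head(order(match_pct, decreasing = TRUE, na.last = NA), 10)
```

With the real data, res[top10, ] would then list the top ten matchers.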
A.K.


>Hi Arun, 
>Thank you very much for your help in solving my problem, 
>S. No   AB001A  AB0002A AB362   AB001A  AB0002A AB362   SUM %Match  Rank
>P1      -/-     C/C     A/A
>P2      C/C     C/C     A/A
>3       C/C     C/C     A/A
>4       C/C     C/C     A/A
>5       C/C     C/C     A/A
>6       C/C     C/C     A/A
>7       C/C     C/C     A/A
>8       -/-     -/-     -/-
>9       C/C     C/C     A/A
>10      C/C     C/C     A/A
>11      -/-     C/C     A/A
>12      C/C     C/C     A/A
>13      C/C     C/C     A/A
>14      C/C     C/C     A/A
>16      C/C     -/-     A/A
>Actually, I want to match observations 3 to 16 with the value in P2
>(i.e. 3 with P2, 4 with P2, 5 with P2, etc.). If they match, I would
>like to give the value 1 and store it in the corresponding dummy
>variable, i.e. AB001A, and I would like to do the same thing for the
>remaining variables, storing the results in their dummy variables.
>Finally, I want to sum all the matches (i.e. the 1 scores) in each row,
>calculate the percentage of match, and then rank. This is what I want;
>sorry for not expressing my problem clearly.
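The steps described here (compare with P2, store 1/0, sum per row, percentage) can also be written without naming each marker column, which may be handy when there are many markers. The following is only a sketch in base R: score_against(), geno, ref and skip are made-up names for illustration, and it assumes the genotype columns are character and the reference sample sits in row 2.

```r
# Hypothetical helper: score every row of `geno` against the reference row `ref`
score_against <- function(geno, ref = 2, skip = 1:2) {
  m <- as.matrix(geno)
  # Compare each column with its reference value; TRUE where genotypes match
  hits <- sweep(m, 2, m[ref, ], FUN = "==")
  # Rows that should not be compared (here P1 and P2) get NA
  hits[skip, ] <- NA
  data.frame(SUM = rowSums(hits), MATCH = 100 * rowMeans(hits))
}

# Small made-up example with the same layout as the data above
geno <- data.frame(
  AB001A  = c("-/-", "C/C", "C/C", "-/-", "A/A"),
  AB0002A = c("C/C", "C/C", "C/C", "C/C", "A/C"),
  AB362   = c("A/A", "A/A", "A/A", "A/A", "A/A"),
  stringsAsFactors = FALSE
)
scores <- score_against(geno)
```

Here rowMeans() divides by the number of marker columns, so the percentage does not need the hard-coded 3.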


>Hi to all bloggers, 
>my data looks like this,
>
>S. No   AB001A  AB0002A AB362   VAR1    VAR2    VAR3    SUM %Match  Rank
>1       -/-     C/C     A/A
>2       C/C     C/C     A/A
>3       C/C     C/C     A/A
>4       C/C     C/C     A/A
>5       C/C     C/C     A/A
>6       C/C     C/C     A/A
>7       C/C     C/C     A/A
>8       -/-     -/-     -/-
>9       C/C     C/C     A/A
>10      C/C     C/C     A/A
>11      -/-     C/C     A/A
>12      C/C     C/C     A/A
>13      C/C     C/C     A/A
>14      C/C     C/C     A/A
>16      C/C     -/-     A/A
>17      -/-     C/C     A/A
>18      C/C     C/C     A/A
>19      C/C     C/C     A/A
>I want to match obs 3 with obs 2. If it matches exactly, the score
>will be 1, else 0. That score will be stored in VAR1 for AB001A, in VAR2
>for AB0002A and in VAR3 for AB362. I also want to calculate the sum of
>all the 1's, the match percentage of each observation and their rank
>(top ten matchers). I did this successfully in Excel, but it took me a
>lot of time. I used an IF condition in Excel like =IF(A3=A$2,1,0), then
>dragged it across all obs and computed the sum, %match and rank for all
>obs. My question is: how can I do this in R? Can I use the match package
>for this, or will other packages help me? My data is very big, with
>5,15,567 obs. Can anyone guide me on how to do this in sas? I want to
>reduce the time it takes to analyze my data.
>Thanking you.
>Regards,
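For what it is worth, the Excel formula quoted above maps onto a single vectorised comparison in R, so nothing has to be dragged down the column. The following is only a sketch on made-up values, assuming the observation numbers correspond to positions in the vector; AB001A and VAR1 are used here as plain vectors for illustration.

```r
# One marker column, as in spreadsheet column A (position = observation number)
AB001A <- c("-/-", "C/C", "C/C", "-/-", "C/C")

# Equivalent of filling =IF(A3=A$2,1,0) down the column:
# compare every value with the one in position 2, giving 1 or 0
VAR1 <- as.integer(AB001A == AB001A[2])

# Observations 1 and 2 are not part of the comparison
VAR1[1:2] <- NA
```

Because the comparison works on the whole column at once, the same line should also be usable on a column with several hundred thousand observations.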