[R] matching observations and ranking
arun
smartpink111 at yahoo.com
Wed Apr 24 18:02:50 CEST 2013
Just to add:
If your original dataset has only a few columns, you can also try this:
# mutate() comes from plyr (loaded in the earlier message below)
res1 <- within(
  mutate(dat1,
         AB001A_1  = 1 * (AB001A  == AB001A[2]),
         AB0002A_1 = 1 * (AB0002A == AB0002A[2]),
         AB362_1   = 1 * (AB362   == AB362[2]),
         SUM   = rowSums(cbind(AB001A_1, AB0002A_1, AB362_1)),
         MATCH = (SUM / 3) * 100),
  {
    # rows 1 and 2 are not scored
    MATCH[1:2] <- NA
    RANK <- rank(MATCH, ties.method = "min")
    SUM[1:2] <- NA
    AB001A_1[1:2] <- NA
    AB0002A_1[1:2] <- NA
    AB362_1[1:2] <- NA
  })
identical(res,res1)
#[1] TRUE
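If there are many marker columns, typing each name gets tedious. Below is a rough base-R sketch of the same idea that never names the marker columns. The toy data and the names toy, markers, geno, ref, hits and out are made up for illustration, and it assumes every column other than S.No is a marker.

```r
# Toy data in the same layout as dat1; the values are made up for illustration.
toy <- data.frame(
  S.No = c("P1", "P2", "3", "4", "5"),
  M1   = c("-/-", "C/C", "C/C", "-/-", "A/A"),
  M2   = c("C/C", "C/C", "C/C", "C/C", "A/C"),
  M3   = c("A/A", "A/A", "A/A", "A/A", "A/A"),
  stringsAsFactors = FALSE
)

markers <- setdiff(names(toy), "S.No")  # assumption: all other columns are markers
geno    <- as.matrix(toy[markers])      # character matrix of genotypes
ref     <- geno[2, ]                    # reference genotypes (row 2)

# Compare each column with its reference value; 1 = match, 0 = no match.
hits <- 1 * (geno == matrix(ref, nrow(geno), ncol(geno), byrow = TRUE))
colnames(hits) <- paste0(markers, "_1")
hits[1:2, ] <- NA                       # rows 1 and 2 are not scored

out <- cbind(toy, hits)
out$SUM   <- rowSums(hits)                        # NA for rows 1 and 2
out$MATCH <- out$SUM / length(markers) * 100
out$RANK  <- rank(out$MATCH, ties.method = "min") # NAs are ranked last
out
```

Because the comparison is done on one matrix, the number of marker columns does not matter; only the name of the ID column has to be known.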
A.K.
----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: R help <r-help at r-project.org>
Cc:
Sent: Wednesday, April 24, 2013 10:09 AM
Subject: Re: matching observations and ranking
Hi,
Maybe this helps:
Since you want to compare rows 3 onwards against row 2, the computed values for rows 1 and 2 are set to NA.
dat1<- read.table(text="
S.No AB001A AB0002A AB362
P1 -/- C/C A/A
P2 C/C C/C A/A
3 C/C C/C A/A
4 C/C C/C A/A
5 C/C C/C A/A
6 C/C C/C A/A
7 C/C C/C A/A
8 -/- -/- -/-
9 C/C C/C A/A
10 C/C C/C A/A
11 -/- C/C A/A
12 C/C C/C A/A
13 C/C C/C A/A
14 C/C C/C A/A
15 C/C -/- A/A
16 -/- C/C A/A
17 A/A A/C A/A
18 C/A A/A A/A
",sep="",header=TRUE,stringsAsFactors=FALSE)
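# 1 where a cell equals the row-2 value of its column, 0 otherwise
dat2 <- cbind(dat1, 1 * mapply("==", dat1[, -1], dat1[2, -1]))
# the indicator columns duplicate the marker names; add a "_1" suffix
dup <- duplicated(names(dat2))
names(dat2)[dup] <- paste0(names(dat2)[dup], "_1")
library(plyr)
dat3 <- mutate(dat2, SUM = rowSums(cbind(AB001A_1, AB0002A_1, AB362_1)), MATCH = (SUM / 3) * 100)
dat3[1:2, 5:9] <- NA  # rows 1 and 2 are not scored
res <- mutate(dat3, RANK = rank(MATCH, ties.method = "min"))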
head(res)
#  S.No AB001A AB0002A AB362 AB001A_1 AB0002A_1 AB362_1 SUM MATCH RANK
#1   P1    -/-     C/C   A/A       NA        NA      NA  NA    NA   17
#2   P2    C/C     C/C   A/A       NA        NA      NA  NA    NA   18
#3    3    C/C     C/C   A/A        1         1       1   3   100    7
#4    4    C/C     C/C   A/A        1         1       1   3   100    7
#5    5    C/C     C/C   A/A        1         1       1   3   100    7
#6    6    C/C     C/C   A/A        1         1       1   3   100    7
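One thing to note: rank() ranks in ascending order, so the 100% matches get the largest rank (7 in the output above) rather than 1. If you want the best matchers to be rank 1, one option is to rank the negated percentage. A small sketch with made-up percentages (the names scores, scored and top are only for illustration):

```r
# Made-up match percentages; the first two rows are unscored (NA), as in res.
scores <- data.frame(
  S.No  = c("P1", "P2", "3", "4", "5", "6", "7", "8"),
  MATCH = c(NA, NA, 100, 100, 0, 200/3, 100/3, 100),
  stringsAsFactors = FALSE
)

# Negating MATCH makes the highest percentage rank first;
# na.last = "keep" leaves the unscored rows as NA instead of ranking them last.
scores$RANK <- rank(-scores$MATCH, ties.method = "min", na.last = "keep")

# Drop the unscored rows, sort by rank, and keep at most the ten best rows.
scored <- scores[!is.na(scores$RANK), ]
top    <- head(scored[order(scored$RANK), ], 10)
top
```

With ties.method = "min", all rows tied at the best percentage share rank 1, so "top ten" may need a tie-breaking rule if more than ten rows tie.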
A.K.
>Hi Arun,
>Thank you very much for your help in solving my problem,
>S. No  AB001A  AB0002A  AB362  AB001A  AB0002A  AB362  SUM  %Match  Rank
>P1     -/-     C/C      A/A
>P2     C/C     C/C      A/A
>3      C/C     C/C      A/A
>4      C/C     C/C      A/A
>5      C/C     C/C      A/A
>6      C/C     C/C      A/A
>7      C/C     C/C      A/A
>8      -/-     -/-      -/-
>9      C/C     C/C      A/A
>10     C/C     C/C      A/A
>11     -/-     C/C      A/A
>12     C/C     C/C      A/A
>13     C/C     C/C      A/A
>14     C/C     C/C      A/A
>16     C/C     -/-      A/A
>Actually, I want to match observations 3 to 16 against the values in
>P2 (i.e. 3 with P2, 4 with P2, 5 with P2, etc.). If they match, I would
>like to give the value 1 and store it in the corresponding dummy
>variable, i.e. AB001A, and do the same for the remaining variables,
>storing the results in their dummy variables. Finally, I want to sum all
>the matches (i.e. the 1 scores) in each row, calculate the percentage of
>matches, and then rank. This is what I want; sorry for not expressing my
>problem in an understandable way.
>Hi to all bloggers,
>my data looks like this,
>
>S. No  AB001A  AB0002A  AB362  VAR1  VAR2  VAR3  SUM  %Match  Rank
>1      -/-     C/C      A/A
>2      C/C     C/C      A/A
>3      C/C     C/C      A/A
>4      C/C     C/C      A/A
>5      C/C     C/C      A/A
>6      C/C     C/C      A/A
>7      C/C     C/C      A/A
>8      -/-     -/-      -/-
>9      C/C     C/C      A/A
>10     C/C     C/C      A/A
>11     -/-     C/C      A/A
>12     C/C     C/C      A/A
>13     C/C     C/C      A/A
>14     C/C     C/C      A/A
>16     C/C     -/-      A/A
>17     -/-     C/C      A/A
>18     C/C     C/C      A/A
>19     C/C     C/C      A/A
>I want to match obs 3 with obs 2: if it matches exactly, the score is 1,
>else 0. The score is stored in VAR1 for AB001A, in VAR2 for AB0002A and
>in VAR3 for AB362. I then want to calculate the sum of all the 1's, the
>match percentage for each observation, and the rank (top ten matchers).
>I did this successfully in Excel, but it took a lot of time: I used an IF
>condition like =IF(A3=A$2,1,0), dragged it across all obs, and then
>computed the sum, %match and rank. My question is: how can I do this in
>R? Can I use the match package for this, or will other packages help?
>My data is big, with 5,15,567 obs. Can anyone guide me on how to do this
>in sas, because I want to reduce the time it takes to analyze my data?
>Thanking you. Regards,