[R] matching observations and ranking
arun
smartpink111 at yahoo.com
Wed Apr 24 07:32:05 CEST 2013
Hi,
Your question is not entirely clear.
I am assuming that VAR1 is a match between columns AB001A and AB0002A, VAR2 between AB001A and AB362, and VAR3 between AB0002A and AB362.
I also assume that row 8, where all three columns are "-/-", counts as a match (1).
dat1<- read.table(text="
S.No AB001A AB0002A AB362
1 -/- C/C A/A
2 C/C C/C A/A
3 C/C C/C A/A
4 C/C C/C A/A
5 C/C C/C A/A
6 C/C C/C A/A
7 C/C C/C A/A
8 -/- -/- -/-
9 C/C C/C A/A
10 C/C C/C A/A
11 -/- C/C A/A
12 C/C C/C A/A
13 C/C C/C A/A
14 C/C C/C A/A
16 C/C -/- A/A
17 -/- C/C A/A
18 C/C C/C A/A
19 C/C C/C A/A
",sep="",header=TRUE,stringsAsFactors=FALSE)
library(plyr)
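# each VARn is 1 when the two columns agree in that row, else 0; rank() defaults to ties.method="average"
res<-mutate(dat1,VAR1=1*(AB001A==AB0002A),VAR2=1*(AB001A==AB362),VAR3=1*(AB0002A==AB362),SUM=rowSums(cbind(VAR1,VAR2,VAR3)),MATCH=(SUM/3)*100,Rank=rank(MATCH))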
head(res)
# S.No AB001A AB0002A AB362 VAR1 VAR2 VAR3 SUM MATCH Rank
#1 1 -/- C/C A/A 0 0 0 0 0.00000 2.5
#2 2 C/C C/C A/A 1 0 0 1 33.33333 11.0
#3 3 C/C C/C A/A 1 0 0 1 33.33333 11.0
#4 4 C/C C/C A/A 1 0 0 1 33.33333 11.0
#5 5 C/C C/C A/A 1 0 0 1 33.33333 11.0
#6 6 C/C C/C A/A 1 0 0 1 33.33333 11.0
# or, with tied rows all given the lowest rank of their group:
res<-mutate(dat1,VAR1=1*(AB001A==AB0002A),VAR2=1*(AB001A==AB362),VAR3=1*(AB0002A==AB362),SUM=rowSums(cbind(VAR1,VAR2,VAR3)),MATCH=(SUM/3)*100,Rank=rank(MATCH,ties.method="min"))
head(res)
# S.No AB001A AB0002A AB362 VAR1 VAR2 VAR3 SUM MATCH Rank
#1 1 -/- C/C A/A 0 0 0 0 0.00000 1
#2 2 C/C C/C A/A 1 0 0 1 33.33333 5
#3 3 C/C C/C A/A 1 0 0 1 33.33333 5
#4 4 C/C C/C A/A 1 0 0 1 33.33333 5
#5 5 C/C C/C A/A 1 0 0 1 33.33333 5
#6 6 C/C C/C A/A 1 0 0 1 33.33333 5
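Note that rank(MATCH) gives rank 1 to the lowest match percentage. If "top ten matchers" means the highest percentages, one option is to rank on -MATCH instead. Below is a minimal base R sketch of that idea; the small data frame `dat` and the name `top` are made up for illustration and are not your data.

```r
# Toy data, invented for illustration only
dat <- data.frame(
  AB001A  = c("-/-", "C/C", "-/-", "C/C"),
  AB0002A = c("C/C", "C/C", "-/-", "-/-"),
  AB362   = c("A/A", "A/A", "-/-", "A/A"),
  stringsAsFactors = FALSE
)

# Pairwise column matches: 1 if equal in that row, 0 otherwise
dat$VAR1 <- as.integer(dat$AB001A == dat$AB0002A)
dat$VAR2 <- as.integer(dat$AB001A == dat$AB362)
dat$VAR3 <- as.integer(dat$AB0002A == dat$AB362)

dat$SUM   <- rowSums(dat[, c("VAR1", "VAR2", "VAR3")])
dat$MATCH <- dat$SUM / 3 * 100

# Negate MATCH so that the highest percentage receives rank 1
dat$Rank <- rank(-dat$MATCH, ties.method = "min")

# Best matchers first; head() keeps at most ten rows
top <- head(dat[order(dat$Rank), ], 10)
print(top)
```

All of these steps are vectorized, so the same approach should carry over to a large data set without an explicit loop.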
A.K.
>Hi to all bloggers,
>my data looks like this,
>
>S. No AB001A AB0002A AB362 VAR1 VAR2 VAR3 SUM %Match Rank
> 1 -/- C/C A/A
> 2 C/C C/C A/A
> 3 C/C C/C A/A
> 4 C/C C/C A/A
> 5 C/C C/C A/A
> 6 C/C C/C A/A
> 7 C/C C/C A/A
> 8 -/- -/- -/-
> 9 C/C C/C A/A
> 10 C/C C/C A/A
> 11 -/- C/C A/A
> 12 C/C C/C A/A
> 13 C/C C/C A/A
> 14 C/C C/C A/A
> 16 C/C -/- A/A
> 17 -/- C/C A/A
> 18 C/C C/C A/A
> 19 C/C C/C A/A
>I want to match obs 3 with obs 2: if it matches exactly the score
>will be 1, else 0. That score will be stored in VAR1 for AB001A, in VAR2 for
>AB0002A and in VAR3 for AB362. I also want to calculate the sum of all the 1's,
>the observation match percent and the rank (top ten matchers). I did
>this successfully in Excel but it took me a lot of time. I used an IF
>condition in Excel like =IF(A3=A$2,1,0), dragged it across all
>obs, and then computed the sum of all obs, their %match and rank. My question is: how
>can I do this in R? Can I use the match package for this, or will other packages
>help me? My data is big, with 5,15,567 obs. Can anyone guide me
>how to do this in SAS, because I want to reduce the time it takes to analyze my
>data? Thanking you. Regards,
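
Another possible reading of the Excel formula quoted above, =IF(A3=A$2,1,0), is that every observation is compared with one fixed reference observation, column by column, rather than comparing the columns with each other. A minimal base R sketch of that reading follows. The toy data, the choice of the first row as the reference, and the names `cols`, `ref` and `scores` are assumptions made for illustration.

```r
# Toy data, invented for illustration only
dat <- data.frame(
  AB001A  = c("C/C", "C/C", "-/-", "C/C"),
  AB0002A = c("C/C", "C/C", "C/C", "-/-"),
  AB362   = c("A/A", "A/A", "A/A", "-/-"),
  stringsAsFactors = FALSE
)

cols <- c("AB001A", "AB0002A", "AB362")

# Reference observation: assumed here to be the first row of the toy data
ref <- unlist(dat[1, cols])

# For each column, score 1 where a row equals the reference value, else 0
scores <- sapply(cols, function(cl) as.integer(dat[[cl]] == ref[[cl]]))
colnames(scores) <- c("VAR1", "VAR2", "VAR3")

res <- cbind(dat, scores)
res$SUM   <- rowSums(scores)
res$MATCH <- res$SUM / length(cols) * 100

# Highest match percentage gets rank 1; ties share the lowest rank
res$Rank <- rank(-res$MATCH, ties.method = "min")
print(res)
```

The reference row itself always scores 100 percent under this reading, so it may be worth dropping it before picking the top ten.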