[R] finding duplicates in a data frame

arun smartpink111 at yahoo.com
Thu Jun 14 01:50:34 CEST 2012


Hi,

Try this:

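 # Toy example: two data frames that share the key columns y1 and y2
 set.seed(1)  # make the sampled y2 values reproducible
 dat1 <- data.frame(x = letters[1:5], y1 = 4:8, y2 = sample(1:4, 5, replace = TRUE))
 dat2 <- data.frame(x = letters[6:10], y1 = 4:8, y2 = sample(1:4, 5, replace = TRUE))
 # Keep only the rows whose (y1, y2) pair occurs in both data frames
 merge(dat1, dat2, by = c("y1", "y2"))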
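
Applied to the column names from the original post, the same merge() idea might look like the sketch below. The object names ref and query are made up for illustration, only a handful of the posted rows are reproduced, and it assumes the ppm.p. and freq.p. columns are numeric and agree exactly between the two data frames.

```r
# Hypothetical names: 'ref' is the large reference table, 'query' the short one.
# Only a few of the posted rows are reproduced here.
ref <- data.frame(
  new.col = c("1_3_dimethylurea", "1_3_dimethylurea",
              "1_amino_1_phenylmethyl_phosphonic_acid",
              "1_amino_1_phenylmethyl_phosphonic_acid"),
  ppm.p.  = c(4.80643, 2.66393, 7.44687, 4.81412),
  freq.p. = c(88.4026, 39.238, 7.1684, 105.11),
  stringsAsFactors = FALSE
)
query <- data.frame(
  new.col = c("unknown", "unknown"),
  ppm.p.  = c(7.44687, 4.81412),
  freq.p. = c(7.1684, 105.11),
  stringsAsFactors = FALSE
)

# Match on the two numeric columns only; the name columns differ on purpose,
# so they are kept side by side with a suffix.
hits <- merge(query, ref, by = c("ppm.p.", "freq.p."),
              suffixes = c(".query", ".ref"))
hits

# Compounds in 'ref' that share at least one (ppm, freq) pair with 'query'
unique(hits$new.col.ref)
```

With these few rows, both query rows pair up with the 1_amino_1_phenylmethyl_phosphonic_acid rows of ref, which is the answer the original poster expected.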

A.K.



----- Original Message -----
From: sathya7priya <sathya7priya at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Wednesday, June 13, 2012 6:16 AM
Subject: [R] finding duplicates in a data frame

I have two data frames with 3 columns each. My first data frame is large,
like the one below:
"new.col ppm.p. freq.p."
"1_3_diaminopropane 3.13859 5.67516"
"1_3_diaminopropane 3.137 6.65388"
"1_3_diaminopropane 3.13541 8.0142"
"1_3_diaminopropane 3.13383 9.64184"
"1_3_diaminopropane 3.12075 298.243"
"1_3_diaminopropane 3.1152 44.6212"
"1_3_diaminopropane 3.10528 337.852"
"1_3_diaminopropane 3.09617 44.1467"
"1_3_diaminopropane 3.08943 308.2"
"1_3_diaminopropane 3.0807 7.47272"
"1_3_diaminopropane 3.07912 5.6996"
"1_3_diaminopropane 2.10859 37.0658"
"1_3_diaminopropane 2.09312 79.8969"
"1_3_diaminopropane 2.08242 51.5292"
"1_3_diaminopropane 2.07727 135.629"
"1_3_diaminopropane 2.07251 47.5652"
"1_3_diaminopropane 2.06497 49.6692"
"1_3_diaminopropane 2.0618 72.4135"
"1_3_diaminopropane 2.04634 32.1115"
"1_3_diaminopropane 0.00655923 7.6155"
"1_3_diaminopropane 0.00021588 103.234"
"1_3_diaminopropane -0.00533455 7.32726"
"1_3_dimethylurea 4.80643 88.4026"
"1_3_dimethylurea 2.66393 39.238"
"1_amino_1_phenylmethyl_phosphonic_acid 7.44687 7.1684"
"1_amino_1_phenylmethyl_phosphonic_acid 4.81412 105.11"
"2_3_diphospho_D_glyceric_acid 4.5186 18.3831"
"2_3_diphospho_D_glyceric_acid 4.51622 21.4894"
"2_3_diphospho_D_glyceric_acid 4.51028 44.1328"
"2_3_diphospho_D_glyceric_acid 4.50076 35.5218"
"2_3_diphospho_D_glyceric_acid 4.49799 25.3894"
"2_3_diphospho_D_glyceric_acid 4.49164 44.5942"
"2_3_diphospho_D_glyceric_acid 4.4853 22.1747"
"2_3_diphospho_D_glyceric_acid 4.04999 17.279"
"2_3_diphospho_D_glyceric_acid 4.04325 23.3606"
"2_3_diphospho_D_glyceric_acid 4.04047 22.4084"
"2_3_diphospho_D_glyceric_acid 4.03373 24.3897"
"2_3_diphospho_D_glyceric_acid 4.02858 58.5852"
"2_3_diphospho_D_glyceric_acid 4.02144 66.5612"
"2_3_diphospho_D_glyceric_acid 4.01906 70.3493"
"2_3_diphospho_D_glyceric_acid 4.01232 60.9695"
"2_3_diphospho_D_glyceric_acid 4.00876 61.2461"
"2_3_diphospho_D_glyceric_acid 3.99964 100.939"
"2_3_diphospho_D_glyceric_acid 3.99052 57.8175"
"2_3_diphospho_D_glyceric_acid 3.97823 36.9309"
"2_3_diphospho_D_glyceric_acid 3.96911 17.9823"
"2_3_diphospho_D_glyceric_acid 3.15121 92.128"
"2_3_diphospho_D_glyceric_acid 3.14289 114.087"
"2_3_diphospho_D_glyceric_acid 3.13813 121.311"
"2_3_diphospho_D_glyceric_acid 3.12981 193.855"
"2_3_diphospho_D_glyceric_acid 3.12148 159.122"
"2_3_diphospho_D_glyceric_acid 3.108 92.3526"
"2_3_diphospho_D_glyceric_acid 3.10007 55.6859"
"2_3_diphospho_D_glyceric_acid 1.99395 300.192"
"2_3_diphospho_D_glyceric_acid 1.97849 465.916"
"2_3_diphospho_D_glyceric_acid 1.95787 30.7119"
"2_3_diphospho_D_glyceric_acid 1.95311 34.917"
"2_3_diphospho_D_glyceric_acid 1.82585 30.7865"
"2_3_diphospho_D_glyceric_acid 1.80167 361.596"
"2_3_diphospho_D_glyceric_acid 1.79572 410.836"
"2_3_diphospho_D_glyceric_acid 1.78462 294.731"
"2_3_diphospho_D_glyceric_acid 1.77788 274.143"
"2_3_diphospho_D_glyceric_acid 1.6522 179.805"
"2_3_diphospho_D_glyceric_acid 1.62604 191.929"
"2_3_diphospho_D_glyceric_acid 1.38182 93.7426"
"2_3_diphospho_D_glyceric_acid 1.36992 38.1639"
"2_3_diphospho_D_glyceric_acid 1.35724 310.132"
"2_3_diphospho_D_glyceric_acid 1.35208 290.138"
"2_3_diphospho_D_glyceric_acid 1.33741 593.161"
"2_3_diphospho_D_glyceric_acid 1.33107 662.735"
"2_3_diphospho_D_glyceric_acid 1.32671 503.942"
"2_3_diphospho_D_glyceric_acid 1.32076 616.226"
"2_3_diphospho_D_glyceric_acid 1.30371 298.131"
"2_3_diphospho_D_glyceric_acid 1.29737 211.264"
"2_3_diphospho_D_glyceric_acid 1.27834 93.7896"
"2_3_diphospho_D_glyceric_acid 1.27239 64.2389"
"2_3_diphospho_D_glyceric_acid 1.22482 6.26529"
"2_3_diphospho_D_glyceric_acid 1.21808 10.5804"
"2_3_diphospho_D_glyceric_acid 1.20936 48.6332"
"2_3_diphospho_D_glyceric_acid 1.20262 74.4761"
"2_3_diphospho_D_glyceric_acid 1.19548 62.2725"
"2_3_diphospho_D_glyceric_acid 1.18478 115.727"
"2_3_diphospho_D_glyceric_acid 1.17764 140.963"
"2_3_diphospho_D_glyceric_acid 1.1709 102.384"
"2_3_diphospho_D_glyceric_acid 1.1598 115.19"
"2_3_diphospho_D_glyceric_acid 1.15306 116.661"
"2_3_diphospho_D_glyceric_acid 1.14513 64.8014"
"2_3_diphospho_D_glyceric_acid 1.13681 45.9263"
"2_3_diphospho_D_glyceric_acid 1.12848 35.0817"
"2_3_diphospho_D_glyceric_acid 0.000156828 127.55"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.5186 18.3831"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.51622 21.4894"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.51028 44.1328"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.50076 35.5218"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.49799 25.3894"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.49164 44.5942"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.4853 22.1747"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.04999 17.279"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.04325 23.3606"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.04047 22.4084"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.03373 24.3897"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.02858 58.5852"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.02144 66.5612"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.01906 70.3493"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.01232 60.9695"
"2_amino_5_ethyl_1_3_4_thiadiazole 4.00876 61.2461"
"2_amino_5_ethyl_1_3_4_thiadiazole 3.99964 100.939"
"2_amino_5_ethyl_1_3_4_thiadiazole 3.99052 57.8175"
"2_amino_5_ethyl_1_3_4_thiadiazole 3.97823 36.9309"
"2_amino_5_ethyl_1_3_4_thiadiazole 3.96911 17.9823"
"2_amino_5_ethyl_1_3_4_thiadiazole 3.15121 92.128"
"2_amino_5_ethyl_1_3_4_thiadiazole 3.14289 114.087"
"2_amino_5_ethyl_1_3_4_thiadiazole 3.13813 121.311"
"2_amino_5_ethyl_1_3_4_thiadiazole 3.12981 193.855"
"2_amino_5_ethyl_1_3_4_thiadiazole 3.12148 159.122"
"2_amino_5_ethyl_1_3_4_thiadiazole 3.108 92.3526"
"2_amino_5_ethyl_1_3_4_thiadiazole 3.10007 55.6859"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.99395 300.192"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.97849 465.916"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.95787 30.7119"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.95311 34.917"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.82585 30.7865"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.80167 361.596"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.79572 410.836"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.78462 294.731"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.77788 274.143"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.6522 179.805"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.62604 191.929"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.38182 93.7426"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.36992 38.1639"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.35724 310.132"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.35208 290.138"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.33741 593.161"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.33107 662.735"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.32671 503.942"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.32076 616.226"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.30371 298.131"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.29737 211.264"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.27834 93.7896"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.27239 64.2389"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.22482 6.26529"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.21808 10.5804"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.20936 48.6332"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.20262 74.4761"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.19548 62.2725"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.18478 115.727"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.17764 140.963"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.1709 102.384"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.1598 115.19"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.15306 116.661"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.14513 64.8014"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.13681 45.9263"
"2_amino_5_ethyl_1_3_4_thiadiazole 1.12848 35.0817"
"2_amino_5_ethyl_1_3_4_thiadiazole 0.000156828 127.55"


My second data frame is like a query, with only a few rows:
"new.col ppm.p. freq.p."
"unknown" 7.44687 7.1684
"unknown" 4.81412 105.11
I want to compare the second and third columns of both data frames and see
whether there are any identical values in them.
My expected answer is that the second data frame matches the values of the
1_amino_1_phenylmethyl_phosphonic_acid peak in data frame 1.
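
If the two sets of values come from separate measurements rather than from the same file, exact equality may be too strict. A minimal base-R sketch of a tolerance-based comparison follows. The names ref, query, tol_ppm and tol_freq, the tolerance values, and the slightly perturbed query values are all assumptions made for illustration; none of them come from the posted data.

```r
# Hypothetical reference table with a few of the posted rows
ref <- data.frame(
  new.col = c("1_3_dimethylurea", "1_3_dimethylurea",
              "1_amino_1_phenylmethyl_phosphonic_acid",
              "1_amino_1_phenylmethyl_phosphonic_acid"),
  ppm.p.  = c(4.80643, 2.66393, 7.44687, 4.81412),
  freq.p. = c(88.4026, 39.238, 7.1684, 105.11),
  stringsAsFactors = FALSE
)
# Hypothetical query whose values are close to, but not identical with,
# two of the reference rows
query <- data.frame(
  new.col = c("unknown", "unknown"),
  ppm.p.  = c(7.44690, 4.81410),
  freq.p. = c(7.17, 105.11),
  stringsAsFactors = FALSE
)

# Assumed tolerances; sensible values depend on the data
tol_ppm  <- 1e-4
tol_freq <- 1e-2

# For each query row, collect the names of the reference rows that lie
# within both tolerances
matches <- lapply(seq_len(nrow(query)), function(i) {
  near <- abs(ref$ppm.p.  - query$ppm.p.[i])  <= tol_ppm &
          abs(ref$freq.p. - query$freq.p.[i]) <= tol_freq
  ref$new.col[near]
})

# Compounds that match every query row
candidates <- Reduce(intersect, matches)
candidates
```

Taking the intersection across query rows is one possible design choice: it reports only compounds for which every query peak has a counterpart, which may or may not be the desired rule.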

--
View this message in context: http://r.789695.n4.nabble.com/finding-duplicates-in-a-data-frame-tp4633231.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.