[R] Merging two data frames with 3 common variables makes duplicated rows
Thomas Lumley
tlumley at u.washington.edu
Sat May 9 00:06:48 CEST 2009
On Fri, 8 May 2009, Rock Ouimet wrote:
> I am new to R (ex SAS user) , and I cannot merge two data frames without
> getting duplicated rows in the results. How to avoid this happening without
> using the unique() function?
>
> 1. First data frame is called "tmv" with 6 variables and 239 rows:
>
>> tmv[1:10,]
> temps nom prenom sexe dist style
> 1 01:59:36 Cyr Steve H 45 free
> 2 02:09:55 Gosselin Erick H 45 free
> 3 02:12:18 Desfosses Sacha H 45 free
> 4 02:12:23 Lapointe Sebastien H 45 free
> 5 02:12:52 Labrie Michel H 45 free
> 6 02:12:54 Leblanc Michel H 45 free
> 7 02:13:02 Thibeault Sylvain H 45 free
> 8 02:13:49 Martel Stephane H 45 free
> 9 02:14:03 Lavoie Jean-Philippe H 45 free
> 10 02:14:05 Boivin Jean-Claude H 45 free
>
> Its structure is:
>> str(tmv)
> 'data.frame': 239 obs. of 6 variables:
> $ temps :Class 'times' atomic [1:239] 0.0831 0.0902 0.0919 0.0919 0.0923
> ...
> .. ..- attr(*, "format")= chr "h:m:s"
> $ nom : Factor w/ 167 levels "Aubut","Audy",..: 45 84 55 105 98 110 158
> 117 109 22 ...
> $ prenom: Factor w/ 135 levels "Alain","Alexandre",..: 128 33 121 122 93 93
> 130 126 63 59 ...
> $ sexe : Factor w/ 2 levels "F","H": 2 2 2 2 2 2 2 2 2 2 ...
> $ dist : int 45 45 45 45 45 45 45 45 45 45 ...
> $ style : Factor w/ 2 levels "clas","free": 2 2 2 2 2 2 2 2 2 2 ...
>
>
> 2. The second data frame is called "meil2" with 4 variables and 16 rows;
>> meil2[1:10,]
> dist sexe style meil
> 1 38 F clas 02:43:17
> 2 38 F free 02:24:46
> 3 38 H clas 02:37:36
> 4 38 H free 01:59:35
> 5 45 F clas 03:46:15
> 6 45 F free 02:20:15
> 7 45 H clas 02:30:07
> 8 45 H free 01:59:36
> 9 38 F clas 02:43:17
> 10 38 F free 02:24:46
Rows 1 and 9 of meil2 appear to be identical, as do rows 2 and 10. If the 16 rows consist of two repeats of the same 8 rows, then each row of tmv matches two rows of meil2, which would explain why you are getting two copies of each individual in the output. unique(meil2) would keep just the distinct rows.
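A minimal sketch of how one might check this, assuming the lookup table really is the same 8 rows repeated. The small data frames below are made-up stand-ins for tmv and meil2 (the names "meil" and the handful of values are only for illustration, not your actual data):

```r
# Hypothetical miniature version of tmv (illustrative values only)
tmv <- data.frame(
  nom   = c("Cyr", "Gosselin"),
  sexe  = c("H", "H"),
  dist  = c(45L, 45L),
  style = c("free", "free")
)

# Hypothetical lookup table with one row per dist/sexe/style combination
meil <- data.frame(
  dist  = c(45L, 45L),
  sexe  = c("H", "F"),
  style = c("free", "free"),
  meil  = c("01:59:36", "02:20:15")
)

# Simulate the suspected problem: the same rows stacked twice
meil2 <- rbind(meil, meil)

# Number of rows whose key combination has already appeared earlier
n_dup_keys <- sum(duplicated(meil2[, c("dist", "sexe", "style")]))

# merge() joins on the common columns (dist, sexe, style) by default,
# so every matching row of tmv is returned once per matching lookup row
n_with_dups <- nrow(merge(tmv, meil2))

# After dropping the repeated rows, each individual appears once
n_distinct <- nrow(merge(tmv, unique(meil2)))
```

If n_dup_keys is greater than zero for your real meil2, the key columns do not identify its rows uniquely, and merge() will return one output row for every match.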
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle