[R] Merging two data frames with 3 common variables makes duplicated rows

Thomas Lumley tlumley at u.washington.edu
Sat May 9 00:06:48 CEST 2009


On Fri, 8 May 2009, Rock Ouimet wrote:

> I am new to R (ex SAS user) , and I cannot merge two data frames without
> getting duplicated rows in the results. How to avoid this happening without
> using the unique() function?
>
> 1. First data frame is called "tmv" with 6 variables and 239 rows:
>
>> tmv[1:10,]
>      temps       nom        prenom sexe dist style
> 1  01:59:36       Cyr         Steve    H   45  free
> 2  02:09:55  Gosselin         Erick    H   45  free
> 3  02:12:18 Desfosses         Sacha    H   45  free
> 4  02:12:23  Lapointe     Sebastien    H   45  free
> 5  02:12:52    Labrie        Michel    H   45  free
> 6  02:12:54   Leblanc        Michel    H   45  free
> 7  02:13:02 Thibeault       Sylvain    H   45  free
> 8  02:13:49    Martel      Stephane    H   45  free
> 9  02:14:03    Lavoie Jean-Philippe    H   45  free
> 10 02:14:05    Boivin   Jean-Claude    H   45  free
>
> Its structure is:
>> str(tmv)
> 'data.frame':   239 obs. of  6 variables:
> $ temps :Class 'times'  atomic [1:239] 0.0831 0.0902 0.0919 0.0919 0.0923
> ...
>  .. ..- attr(*, "format")= chr "h:m:s"
> $ nom   : Factor w/ 167 levels "Aubut","Audy",..: 45 84 55 105 98 110 158
> 117 109 22 ...
> $ prenom: Factor w/ 135 levels "Alain","Alexandre",..: 128 33 121 122 93 93
> 130 126 63 59 ...
> $ sexe  : Factor w/ 2 levels "F","H": 2 2 2 2 2 2 2 2 2 2 ...
> $ dist  : int  45 45 45 45 45 45 45 45 45 45 ...
> $ style : Factor w/ 2 levels "clas","free": 2 2 2 2 2 2 2 2 2 2 ...
>
>
> 2. The second data frame is called "meil2" with 4 variables and 16 rows;
>> meil2[1:10,]
>   dist sexe style     meil
> 1    38    F  clas 02:43:17
> 2    38    F  free 02:24:46
> 3    38    H  clas 02:37:36
> 4    38    H  free 01:59:35
> 5    45    F  clas 03:46:15
> 6    45    F  free 02:20:15
> 7    45    H  clas 02:30:07
> 8    45    H  free 01:59:36
> 9    38    F  clas 02:43:17
> 10   38    F  free 02:24:46


Lines 9 and 1 appear to be the same in meil2, as do 2 and 10.  If the 16 rows consist of two repeats of 8 rows that would explain why you are getting two copies of each individual in the output. unique(meil2) would have just the distinct rows.

      -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle




More information about the R-help mailing list