[R] Finding rows common to two datasets

Tony Plate tplate at acm.org
Tue Apr 28 23:05:41 CEST 2009


I think merge() can do what's wanted, but you have to be careful that the values match exactly.  Here's an example where the first row of two data frames prints the same for columns 'a' and 'b', but the values are not exactly the same, so merge() returns zero rows.  In this case the problem can be fixed by rounding, but that's not a good general solution, because very close numbers can round to different values, e.g., 1.499 and 1.501.
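
Before reaching for merge(), it can help to check whether key values that print the same really are equal.  The following is a minimal sketch using the same toy values as the example in this message; the tolerance of 1e-6 is just an assumption chosen for illustration.

```r
x <- data.frame(a = c(1.0000001, 2), b = c(3, 4))
y <- data.frame(a = c(1, 2), b = c(3, 5))

# Exact comparison: the first elements differ even though both print as 1,
# so this gives FALSE for the first element and TRUE for the second
x$a == y$a

# Ask for more significant digits than the default of 7
print(x$a, digits = 10)

# all.equal() compares within a tolerance (about 1.5e-8 by default);
# the difference here is larger than that, so it is not TRUE
isTRUE(all.equal(x$a, y$a))

# With a looser tolerance the two vectors count as equal
isTRUE(all.equal(x$a, y$a, tolerance = 1e-6))
```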

Here are examples:

> x <- data.frame(a=c(1.0000001,2), b=c(3,4), c=LETTERS[1:2])
> y <- data.frame(a=c(1,2), b=c(3,5), c=LETTERS[3:4])
> x
  a b c
1 1 3 A
2 2 4 B
> y
  a b c
1 1 3 C
2 2 5 D
> # x[1,"a"] and y[1,"a"] look the same, but are very slightly different
> merge(x, y, by=c("a", "b"))
[1] a   b   c.x c.y
<0 rows> (or 0-length row.names)
> # make x1 a version of x where the values are rounded to whole numbers
> x1 <- x
> x1$a <- round(x1$a)
> merge(x1, y, by=c("a", "b"))
  a b c.x c.y
1 1 3   A   C
> 
> # intersect() returns columns that are the same in each dataframe, not rows
> intersect(x, y)
  c
1 C
2 D
> intersect(x1, y)
  a c
1 1 C
2 2 D
> 
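
If rounding is too fragile, one alternative (not something merge() does for you) is to compare the key columns with an explicit tolerance.  The sketch below is only an illustration: near_rows() and its 'tol' argument are hypothetical names made up here, and the function builds an nrow(x) by nrow(y) logical matrix, so it is only practical when both data frames have at most a few thousand rows.

```r
# Hypothetical helper: find pairs of rows (one from x, one from y) whose
# values in every column named in 'cols' differ by at most 'tol'.
near_rows <- function(x, y, cols, tol = 1e-6) {
  # Start by assuming every pair matches, then rule out pairs column by column
  ok <- matrix(TRUE, nrow(x), nrow(y))
  for (col in cols) {
    ok <- ok & (abs(outer(x[[col]], y[[col]], "-")) <= tol)
  }
  # Row/column positions of the surviving TRUE entries
  idx <- which(ok, arr.ind = TRUE)
  data.frame(x_row = idx[, "row"], y_row = idx[, "col"])
}

x <- data.frame(a = c(1.0000001, 2), b = c(3, 4), c = LETTERS[1:2])
y <- data.frame(a = c(1, 2), b = c(3, 5), c = LETTERS[3:4])

m <- near_rows(x, y, c("a", "b"))

# Combine the matched rows, keeping x's columns and adding y's 'c'
cbind(x[m$x_row, ], c.y = y$c[m$y_row])
```

The right value for 'tol' depends on the data (for coordinates, on how precisely they were recorded), so treat 1e-6 as a placeholder rather than a recommendation.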

-- Tony Plate

jim holtman wrote:
> You are missing a comma:
> 
> common <- intersect(data_frame_x[,c("Latitude", "Longitude")],
> data_frame_y[,c("Latitude","Longitude")])
> 
> On Tue, Apr 28, 2009 at 5:49 AM, Steve Murray <smurray444 at hotmail.com> wrote:
>> Thanks for the reply, however, when I do the following command, I receive the message: 'data frame with 0 columns and 0 rows'. I've checked again though, and there should be several thousand rows where the Latitude and Longitude pairs are the same.
>>
>>> common <- intersect(data_frame_x[c("Latitude", "Longitude")], data_frame_y[c("Latitude","Longitude")])
>>> common
>> data frame with 0 columns and 0 rows
>>
>>
>> Is there an obvious solution to this? Should I be using 'unique' instead, and if so, how would I get the above to correspond to this command?
>>
>> Thanks,
>>
>> Steve
>>
>>> Date: Tue, 28 Apr 2009 13:36:51 +0530
>>> Subject: Re: [R] Finding rows common to two datasets
>>> From: umesh.srinivasan at gmail.com
>>> To: smurray444 at hotmail.com
>>> CC: r-help at r-project.org
>>>
>>> Dear Steve,
>>>
>>> Try
>>>
>>> ? intersect
>>>
>>> and see if that might help.
>>>
>>> Cheers,
>>> Umesh
>>>
>>> On Tue, Apr 28, 2009 at 1:29 PM, Steve Murray wrote:
>>>
>>> Dear all,
>>>
>>> I have 2 data frames, both with 14 columns of data and differing numbers of rows. The first two columns are 'Latitude' and 'Longitude'. I want to find the pairs of Latitude and Longitude coordinates which are common to both datasets, and output a new data frame which is composed of these coincident rows. I tried using the 'unique' command, but had difficulties interpreting the help file.
>>>
>>> Many thanks for any help offered,
>>>
>>> Steve
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>