[R] merge(join) problem

Sam Steingold sds at gnu.org
Wed Aug 17 00:00:51 CEST 2011


I have two datasets:
A with columns Open and Name (and many others, irrelevant to the merge)
B with columns Time and Name (and many others, irrelevant to the merge)

I want a dataset AB with the following columns:
  Open from A - a difftime (time of day)
  Time from B - a difftime (time of day)
  Name (same in A & B) - a factor, does NOT index rows, i.e., there are
    _many_ rows in both A & B with the same Name.
  all the other columns from A & B.

Each row of A must produce exactly one row in AB
(i.e., dim(AB)[1] == dim(A)[1]).

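For each row in AB, Time <= Open, with the gap Open - Time "as small as
possible" (i.e., Time is the largest value in B with the same Name that does
not exceed Open).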

The above conditions uniquely define AB.

The "obvious algorithm" is: for each row in A search B for a row
with the same Name and the largest Time <= Open.
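
For concreteness, the row-by-row search could be sketched as below. This is
only an illustration under assumptions: the toy values, the extra columns
a_other and b_other, and the helper match_row() are made up, and a row of A
with no eligible row in B simply gets NA in the B columns.

```r
# Toy data; column names follow the post, values are made up.
A <- data.frame(Name = factor(c("x", "x", "y")),
                Open = as.difftime(c(10, 20, 15), units = "mins"),
                a_other = 1:3)
B <- data.frame(Name = factor(c("x", "x", "x", "y")),
                Time = as.difftime(c(5, 12, 25, 30), units = "mins"),
                b_other = c("p", "q", "r", "s"))

# Hypothetical helper: for row i of A, return the index of the row of B
# with the same Name and the largest Time <= Open, or NA if there is none.
match_row <- function(i) {
  cand <- which(as.character(B$Name) == as.character(A$Name[i]) &
                B$Time <= A$Open[i])
  if (length(cand) == 0L) return(NA_integer_)
  cand[which.max(as.numeric(B$Time[cand]))]
}

idx <- vapply(seq_len(nrow(A)), match_row, integer(1))

# One row per row of A; B's columns (except Name) are appended.
AB <- cbind(A, B[idx, setdiff(names(B), "Name"), drop = FALSE])
```

This scans B once per row of A, so it is likely to be slow when both
datasets are large.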

However, I don't see an easy way to do it in R.
The obvious intermediary step is

AB1 <- merge(A, B, all.x = TRUE, all.y = FALSE, by = 'Name')

Now, AB1 has many rows with the same Name and Open.
I need to drop all of them except for the one with the largest Time <= Open.
I can do

AB2 <- AB1[which(AB1$Time <= AB1$Open),]

Now, for each combination of Name & Open, I need to keep just _one_ row:
the one with the largest Time.

How do I do that?

unique() seems to have the right name, but I don't see how it can help me...
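
One possible way to do this, sketched under assumptions, is to sort and then
use duplicated() rather than unique(). The row_id column is a hypothetical
helper added here so that two rows of A sharing the same Name and Open are
not collapsed into one; the toy values and the columns a_other and b_other
are made up.

```r
# Toy data; column names follow the post, values are made up.
A <- data.frame(Name = factor(c("x", "x", "y")),
                Open = as.difftime(c(10, 20, 15), units = "mins"),
                a_other = 1:3)
B <- data.frame(Name = factor(c("x", "x", "x", "y")),
                Time = as.difftime(c(5, 12, 25, 30), units = "mins"),
                b_other = c("p", "q", "r", "s"))

# Hypothetical helper column: identifies each original row of A.
A$row_id <- seq_len(nrow(A))

AB1 <- merge(A, B, all.x = TRUE, all.y = FALSE, by = "Name")

# Keep only candidate pairs with Time <= Open.
AB2 <- AB1[which(AB1$Time <= AB1$Open), ]

# Sort so that, within each row of A, the largest Time comes first ...
AB2 <- AB2[order(AB2$row_id, -as.numeric(AB2$Time)), ]

# ... then keep only the first row for each row of A.
AB <- AB2[!duplicated(AB2$row_id), ]
```

Note that rows of A with no row in B satisfying Time <= Open are dropped by
the filter, so dim(AB)[1] can end up smaller than dim(A)[1]; if those rows
must be kept, one option would be to merge the result back onto A by row_id
with all.x = TRUE.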

tia.



