[R] fuzzy merge

ravi rv15i at yahoo.se
Wed Apr 9 10:53:00 CEST 2008


Hi,
I would like to merge two data frames, but with a fuzzy matching criterion rather than an exact one. Let me explain.
My first data frame looks like this:

ID1    time1               dt
1      2008-01-02 13:11    10
2      2008-01-02 14:20    20
3      2008-01-02 15:42    30
4      2008-01-02 16:45    40
5      2008-01-02 17:42    50
6      2008-01-02 20:40    60


My second data frame:

ID2    time2               d1
101    2008-01-02 14:29    75
102    2008-01-02 17:55    105
103    2008-02-07 20:01    8



I want the merge to keep only the pairs of rows where time2 lies in the range between time1 and (time1 + 15 min).
That is, my merged data frame should be:

ID1    time1               time2
2      2008-01-02 14:20    2008-01-02 14:29
5      2008-01-02 17:42    2008-01-02 17:55
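
To make the criterion concrete, here is a minimal sketch of one possible approach, not a definitive solution: parse the times as POSIXct, pair every row of the first data frame with every row of the second, and keep the pairs that fall inside the 15-minute window. The names df1, df2, pairs, lag_min and fuzzy are hypothetical (chosen here to avoid clashing with the column d1), and the UTC time zone is an assumption.

```r
# Example data copied from the tables above; df1/df2 are hypothetical names.
df1 <- data.frame(
  ID1   = 1:6,
  time1 = as.POSIXct(c("2008-01-02 13:11", "2008-01-02 14:20",
                       "2008-01-02 15:42", "2008-01-02 16:45",
                       "2008-01-02 17:42", "2008-01-02 20:40"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC"),  # tz is an assumption
  dt    = c(10, 20, 30, 40, 50, 60)
)
df2 <- data.frame(
  ID2   = 101:103,
  time2 = as.POSIXct(c("2008-01-02 14:29", "2008-01-02 17:55",
                       "2008-02-07 20:01"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC"),
  d1    = c(75, 105, 8)
)

# Cartesian product: merge() with by = NULL pairs every row of df1
# with every row of df2.
pairs <- merge(df1, df2, by = NULL)

# Keep only the pairs where time2 falls in [time1, time1 + 15 min].
lag_min <- as.numeric(difftime(pairs$time2, pairs$time1, units = "mins"))
fuzzy <- pairs[lag_min >= 0 & lag_min <= 15, c("ID1", "time1", "time2")]
fuzzy <- fuzzy[order(fuzzy$ID1), ]
rownames(fuzzy) <- NULL
```

The Cartesian product has nrow(df1) * nrow(df2) rows, so with thousands of records on each side it grows to millions of rows; that may still fit in memory, but it is the main cost of this approach.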


My data frames have thousands of records. If the two data frames are d1 and d2,

d3 <- merge(d1, d2, by.x = "time1", by.y = "time2")

will work only for exact matching. One possible option is to match on the date and hour only (by dropping the minutes).
But this is only a partial solution: it would still pair records that are more than 15 minutes apart, and it would miss pairs that straddle an hour boundary.

How can I make the merge work with fuzzy matching?
Would it be easier to convert the times into date classes? Or is it better to treat them as strings and use regular expressions for the matching?
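
On the date classes versus strings question, a rough sketch under stated assumptions: once the times are POSIXct, the 15-minute test becomes plain arithmetic on seconds, which string matching cannot express directly. The version below uses findInterval() to avoid comparing every pair. It assumes time1 is sorted in ascending order and reports at most one candidate per time2 (the latest time1 that is not after it). The vector names and the UTC time zone are hypothetical.

```r
# Times from the example, parsed as POSIXct (tz is an assumption).
time1 <- as.POSIXct(c("2008-01-02 13:11", "2008-01-02 14:20",
                      "2008-01-02 15:42", "2008-01-02 16:45",
                      "2008-01-02 17:42", "2008-01-02 20:40"),
                    format = "%Y-%m-%d %H:%M", tz = "UTC")
time2 <- as.POSIXct(c("2008-01-02 14:29", "2008-01-02 17:55",
                      "2008-02-07 20:01"),
                    format = "%Y-%m-%d %H:%M", tz = "UTC")

# findInterval() needs its second argument sorted in non-decreasing order.
stopifnot(!is.unsorted(time1))

# For each time2, the index of the latest time1 that is <= time2 (0 if none).
idx <- findInterval(as.numeric(time2), as.numeric(time1))

# Lag in seconds between time2 and that preceding time1.
ok <- idx > 0
lag_sec <- rep(NA_real_, length(time2))
lag_sec[ok] <- as.numeric(time2[ok]) - as.numeric(time1[idx[ok]])

# A match requires the lag to be at most 15 minutes.
hit <- ok & lag_sec <= 15 * 60

# Row indices into the first (i1) and second (i2) data frames.
matched <- data.frame(i1 = idx[hit], i2 = which(hit))
```

The indices in matched can then be used to bind the corresponding rows of the two data frames together. If several time1 values can fall within 15 minutes before the same time2, this sketch keeps only the closest one, which may or may not be what is wanted.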

I would appreciate any help that I can get.
Thanking You,
Ravi



