[R] Fuzzy merge using timestamps

Sarah Goslee sarah.goslee at gmail.com
Wed Nov 10 19:12:53 CET 2010


On Wed, Nov 10, 2010 at 12:57 PM, Ian Craig <ian.jhsph at gmail.com> wrote:
> Greetings Supreme Council of R Masters,

Nice. :)

> I have two sets of data, each with a set of timestamps.  I would like to
> somehow merge the datasets based on the timestamps and an individual
> identifier.  That is there are several individuals all with timestamps, with
> times that could overlap.  By browsing through some of the older posts, I
> got the idea to create a third data frame of both sets of timestamps,
> individual identifiers, and a key to determine which dataset they have come
> from, then find the breaks to determine which of each dataset should be
> paired.  The code I have written so far looks something like this.

This would be easier to sort through if you included a toy example with
data so that we could try it. As it is, I have no idea what your data
actually look like.
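For instance, a toy example could be as small as the sketch below. The individuals ("A", "B"), times, and values are made up for illustration. Only the column names (gpsARC, t_datetimegps, urARC, t_datetimeur) come from your code, and the UTC time zone is an assumption. Running dput() on a few rows of the real objects gives text that can be pasted straight into an email.

```r
# Hypothetical toy data; only the column names come from the post.
gpsdata <- data.frame(
  gpsARC        = c("A", "A", "A", "B", "B"),
  t_datetimegps = c("2010-11-01 10:00:00", "2010-11-01 10:10:00",
                    "2010-11-01 10:20:00", "2010-11-01 10:05:00",
                    "2010-11-01 10:15:00"),
  stringsAsFactors = FALSE
)
urdata <- data.frame(
  urARC        = c("A", "A", "B"),
  t_datetimeur = c("2010-11-01 10:03:00", "2010-11-01 10:12:00",
                   "2010-11-01 10:07:00"),
  stringsAsFactors = FALSE
)
# Convert as in the post, with an explicit time zone (assumed UTC)
gpsdata$t_datetimegps <- as.POSIXct(gpsdata$t_datetimegps, tz = "UTC")
urdata$t_datetimeur   <- as.POSIXct(urdata$t_datetimeur, tz = "UTC")

# dput() prints text that recreates each object when pasted into R
dput(gpsdata)
dput(urdata)
```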

> gpsdata$t_datetimegps<-as.POSIXct(gpsdata$t_datetimegps)
> urdata$t_datetimeur<-as.POSIXct(urdata$t_datetimeur)
>
> gpsdata$ID1 <- row.names(gpsdata)
> urdata$ID2 <- row.names(urdata)
>
> gpsdata$key1 <- rep(0, nrow(gpsdata))
> urdata$key2 <- rep(1, nrow(urdata))
>
> checkTimes <- data.frame(ID=c(gpsdata$ID1, urdata$ID2),
>        ARC=c(gpsdata$gpsARC, urdata$urARC),
>        times=c(gpsdata$t_datetimegps, urdata$t_datetimeur),
>        key=c(gpsdata$key1, urdata$key2))
>
> checkTime <- checkTimes[order(checkTimes$ARC, checkTimes$times, decreasing = FALSE),]
>
> breaks <- which(diff(checkTime$key) == 1)
>
> match <- data.frame(ID1=checkTime$ID[breaks],
>        gpsARC = checkTime$ARC[breaks],
>        urARC = checkTime$ARC[breaks + 1],
>        t_datetimegps=checkTime$times[breaks],
>        t_datetimeur=checkTime$times[breaks + 1])
>
> #Then I merge the 'match' data frame with the gpsdata data frame and the
> product with the urdata data frame.  The problem is that when I create the
> checkTime data frame and sort it, it sorts the urdata portion first then the
> gpsdata portion.   So my key column looks like
> 1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, instead of
> 0,0,0,1,0,0,1,0,0,0,0,0,0,1, etc. even though I am not sorting on key.
>  S.O.S!!!!  Why is it doing this?  Shouldn't it just order the timestamps of
> both data frames together?

So really this is a sorting problem, not a merging problem? Is the merging
part working correctly?
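One way to separate the two questions is to run the same order() call on a tiny, made-up version of checkTimes. This sketch assumes ARC is a plain character vector and times is POSIXct. The data are hypothetical, and tm() is just a throwaway helper for building times. With those types, sorting by ARC and then times interleaves the GPS (key 0) and UR (key 1) rows within each individual.

```r
# Made-up combined data in the same layout as checkTimes in the post:
# key 0 = row from gpsdata, key 1 = row from urdata.
tm <- function(x) as.POSIXct(paste("2010-11-01", x), tz = "UTC")
checkTimes <- data.frame(
  ID    = c("1", "2", "3", "1", "2"),
  ARC   = c("A", "A", "B", "A", "B"),               # plain character
  times = tm(c("10:00:00", "10:10:00", "10:05:00",  # GPS rows
               "10:03:00", "10:07:00")),            # UR rows
  key   = c(0, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Same sort as in the post: by individual, then by time
checkTime <- checkTimes[order(checkTimes$ARC, checkTimes$times), ]
checkTime$key   # [1] 0 1 0 0 1
breaks <- which(diff(checkTime$key) == 1)
breaks          # [1] 1 4
```

If the real data do not interleave like this, str(checkTimes) will show what classes ARC and times actually ended up with. One possibility worth ruling out: before R 4.1.0, c() applied to two factors returned their integer codes, so the same individual could get different numbers in the GPS and UR halves. Note also that diff(checkTime$key) == 1 can fire where one individual's last GPS row is followed by the next individual's first UR row.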

What exactly are you doing to merge? To sort?

Here again a worked functional example would be really useful. Without
knowing what you're doing, I can't offer suggestions.
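For the merge itself, one alternative to the breaks approach is an "as-of" match: for each GPS fix, take the most recent UR record of the same individual at or before that time. The sketch below uses base R's split() and findInterval(). The function name nearest_prior_merge and the toy data are hypothetical. It assumes the two ARC columns hold matching individual identifiers and both time columns are POSIXct.

```r
# Hypothetical helper (not from the original post): an "as-of" merge that,
# for each GPS row, attaches the latest UR row of the same individual whose
# timestamp is at or before the GPS timestamp.
nearest_prior_merge <- function(gps, ur) {
  pieces <- lapply(split(gps, gps$gpsARC), function(g) {
    # UR rows for this individual, sorted by time (findInterval needs this)
    u <- ur[ur$urARC == g$gpsARC[1], , drop = FALSE]
    u <- u[order(u$t_datetimeur), , drop = FALSE]
    # Number of UR times <= each GPS time, i.e. the index of the latest
    # earlier UR row; 0 means there is no earlier UR row.
    idx <- findInterval(as.numeric(g$t_datetimegps),
                        as.numeric(u$t_datetimeur))
    idx[idx == 0] <- NA  # indexing with NA yields a row of NAs
    data.frame(g, u[idx, , drop = FALSE], row.names = NULL)
  })
  do.call(rbind, pieces)
}

# Made-up toy data for two individuals, "A" and "B"
tm <- function(x) as.POSIXct(paste("2010-11-01", x), tz = "UTC")
gpsdata <- data.frame(gpsARC = c("A", "A", "A", "B", "B"),
                      t_datetimegps = tm(c("10:00:00", "10:10:00", "10:20:00",
                                           "10:05:00", "10:15:00")),
                      stringsAsFactors = FALSE)
urdata <- data.frame(urARC = c("A", "A", "B"),
                     t_datetimeur = tm(c("10:03:00", "10:12:00", "10:07:00")),
                     stringsAsFactors = FALSE)

merged <- nearest_prior_merge(gpsdata, urdata)
merged
```

GPS fixes with no earlier UR record get NA in the UR columns. Whether "latest earlier record" or "nearest in either direction" is the right rule depends on the data, which is another reason a toy example would help.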

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org


