[R] Fuzzy merge using timestamps
Sarah Goslee
sarah.goslee at gmail.com
Wed Nov 10 20:09:16 CET 2010
Hi Ian,
Did you see the second part of my request, for a simple
reproducible example? At the very least, we need your
merge and sort code, since it sounds like that's where
the problem truly lies.
Sarah
On Wed, Nov 10, 2010 at 2:02 PM, Ian Craig <ian.jhsph at gmail.com> wrote:
> So here is a sample of the datasets. I have modified them for the purposes
> of circulating to solve this problem. I have already added the ID and key
> as seen in the code below. Any ideas?
> gpsdata
> gpsARC Protocol Track UTCDate UTCTime LocalDate LocalTime y
> x Altitude_m Speed_km_h TimeBef_sec Dist_Bef_m TimeAft_sec DistAft_m
> t_datetimegps ID1 key1
> 1 118259 38-C 1 07/21/2009 21:29:05 07/21/2009 17:29:05 39.29393
> -76.54961 0.547843 0.162998 0 0.00 36 27.25
> 2009-07-21 17:09:00 1 0
> 2 118259 38-C 1 07/21/2009 21:29:41 07/21/2009 17:29:41 39.29397
> -76.54930 6.615943 27.425990 36 27.25 4 32.53
> 2009-07-21 17:19:00 2 0
> 3 118259 38-C 1 07/21/2009 21:29:45 07/21/2009 17:29:45 39.29383
> -76.54897 6.357120 30.192910 4 32.53 3 26.39
> 2009-07-21 17:29:00 3 0
> 4 118259 38-C 1 07/21/2009 21:29:48 07/21/2009 17:29:48 39.29363
> -76.54881 6.134487 30.489670 3 26.39 5 27.27
> 2009-07-21 17:39:00 4 0
> 5 118259 38-C 1 07/21/2009 21:29:53 07/21/2009 17:29:53 39.29341
> -76.54868 6.451584 13.394810 5 27.27 10 27.47
> 2009-07-21 17:49:00 5 0
> 6 118259 38-C 1 07/21/2009 21:30:03 07/21/2009 17:30:03 39.29352
> -76.54839 4.324665 13.323760 10 27.47 5 26.25
> 2009-07-22 18:00:00 6 0
> 7 118259 38-C 1 07/21/2009 21:30:08 07/21/2009 17:30:08 39.29370
> -76.54819 3.662347 12.518040 5 26.25 8 30.68
> 2009-07-22 18:20:00 7 0
> 8 118259 38-C 1 07/21/2009 21:30:16 07/21/2009 17:30:16 39.29394
> -76.54837 2.378734 36.134190 8 30.68 3 31.95
> 2009-07-22 18:30:00 8 0
> 9 118259 38-C 1 07/21/2009 21:30:19 07/21/2009 17:30:19 39.29408
> -76.54870 2.222491 38.962510 3 31.95 3 31.20
> 2009-07-22 18:40:00 9 0
> 10 118259 38-C 1 07/21/2009 21:30:22 07/21/2009 17:30:22 39.29419
> -76.54903 2.024886 31.771920 3 31.20 6 25.91
> 2009-07-22 18:50:00 10 0
> 11 118259 38-C 1 07/21/2009 21:30:28 07/21/2009 17:30:28 39.29430
> -76.54929 2.831168 4.424288 6 25.91 51 25.10
> 2009-07-22 19:00:00 11 0
> 12 118259 38-C 1 07/21/2009 21:31:19 07/21/2009 17:31:19 39.29432
> -76.54958 1.545548 26.467430 51 25.10 3 26.97
> 2009-07-22 19:01:00 12 0
> 13 118259 38-C 1 07/21/2009 21:31:22 07/21/2009 17:31:22 39.29415
> -76.54980 2.062366 37.769800 3 26.97 3 34.60
> 2009-07-22 19:05:00 13 0
> 14 217276 38-C 1 07/21/2009 21:31:25 07/21/2009 17:31:25 39.29388
> -76.55000 2.120106 45.099640 3 34.60 2 25.38
> 2009-07-21 17:11:00 14 0
> 15 217276 38-C 1 07/21/2009 21:31:27 07/21/2009 17:31:27 39.29367
> -76.55014 2.107622 47.953690 2 25.38 2 26.39
> 2009-07-21 17:21:00 15 0
> 16 217276 38-C 1 07/21/2009 21:31:29 07/21/2009 17:31:29 39.29347
> -76.55029 1.942083 47.656870 2 26.39 2 26.72
> 2009-07-21 17:31:00 16 0
> 17 217276 38-C 1 07/21/2009 21:31:31 07/21/2009 17:31:31 39.29326
> -76.55045 2.023492 46.045600 2 26.72 3 36.03
> 2009-07-21 17:41:00 17 0
> 18 217276 38-C 1 07/21/2009 21:31:34 07/21/2009 17:31:34 39.29298
> -76.55066 2.211649 39.604580 3 36.03 3 27.07
> 2009-07-21 17:51:00 18 0
> 19 217276 38-C 1 07/21/2009 21:31:37 07/21/2009 17:31:37 39.29277
> -76.55083 1.479468 25.079980 3 27.07 40 29.64
> 2009-07-22 19:31:00 19 0
> 20 217276 38-C 1 07/21/2009 21:32:17 07/21/2009 17:32:17 39.29256
> -76.55103 2.041623 23.242470 40 29.64 4 26.38
> 2009-07-22 19:32:00 20 0
> 21 217276 38-C 1 07/21/2009 21:32:21 07/21/2009 17:32:21 39.29255
> -76.55134 2.122359 32.137710 4 26.38 3 34.66
> 2009-07-22 19:34:00 21 0
> 22 217276 38-C 1 07/21/2009 21:32:24 07/21/2009 17:32:24 39.29276
> -76.55164 0.776128 48.219300 3 34.66 2 28.80
> 2009-07-22 19:36:00 22 0
> 23 217276 38-C 1 07/21/2009 21:32:26 07/21/2009 17:32:26 39.29294
> -76.55187 -0.542242 53.586880 2 28.80 2 31.85
> 2009-07-22 19:38:00 23 0
> 24 217276 38-C 1 07/21/2009 21:32:28 07/21/2009 17:32:28 39.29316
> -76.55210 -0.383976 56.380370 2 31.85 2 32.48
> 2009-07-22 19:40:00 24 0
> 25 217276 38-C 1 07/21/2009 21:32:30 07/21/2009 17:32:30 39.29344
> -76.55223 -1.343364 58.055140 2 32.48 2 33.62
> 2009-07-22 19:41:00 25 0
> 26 217276 38-C 1 07/21/2009 21:32:32 07/21/2009 17:32:32 39.29374
> -76.55227 -2.229877 61.005420 2 33.62 2 35.23
> 2009-07-22 19:45:00 26 0
> 27 217276 38-C 1 07/21/2009 21:32:34 07/21/2009 17:32:34 39.29405
> -76.55220 -2.645037 65.670400 2 35.23 2 38.70
> 2009-07-23 11:02:00 27 0
> 28 217276 38-C 1 07/21/2009 21:32:36 07/21/2009 17:32:36 39.29438
> -76.55206 -3.099854 70.528790 2 38.70 2 40.63
> 2009-07-23 11:12:00 28 0
> 29 217276 38-C 1 07/21/2009 21:32:38 07/21/2009 17:32:38 39.29472
> -76.55186 -3.309293 73.246460 2 40.63 2 41.87
> 2009-07-23 11:22:00 29 0
> 30 217276 38-C 1 07/21/2009 21:32:40 07/21/2009 17:32:40 39.29505
> -76.55164 -2.752104 75.466320 2 41.87 2 42.81
> 2009-07-23 11:32:00 30 0
>
> urdata
> Study urARC Begin_DateTime t_datetime Begin_Date COC_POS
> COD_POS TOT_POS t_datetimeur ID2 key2
> 1 38-C 118259 11/3/2008 22:05 11/5/2008 10:05 11/3/2008 22:05 0
> 1 1 2008-11-05 10:05:00 1 1
> 2 38-C 118259 11/19/2008 23:33 11/21/2008 11:33 11/19/2008 23:33 0
> 1 1 2009-07-21 17:21:00 2 1
> 3 38-C 118259 11/23/2008 0:37 11/24/2008 12:37 11/23/2008 0:37 0
> 1 1 2008-11-24 12:37:00 3 1
> 4 38-C 118259 11/25/2008 5:18 11/26/2008 17:18 11/25/2008 5:18 0
> 1 1 31.95 2009-07-22 18:28:0 4 1
> 5 38-C 217276 11/26/2008 23:20 11/28/2008 11:20 11/26/2008 23:20 0
> 1 1 2008-11-28 11:20:00 5 1
> 6 38-C 217276 11/30/2008 0:35 12/1/2008 12:35 11/30/2008 0:35 0
> 1 1 2009-07-21 17:26:00 6 1
> 7 38-C 217276 12/2/2008 5:05 12/3/2008 17:05 12/2/2008 5:05 0
> 1 1 2008-12-03 17:05:00 7 1
> 8 38-C 217276 12/4/2008 5:47 12/5/2008 17:47 12/4/2008 5:47 0
> 1 1 2009-07-22 19:39:00 8 1
> 9 38-C 118259 12/7/2008 4:50 12/8/2008 16:50 12/7/2008 4:50 0
> 0 0 2008-12-08 16:50:00 9 1
> 10 38-C 118259 12/9/2008 4:29 12/10/2008 16:29 12/9/2008 4:29 0
> 0 0 2009-07-22 17:58:0 10 1
> 11 38-C 118259 12/11/2008 5:46 12/12/2008 17:46 12/11/2008 5:46 0
> 0 0 2008-12-12 17:46:00 11 1
> 12 38-C 217276 12/14/2008 0:21 12/15/2008 12:21 12/14/2008 0:21 0
> 0 0 2008-12-15 12:21:00 12 1
> 13 38-C 217276 12/16/2008 5:40 12/17/2008 17:40 12/16/2008 5:40 0
> 0 0 2008-12-17 17:40:00 13 1
> 14 38-C 217276 12/18/2008 5:47 12/19/2008 17:47 12/18/2008 5:47 0
> 0 0 2009-07-23 11:15:00 14 1
> 15 38-C 217276 12/21/2008 4:31 12/22/2008 16:31 12/21/2008 4:31 0
> 0 0 2008-12-22 16:31:00 15 1
>
> On Wed, Nov 10, 2010 at 1:12 PM, Sarah Goslee <sarah.goslee at gmail.com>
> wrote:
>>
>> On Wed, Nov 10, 2010 at 12:57 PM, Ian Craig <ian.jhsph at gmail.com> wrote:
>> > Greetings Supreme Council of R Masters,
>>
>> Nice. :)
>>
>> > I have two sets of data, each with a set of timestamps. I would like to
>> > somehow merge the datasets based on the timestamps and an individual
>> > identifier. That is there are several individuals all with timestamps,
>> > with
>> > times that could overlap. By browsing through some of the older posts,
>> > I
>> > got the idea to create a third data frame of both sets of timestamps,
>> > individual identifiers, and a key to determine which dataset they have
>> > come
>> > from, then find the breaks to determine which of each dataset should be
>> > paired. the code I have written so far look something like this.
>>
>> This would be easier to sort through if you included a toy example with
>> data so that we could try it. As it is, I have no idea what your data
>> actually look like.
>>
>> > gpsdata$t_datetimegps<-as.POSIXct(gpsdata$t_datetimegps)
>> > urdata$t_datetimeur<-as.POSIXct(urdata$t_datetimeur)
>> >
>> > gpsdata$ID1 <- row.names(gpsdata)
>> > urdata$ID2 <- row.names(urdata)
>> >
>> > gpsdata$key1 <- rep(0, nrow(gpsdata))
>> > urdata$key2 <- rep(1, nrow(urdata))
>> >
>> > checkTimes <- data.frame(ID=c(gpsdata$ID1, urdata$ID2),
>> > ARC=c(gpsdata$gpsARC, urdata$urARC),
>> > times=c(gpsdata$t_datetimegps, urdata$t_datetimeur),
>> > key=c(gpsdata$key1, urdata$key2))
>> >
>> > checkTime <- checkTimes[order(checkTimes$ARC,checkTimes$times,
>> > decreasing =
>> > FALSE),]
>> >
>> > breaks <- which(diff(checkTime$key) == 1)
>> >
>> > match <- data.frame(ID1=checkTime$ID[breaks],
>> > gpsARC = checkTime$ARC[breaks],
>> > urARC = checkTime$ARC[breaks + 1],
>> > t_datetimegps=checkTime$times[breaks],
>> > t_datetimeur=checkTime$times[breaks + 1])
>> >
>> > #Then I merge the 'match' data frame with the gpsdata data frame and the
>> > product with the urdata data frame. The problem is that when I create
>> > the
>> > checkTime data frame and sort it, it sorts the urdata portion first then
>> > the
>> > gpsdata portion. So my key column looks like
>> > 1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, instead of
>> > 0,0,0,1,0,0,1,0,0,0,0,0,0,1, etc. even though I am not sorting on key.
>> > S.O.S!!!! Why is it doing this? Shouldn't it just order the
>> > timestamps of
>> > both data frames together?
>>
>> So really this is a sorting problem, not a merging problem? Is the merging
>> part working correctly?
>>
>> What exactly are you doing to merge? To sort?
>>
>> Here again a worked functional example would be really useful. Without
>> knowing what you're doing, I can't offer suggestions.
>>
>> Sarah
>>
>> --
>
--
Sarah Goslee
http://www.functionaldiversity.org
More information about the R-help
mailing list