[R] Conditional operation on multiple columns from two data frames
jim holtman
jholtman at gmail.com
Mon Dec 29 03:43:19 CET 2008
Merge will give you an output like below and you should be able to
find the value you need:
> merge(df1, df2, by='ID', all=TRUE)
ID Date.x y.x x.x Date.y
y.y x.y
1 1 <NA> NA NA 2007-05-31 23:00:00
-20.82907 217.7022
2 1 <NA> NA NA 2007-06-01 11:00:00
-20.82500 217.6980
3 1 <NA> NA NA 2007-06-01 23:00:00
-20.82236 217.7012
4 2 2007-05-30 20:21:56 NA NA 2007-02-19 00:00:00
-37.40960 178.3060
5 2 2007-05-30 20:21:56 NA NA 2007-02-19 12:00:00
-37.37515 178.3360
6 2 2007-05-30 20:21:56 NA NA 2007-02-20 00:00:00
-37.27700 178.3780
7 2 2007-05-30 20:21:56 NA NA 2007-02-20 12:00:00
-37.17855 178.3845
8 2 2007-05-30 20:21:56 NA NA 2007-02-21 00:00:00
-37.07595 178.3890
9 2 2007-05-30 20:21:56 NA NA 2007-02-21 12:00:00
-36.91965 178.3965
10 2 2007-05-30 20:21:56 NA NA 2007-02-22 00:00:00
-36.74345 178.3910
11 2 2007-05-30 20:21:56 NA NA 2007-02-22 12:00:00
-36.49245 178.5090
12 2 2007-05-30 20:21:56 NA NA 2007-02-23 00:00:00
-36.24150 178.5750
13 2 2007-05-30 20:21:56 NA NA 2007-02-23 12:00:00
-35.95500 178.6280
14 2 2007-05-30 18:48:20 NA NA 2007-02-19 00:00:00
-37.40960 178.3060
15 2 2007-05-30 18:48:20 NA NA 2007-02-19 12:00:00
-37.37515 178.3360
16 2 2007-05-30 18:48:20 NA NA 2007-02-20 00:00:00
-37.27700 178.3780
17 2 2007-05-30 18:48:20 NA NA 2007-02-20 12:00:00
-37.17855 178.3845
18 2 2007-05-30 18:48:20 NA NA 2007-02-21 00:00:00
-37.07595 178.3890
19 2 2007-05-30 18:48:20 NA NA 2007-02-21 12:00:00
-36.91965 178.3965
20 2 2007-05-30 18:48:20 NA NA 2007-02-22 00:00:00
-36.74345 178.3910
21 2 2007-05-30 18:48:20 NA NA 2007-02-22 12:00:00
-36.49245 178.5090
22 2 2007-05-30 18:48:20 NA NA 2007-02-23 00:00:00
-36.24150 178.5750
23 2 2007-05-30 18:48:20 NA NA 2007-02-23 12:00:00
-35.95500 178.6280
24 2 2007-06-05 08:34:37 NA NA 2007-02-19 00:00:00
-37.40960 178.3060
25 2 2007-06-05 08:34:37 NA NA 2007-02-19 12:00:00
-37.37515 178.3360
26 2 2007-06-05 08:34:37 NA NA 2007-02-20 00:00:00
-37.27700 178.3780
27 2 2007-06-05 08:34:37 NA NA 2007-02-20 12:00:00
-37.17855 178.3845
28 2 2007-06-05 08:34:37 NA NA 2007-02-21 00:00:00
-37.07595 178.3890
29 2 2007-06-05 08:34:37 NA NA 2007-02-21 12:00:00
-36.91965 178.3965
30 2 2007-06-05 08:34:37 NA NA 2007-02-22 00:00:00
-36.74345 178.3910
31 2 2007-06-05 08:34:37 NA NA 2007-02-22 12:00:00
-36.49245 178.5090
32 2 2007-06-05 08:34:37 NA NA 2007-02-23 00:00:00
-36.24150 178.5750
33 2 2007-06-05 08:34:37 NA NA 2007-02-23 12:00:00
-35.95500 178.6280
34 2 2007-06-02 09:39:58 -20.10100 -134.4210 2007-02-19 00:00:00
-37.40960 178.3060
35 2 2007-06-02 09:39:58 -20.10100 -134.4210 2007-02-19 12:00:00
-37.37515 178.3360
36 2 2007-06-02 09:39:58 -20.10100 -134.4210 2007-02-20 00:00:00
-37.27700 178.3780
37 2 2007-06-02 09:39:58 -20.10100 -134.4210 2007-02-20 12:00:00
-37.17855 178.3845
38 2 2007-06-02 09:39:58 -20.10100 -134.4210 2007-02-21 00:00:00
-37.07595 178.3890
39 2 2007-06-02 09:39:58 -20.10100 -134.4210 2007-02-21 12:00:00
-36.91965 178.3965
40 2 2007-06-02 09:39:58 -20.10100 -134.4210 2007-02-22 00:00:00
-36.74345 178.3910
41 2 2007-06-02 09:39:58 -20.10100 -134.4210 2007-02-22 12:00:00
-36.49245 178.5090
42 2 2007-06-02 09:39:58 -20.10100 -134.4210 2007-02-23 00:00:00
-36.24150 178.5750
43 2 2007-06-02 09:39:58 -20.10100 -134.4210 2007-02-23 12:00:00
-35.95500 178.6280
44 2 2007-06-02 17:36:20 NA NA 2007-02-19 00:00:00
-37.40960 178.3060
45 2 2007-06-02 17:36:20 NA NA 2007-02-19 12:00:00
-37.37515 178.3360
46 2 2007-06-02 17:36:20 NA NA 2007-02-20 00:00:00
-37.27700 178.3780
47 2 2007-06-02 17:36:20 NA NA 2007-02-20 12:00:00
-37.17855 178.3845
48 2 2007-06-02 17:36:20 NA NA 2007-02-21 00:00:00
-37.07595 178.3890
49 2 2007-06-02 17:36:20 NA NA 2007-02-21 12:00:00
-36.91965 178.3965
50 2 2007-06-02 17:36:20 NA NA 2007-02-22 00:00:00
-36.74345 178.3910
51 2 2007-06-02 17:36:20 NA NA 2007-02-22 12:00:00
-36.49245 178.5090
52 2 2007-06-02 17:36:20 NA NA 2007-02-23 00:00:00
-36.24150 178.5750
53 2 2007-06-02 17:36:20 NA NA 2007-02-23 12:00:00
-35.95500 178.6280
54 2 2007-06-03 20:31:35 NA NA 2007-02-19 00:00:00
-37.40960 178.3060
55 2 2007-06-03 20:31:35 NA NA 2007-02-19 12:00:00
-37.37515 178.3360
56 2 2007-06-03 20:31:35 NA NA 2007-02-20 00:00:00
-37.27700 178.3780
57 2 2007-06-03 20:31:35 NA NA 2007-02-20 12:00:00
-37.17855 178.3845
58 2 2007-06-03 20:31:35 NA NA 2007-02-21 00:00:00
-37.07595 178.3890
59 2 2007-06-03 20:31:35 NA NA 2007-02-21 12:00:00
-36.91965 178.3965
60 2 2007-06-03 20:31:35 NA NA 2007-02-22 00:00:00
-36.74345 178.3910
61 2 2007-06-03 20:31:35 NA NA 2007-02-22 12:00:00
-36.49245 178.5090
62 2 2007-06-03 20:31:35 NA NA 2007-02-23 00:00:00
-36.24150 178.5750
63 2 2007-06-03 20:31:35 NA NA 2007-02-23 12:00:00
-35.95500 178.6280
64 3 2007-03-23 06:33:26 NA NA <NA>
NA NA
65 3 2007-03-23 17:02:21 NA NA <NA>
NA NA
66 3 2007-02-19 00:00:00 -37.40822 178.3440 <NA>
NA NA
67 3 2007-03-10 19:58:18 NA NA <NA>
NA NA
68 3 2007-03-25 06:41:03 -31.18100 178.7700 <NA>
NA NA
69 3 2007-03-24 04:27:44 -31.16400 178.7650 <NA>
NA NA
70 3 2007-03-24 11:35:47 NA NA <NA>
NA NA
On Sat, Dec 27, 2008 at 11:55 PM, Tim Sippel <tsippel at gmail.com> wrote:
> Sorry...I was unaware I needed to 'deput' to deparse data examples into a
> useful format. The data examples I provided initially are deparsed below:
>
> df1<-
> list(ID = c(2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3), Date = c("2007-05-30
> 18:48:20",
> "2007-05-30 20:21:56", "2007-06-02 09:39:58", "2007-06-02 17:36:20",
> "2007-06-03 20:31:35", "2007-06-05 08:34:37", "2007-02-19 00:00:00",
> "2007-03-10 19:58:18", "2007-03-23 06:33:26", "2007-03-23 17:02:21",
> "2007-03-24 04:27:44", "2007-03-24 11:35:47", "2007-03-25 06:41:03"
> ), y = c(NA, NA, -20.101, NA, NA, NA, -37.4082166666667, NA,
> NA, NA, -31.164, NA, -31.181), x = c(NA, NA, -134.421, NA, NA,
> NA, 178.34395, NA, NA, NA, 178.765, NA, 178.77))
>
> df2<-
> list(ID = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Date = c("2007-05-31
> 23:00:00",
> "2007-06-01 11:00:00", "2007-06-01 23:00:00", "2007-02-19 00:00:00",
> "2007-02-19 12:00:00", "2007-02-20 00:00:00", "2007-02-20 12:00:00",
> "2007-02-21 00:00:00", "2007-02-21 12:00:00", "2007-02-22 00:00:00",
> "2007-02-22 12:00:00", "2007-02-23 00:00:00", "2007-02-23 12:00:00"
> ), y = c(-20.829075, -20.824995, -20.822355, -37.4096, -37.37515,
> -37.277, -37.17855, -37.07595, -36.91965, -36.74345, -36.49245,
> -36.2415, -35.955), x = c(217.7022, 217.698, 217.70115, 178.306,
> 178.336, 178.378, 178.3845, 178.389, 178.3965, 178.391, 178.509,
> 178.575, 178.628))
>
> On Sun, Dec 28, 2008 at 4:36 PM, Tim Sippel <tsippel at gmail.com> wrote:
>
>> Hi-
>> I have two data frames for which I wish to conditionally subtract the
>> values of one dataframe from the other.
>>
>> I want to subtract df1$x from df2$x when their id is equal and the
>> absolute value of difference between the dates is 12 hours or less. If
>> there is no match of equal id's and dates less than 12 hours apart I want it
>> to return "NA". Note that df1 has missing values in x and y columns, but
>> df2 does not have any missing values. I think I'm close with the code at
>> the end, but would appreciate a bit of help getting over this hurdle.
>>
>> Sample dataframes follow:
>>
>> *df1<-*
>> ID Date y x STM07.1 30-05-2007 18:48 STM07.1 30-05-2007 20:21
>> STM07.1 02-06-2007 09:39 -20.101 -134.421 STM07.1 02-06-2007 17:36
>> STM07.1 03-06-2007 20:31 STM07.1 05-06-2007 08:34 STM07.2 19-02-2007
>> 00:00 -37.4082 178.34395 STM07.2 10-03-2007 19:58 STM07.2 23-03-2007
>> 06:33 STM07.2 23-03-2007 17:02 STM07.2 24-03-2007 04:27 -31.164 178.765
>> STM07.2 24-03-2007 11:35 STM07.2 25-03-2007 06:41 -31.181 178.77 STM07.2 26-03-2007
>> 05:11
>> **
>> *df2<-*
>> ID Date y x STM07.1 31/05/2007 23:00 -20.8291 217.7022 STM07.1 1/06/2007
>> 11:00 -20.825 217.698 STM07.1 1/06/2007 23:00 -20.8224 217.70115 STM07.2 19/02/2007
>> 0:00 -37.4096 178.306 STM07.2 19/02/2007 12:00 -37.3752 178.336 STM07.2 20/02/2007
>> 0:00 -37.277 178.378 STM07.2 20/02/2007 12:00 -37.1786 178.3845 STM07.2 21/02/2007
>> 0:00 -37.076 178.389 STM07.2 21/02/2007 12:00 -36.9197 178.3965 STM07.2 22/02/2007
>> 0:00 -36.7435 178.391 STM07.2 22/02/2007 12:00 -36.4925 178.509 STM07.2 23/02/2007
>> 0:00 -36.2415 178.575 STM07.2 23/02/2007 12:00 -35.955 178.628
>>
>> d.x <- ifelse(c(test = df1$ID==df2$ID && abs(difftime(as.POSIXlt(df1$Date),
>> as.POSIXlt(df2$Date),
>> units="hours") < 12)), yes = delta.lon=df1$x-df2$x, no = "NA")
>>
>> I'm using R 2.7.1.
>>
>> Cheers,
>>
>> Tim
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?
More information about the R-help
mailing list