[R] Best way to do temporal joins in R?

Jonathan Greenberg greenberg at ucdavis.edu
Tue Mar 17 00:41:38 CET 2009


Weird -- the email was sent through my gmail account, so it looks like the
.csvs got intercepted somewhere along the way.  At any rate, I placed
them on a website:

http://cstars.ucdavis.edu/~jongreen/temp/temporal_join_R/

--j

Gabor Grothendieck wrote:
> There was nothing attached.
>
> On Mon, Mar 16, 2009 at 3:11 PM, Jonathan Greenberg
> <greenberg at ucdavis.edu> wrote:
>   
>> Sorry for the immediate follow-up, but Phil Spector correctly reminded me
>> this is a lot easier for the community if I provide some sample data, so I'm
>> attaching 3 small CSVs to this email:
>>
>> species_data_Rexample.csv contains the "field data" (which species was ID'd
>> and what time it was ID'd),
>> temperature_data_Rexample.csv contains the date, time, station ID and the
>> temperature "value"
>>
>> I'd like a dataframe which contains for each unique line in
>> species_data_Rexample.csv, a series of lines, one per station, and the
>> temperature of the nearest time stamp, or an interpolated value (weighted
>> average would be fine, but so would just grabbing the nearest value), so for
>> this example I'd like something that looks like the csv
>> "fused_data_Rexample.csv"
>>
>> Thanks!
>>
>> --j
>>
>> Jonathan Greenberg wrote:
>>     
>>> I've been playing with zoo a bit, and it seems ok except it doesn't
>>> support non-unique time stamps when performing joins.  I have two databases
>>> which contain a dataframe of a Date object (with the time, not just
>>> MM/DD/YY), e.g.:
>>>
>>> DB 1:
>>> UniqueID,Date1,Data 1,Data 2
>>>
>>> DB 2:
>>> Date2, Station, Data 3
>>>
>>> We'll say Station can contain three values: A,B and C
>>>
>>> DB 1 may have some repeat times, and DB 2 definitely has them -- although
>>> each Date, Station combo is unique (this DB contains weather data collected
>>> on the half-hour or fifteen minute interval at a set of stations).  I'd like
>>> DB2's station and Data3 to be joined with DB1 based on the nearest time
>>> stamp (interpolating Data3 or not).
>>>
>>> Ideally, I'd like a fused database such that I get for each uniqueID in
>>> DB1:
>>>
>>> UniqueID,Date,Data1,Data2,Station,Data3
>>>
>>> Thoughts?  Hints?
>>>
>>> --j
>>>
>>>
>>>
>>>       
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>     
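
For anyone skimming the thread, the join described above (one output row
per DB1 record per station, carrying the Data3 value nearest in time, or
an interpolated one) could be sketched in base R roughly as follows. This
is only an illustration under stated assumptions: the data frames are
made-up stand-ins, not the contents of the linked CSVs; the column names
follow the DB 1 / DB 2 layout in the original question; the Species
column and the helper name match_station are hypothetical; and the times
are assumed to be POSIXct.

```r
# Made-up stand-ins for the two tables (NOT the real CSV contents).
species <- data.frame(
  UniqueID = 1:3,
  Date1 = as.POSIXct(c("2009-03-01 10:07:00", "2009-03-01 10:07:00",
                       "2009-03-01 10:40:00"), tz = "UTC"),
  Species = c("a", "b", "c"),          # hypothetical field-data column
  stringsAsFactors = FALSE
)
temperature <- data.frame(
  Date2 = rep(as.POSIXct(c("2009-03-01 10:00:00", "2009-03-01 10:30:00",
                           "2009-03-01 11:00:00"), tz = "UTC"), times = 2),
  Station = rep(c("A", "B"), each = 3),
  Data3 = c(10, 12, 14, 20, 21, 25),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for ONE station's readings, return the reading
# nearest in time to each query time, plus a linearly interpolated value.
match_station <- function(st, times) {
  st <- st[order(st$Date2), ]
  t_obs <- as.numeric(st$Date2)        # seconds since the epoch
  t_qry <- as.numeric(times)
  # Index of the closest observation for each query time.
  nearest <- vapply(t_qry, function(t) which.min(abs(t_obs - t)),
                    integer(1))
  data.frame(
    Station = st$Station[nearest],
    NearestTime = st$Date2[nearest],
    Data3_nearest = st$Data3[nearest],
    # rule = 2 holds the end values for times outside the observed range.
    Data3_interp = approx(t_obs, st$Data3, xout = t_qry, rule = 2)$y,
    stringsAsFactors = FALSE
  )
}

# Repeat the species table once per station and bind the pieces together.
pieces <- lapply(split(temperature, temperature$Station), function(st) {
  cbind(species, match_station(st, species$Date1))
})
fused <- do.call(rbind, pieces)
fused <- fused[order(fused$UniqueID, fused$Station), ]
rownames(fused) <- NULL
print(fused)
```

Because the matching is done per station, repeated time stamps in either
table are not a problem as long as each Date2/Station pair is unique, as
stated in the question. For large tables the which.min loop could
presumably be replaced by something based on findInterval().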

-- 

Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Cell: 415-794-5043
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307



