[R] Importing Time Series Data for an R Beginner

Cedrick W. Johnson cedrick at cedrickjohnson.com
Thu Mar 11 21:43:49 CET 2010


Actually, I just learned something myself: you can do this on the dataset 
*without* the additional step in Excel. I changed the format string in 
strptime to match the date format in the file (d'oh!) and voilà:

 > x
   Subject      Date     Time Value
1       1 7/23/2003 13:05:00    84
2       1 7/23/2003 13:10:00    87
3       1 7/23/2003 13:15:00    95
4       2 9/25/2004 14:34:00    95
5       2 9/25/2004 14:39:00    81
6       2 9/25/2004 14:44:00    93
7       3  3/2/2004 16:34:00    72
8       3  3/2/2004 16:39:00    67
9       3  3/2/2004 16:44:00    83
 > dates = as.POSIXct(strptime(paste(x[,2], x[,3], sep=" "), format="%m/%d/%Y %H:%M:%S"))
 > dates
[1] "2003-07-23 13:05:00 EDT" "2003-07-23 13:10:00 EDT" "2003-07-23 13:15:00 EDT"
[4] "2004-09-25 14:34:00 EDT" "2004-09-25 14:39:00 EDT" "2004-09-25 14:44:00 EDT"
[7] "2004-03-02 16:34:00 EST" "2004-03-02 16:39:00 EST" "2004-03-02 16:44:00 EST"
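
One caveat, offered as a rough sketch rather than anything definitive: the sample in the original question uses two-digit years (7/23/03), which would need %y rather than %Y. A format mismatch gives NA instead of an error, so it is worth checking for that explicitly. The data frame `raw` below is made up for illustration, and tz = "UTC" is my own assumption, used only to avoid the EDT/EST mix that a local time zone produces.

```r
# Hypothetical sample mirroring the two-digit-year layout in the original question
raw <- data.frame(
  Subject = c(1, 1, 2),
  Date    = c("7/23/03", "7/23/03", "3/02/04"),
  Time    = c("13:05:00", "13:10:00", "16:34:00"),
  Value   = c(84, 87, 72),
  stringsAsFactors = FALSE
)

# %y parses two-digit years; %Y expects four digits
stamp <- paste(raw$Date, raw$Time)
dates <- as.POSIXct(stamp, format = "%m/%d/%y %H:%M:%S", tz = "UTC")  # tz is an assumption

# A format mismatch yields NA rather than an error, so check explicitly
stopifnot(!anyNA(dates))
print(dates)
```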

 > data = xts(x[,c(1,4)], order.by=dates)
 > data
                     Subject Value
2003-07-23 13:05:00       1    84
2003-07-23 13:10:00       1    87
2003-07-23 13:15:00       1    95
2004-03-02 16:34:00       3    72
2004-03-02 16:39:00       3    67
2004-03-02 16:44:00       3    83
2004-09-25 14:34:00       2    95
2004-09-25 14:39:00       2    81
2004-09-25 14:44:00       2    93
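
For the per-trace summaries asked about in the original question, here is one possible sketch in base R (no xts needed). It assumes a new trace starts whenever the subject changes or the gap to the previous reading is more than 5 minutes. The data frame and the column names `When` and `Trace` are made up for illustration, and tz = "UTC" is again my assumption.

```r
# Hypothetical data: subject 1 has two traces, subject 2 has one
x <- data.frame(
  Subject = c(1, 1, 1, 1, 1, 2, 2),
  Value   = c(84, 87, 95, 95, 81, 72, 67)
)
x$When <- as.POSIXct(
  c("2003-07-23 13:05:00", "2003-07-23 13:10:00", "2003-07-23 13:15:00",
    "2004-09-25 14:34:00", "2004-09-25 14:39:00",
    "2004-03-02 16:34:00", "2004-03-02 16:39:00"),
  tz = "UTC"
)

# Sort so that consecutive readings within a subject are adjacent
x <- x[order(x$Subject, x$When), ]

# Gap in minutes to the previous row; the first row always starts a trace
gap <- c(Inf, diff(as.numeric(x$When)) / 60)
new_subject <- c(TRUE, diff(x$Subject) != 0)

# A new trace starts on a subject change or a gap longer than 5 minutes
x$Trace <- cumsum(new_subject | gap > 5)

# Mean value per trace, keeping the subject alongside
trace_means <- aggregate(Value ~ Subject + Trace, data = x, FUN = mean)
print(trace_means)
```

From there, split(x, x$Trace) gives a list with one data frame per trace, which can be looped over to plot each trace on its own.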



hth,
c

PS: my first message didn't make it to the list... apparently I had a 
bad header?
=============================
Cedrick W. Johnson
aolim) cedrickjcvgr
www.cedrickjohnson.com
New York - Chicago


On 3/11/2010 3:34 PM, Cedrick W. Johnson (CJ) wrote:
> Hi Clay-
>
> You may want to look at the xts package, in addition to 'strptime'
> and 'as.POSIXct'.
>
> When I get datasets in Excel, what I normally do is change the date
> (column) format to YYYY-mm-dd. But that's due to my own shortcomings
> with date formatting in R.
>
> Here's a quick example:
>
>  > x = read.csv('TestData.csv')
>  > x
>   Subject       Date     Time Value
> 1       1 2003-07-23 13:05:00    84
> 2       1 2003-07-23 13:10:00    87
> 3       1 2003-07-23 13:15:00    95
> 4       2 2004-09-25 14:34:00    95
> 5       2 2004-09-25 14:39:00    81
> 6       2 2004-09-25 14:44:00    93
> 7       3 2004-03-02 16:34:00    72
> 8       3 2004-03-02 16:39:00    67
> 9       3 2004-03-02 16:44:00    83
>
>  > dates = as.POSIXct(strptime(paste(x[,2], x[,3], sep=" "), format="%Y-%m-%d %H:%M:%S"))
>
>
>  > dates
> [1] "2003-07-23 13:05:00 EDT" "2003-07-23 13:10:00 EDT" "2003-07-23 13:15:00 EDT"
> [4] "2004-09-25 14:34:00 EDT" "2004-09-25 14:39:00 EDT" "2004-09-25 14:44:00 EDT"
> [7] "2004-03-02 16:34:00 EST" "2004-03-02 16:39:00 EST" "2004-03-02 16:44:00 EST"
>
>  > data = xts(x[,c(1,4)], order.by=dates)
>  > data
>                     Subject Value
> 2003-07-23 13:05:00       1    84
> 2003-07-23 13:10:00       1    87
> 2003-07-23 13:15:00       1    95
> 2004-03-02 16:34:00       3    72
> 2004-03-02 16:39:00       3    67
> 2004-03-02 16:44:00       3    83
> 2004-09-25 14:34:00       2    95
> 2004-09-25 14:39:00       2    81
> 2004-09-25 14:44:00       2    93
>
>
> HTH
>
> -cedrick
>
>
>
> On 3/11/2010 3:13 PM, Clay Heaton wrote:
>> Hi, I'm trying to learn R for a project I'm working on. I know several
>> programming languages, so I'm comfortable with the syntax. What I
>> can't figure out is how to import the file of time series data that I
>> have and parse it into individual series. The data was given to me in
>> Excel, but I can output it to tab-delimited or csv. I've been able to
>> pull in the entire table with read.table(), but I can't figure out how
>> to parse it into distinct groups.
>>
>> It looks like this:
>>
>> Subject Date Time Value
>> 1 7/23/03 13:05:00 84
>> 1 7/23/03 13:10:00 87
>> 1 7/23/03 13:15:00 95
>> ....
>> 1 9/25/04 14:34:00 95
>> 1 9/25/04 14:39:00 81
>> 1 9/25/04 14:44:00 93
>> ...
>> 2 3/02/04 16:34:00 72
>> 2 3/02/04 16:39:00 67
>> 2 3/02/04 16:44:00 83
>> ...
>> 2 3/21/05 11:15:00 121
>> 2 3/21/05 11:20:00 125
>> 2 3/21/05 11:25:00 120
>> ...
>>
>> There are ~100,000 rows of data. There are 86 subjects and each of
>> them has one or more traces. For each trace, the times are in uniform
>> increments of 5 minutes. Some traces include up to 500 values and
>> others only 40.
>>
>> For now, what I'm looking to do is to be able to generate summary
>> statistics for each trace, and then for each subject. Hence, I need a
>> way to aggregate by value or subject, where the criteria for
>> aggregating traces are that the values were collected on the same day
>> and all are within 5 minutes of each other. I would like to be able to
>> iterate through the data to plot each trace independently.
>>
>> Any suggestions to help me get started would be appreciated. I'm
>> looking to learn, so I'd appreciate pointers to good tutorials or code
>> examples of dealing with time series data.
>>
>> Thanks!
>> Clay


