[R] Simple time series question with zoo

Fri Oct 28 01:38:34 CEST 2011

On Thu, Oct 27, 2011 at 4:18 PM, Vinny Moriarty <vwmoriarty at gmail.com> wrote:
> New user here. My goal is to pull daily averages from a long dataset.
>
> I've been working with some code I got from this list:
>
> https://stat.ethz.ch/pipermail/r-help/2009-March/191302.html
>
>
> Here is the code as I have been using it:
>
> library(zoo)
> library(chron)
>
> DB <- read.table("/Users/me/Desktop/R/data.csv", sep = ",", header = TRUE,
>                  as.is = TRUE)
> z <- zoo(DB$temp, chron(DB$Date, DB$Time))
> z.day <- aggregate(z, trunc, mean) # this last line gives me daily averages
>
>
> Simple and elegant, and it works. Thanks to the author, the hard part is
> over. But I plan to tweak it, so I have some questions about why it works:
>
> 1- The data I have stores the date and time as a single string like this:
> "2006-04-09 10:20:00". But the code was set up to read the data in two
> columns, i.e. "2006-04-09" and "10:20:00". Is this how the chron package
> expects the data, or is there a way I can change the code to read the
> date and time as a single column? For now I am splitting the date and time
> manually before I run R.
>
> 2- I've read the help on "as.is", and I'm not sure why I need that argument
> in the first line of code. This is what my original data looks like (with
> header), if it helps answer this question:
>
> line.site,time_local,time_utc,reef_type_code,sensor_type,sensor_depth_m,temp
> 06,2006-04-09 10:20:00,2006-04-09 20:20:00,BAK,sb39, 2, 29.63
> 06,2006-04-09 10:40:00,2006-04-09 20:40:00,BAK,sb39, 2, 29.56
>
> 3- Finally, how does the function "trunc" know to aggregate the data by day?
> If I wanted monthly averages I would need to specify "as.yearmon", but I
> don't seem to need to specify "day" anywhere in the code.

That link is several years old.  Since then the zoo package has gained
additional capabilities. Assuming the 2nd field is the desired
date/time and the last field on each line is the value you want, try these
read.zoo statements.  See ?read.zoo and also try:
vignette("zoo-read")

library(zoo)
library(chron)

# create test file
Lines <- "line.site,time_local,time_utc,reef_type_code,sensor_type,sensor_depth_m,temp
06,2006-04-09 10:20:00,2006-04-09 20:20:00,BAK,sb39, 2, 29.63
06,2006-04-09 10:40:00,2006-04-09 20:40:00,BAK,sb39, 2, 29.56"
cat(Lines, "\n", file = "data.txt")

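# colClasses: "NULL" drops a column and NA lets read.table choose the class,
# so only time_local (2nd field, used as the index) and temp (7th) are kept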

temp <- read.zoo("data.txt", FUN = as.chron, header = TRUE, sep = ",",
	colClasses = c("NULL", NA, "NULL", "NULL", "NULL", "NULL", NA))

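# daily: as.Date keeps only the date part of each time stamp and
# aggregate = mean averages all values that share a date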
temp.day <- read.zoo("data.txt", FUN = as.Date, header = TRUE, sep = ",",
	aggregate = mean,
	colClasses = c("NULL", NA, "NULL", "NULL", "NULL", "NULL", NA))

# monthly: as.yearmon maps each time stamp to its year and month
temp.ym <- read.zoo("data.txt", FUN = as.yearmon, header = TRUE, sep = ",",
	aggregate = mean,
	colClasses = c("NULL", NA, "NULL", "NULL", "NULL", "NULL", NA))
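On question 1: chron itself does not require the date and time to arrive
in two columns from the file; a single "date time" string can be split in
R instead of by hand.  A minimal sketch, assuming every string has the
fixed "YYYY-MM-DD hh:mm:ss" layout of the sample data (the names s, dd, tt
and ch are only illustrative):

```r
library(chron)

# single-column date/time strings, as in the time_local field
s <- c("2006-04-09 10:20:00", "2006-04-09 10:40:00")

# split by character position: chars 1-10 are the date, 12-19 the time
dd <- substr(s, 1, 10)
tt <- substr(s, 12, 19)

# state the input layout explicitly, since chron's default date format is m/d/y
ch <- chron(dd, tt, format = c(dates = "y-m-d", times = "h:m:s"))
print(ch)
```

With read.zoo and FUN = as.chron as above, this splitting step is not
needed at all.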

chron represents a date/time internally as the number of days since the
origin (1970-01-01 by default) plus the fraction of a day for the time.
Thus truncating to an integer removes the fractional part (i.e. the time),
leaving the day.  See R News 4/1.  We could alternatively just use the
Date class in base R, as shown above.
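The truncation argument can be checked directly.  A small sketch (the
object name x is only illustrative) comparing the numeric value of a chron
date/time, its truncated value, and the base R Date for the same day:

```r
library(chron)

# 10:20 on 9 April 2006 (chron's default input format is m/d/y)
x <- chron("04/09/2006", "10:20:00")

# days since the origin plus the fraction of the day (10:20 is 620/1440)
print(as.numeric(x))

# trunc drops the fractional part, i.e. the time of day
print(as.numeric(trunc(x)))

# base R's Date class counts whole days from the same 1970-01-01 origin
print(as.numeric(as.Date("2006-04-09")))
```

Because both count whole days from 1970-01-01, trunc on a chron index and
as.Date group the observations into the same days.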

If we had already read in temp and wanted to aggregate it afterwards,
rather than reading it straight into aggregated form, here are some
possibilities:

aggregate(temp, trunc, mean) # daily
aggregate(temp, as.Date, mean) # daily with Date class
aggregate(temp, as.yearmon, mean)  # monthly
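Here is a self-contained sketch of the same three aggregations built in
memory instead of from data.txt.  The third reading (28.00 at 09:00 on
10 April) is made up purely so that there are two different days, and the
object names are illustrative.  For the monthly case the chron index is
converted to Date first, so that as.yearmon is applied to a Date:

```r
library(zoo)
library(chron)

# two readings from the sample data plus one hypothetical reading on 10 April
tm <- chron(c("2006-04-09", "2006-04-09", "2006-04-10"),
            c("10:20:00", "10:40:00", "09:00:00"),
            format = c(dates = "y-m-d", times = "h:m:s"))
z <- zoo(c(29.63, 29.56, 28.00), tm)

# daily means: trunc maps each chron time stamp to its day
z.day <- aggregate(z, trunc, mean)

# daily means with a base R Date index
z.date <- aggregate(z, as.Date, mean)

# monthly means: chron -> Date -> yearmon
z.ym <- aggregate(z, function(t) as.yearmon(as.Date(t)), mean)

print(z.day)
print(z.date)
print(z.ym)
```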

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com