[R] R - Aggregate 3-Hourly Block Data into Weekly (Melt)

Fri Dec 19 11:01:52 CET 2014

Thank you very much for your reply. I really appreciate it. I apologize for
the HTML version, I have made modifications and replied to your
questions/comments below. Thanks again

tmp1 <- structure(list(FIPS = c(1001L, 1003L, 1005L), X2026.01.01.1 = c(
285.5533142,

  285.5533142, 286.2481079), X2026.01.01.2 = c(283.4977112, 283.4977112,

  285.0860291), X2026.01.01.3 = c(281.9733887, 281.9733887, 284.1548767

  ), X2026.01.01.4 = c(280.0234985, 280.0234985, 282.6075745),

      X2026.01.01.5 = c(278.7125854, 278.7125854, 281.2553711),

      X2026.01.01.6 = c(278.5204773, 278.5204773, 280.6148071),
X2026.01.01.7 = c(282.3938, 282.3938, 283.1096), X2026.01.01.8 = c(285.9133,
285.9133, 286.1951) .Names = c("FIPS", "X2026.01.01.1", "X2026.01.01.2",
"X2026.01.01.3", "X2026.01.01.4",

  "X2026.01.01.5", "X2026.01.01.6", "X2026.01.01.7", "X2026.01.01.8" ),
class = "data.frame", row.names = c(NA, -3L))

*Looks like this*

FIPS

X2026.01.01.1

X2026.01.01.2

X2026.01.01.3

X2026.01.01.4

X2026.01.01.5

X2026.01.01.6

X2026.01.01.7

X2026.01.01.8

1001

285.5533

283.4977

281.9734

280.0235

278.7126

278.5205

282.3938

285.9133

1003

285.5533

283.4977

281.9734

280.0235

278.7126

278.5205

282.3938

285.9133

1005

286.2481

285.086

284.1549

282.6076

281.2554

280.6148

283.1096

286.1951

For X2026.01.01.1 represents Year=2026, Month=01, Day=01, Hour block=1.

I have extracted the data by FIPS code and reshaped the yearly data files
using melt();

for (i in filelist) {

  tmp1 <- as.data.table(read.csv(i,header=T, sep=","))

  tmp2 <- melt(tmp1, id="FIPS")

  tmp2$year <- as.numeric(substr(tmp2$variable,2,5))

  tmp2$month <- as.numeric(substr(tmp2$variable,7,8))

  tmp2$day <- as.numeric(substr(tmp2$variable,10,11))

}

I have added date string and weekdays using the following codes:

*Date Variable*

        tmp2$date <- with(tmp2, ymd(sprintf('%04d%02d%02d', year, month,
day)))

*Day Variable*

        tmp2$day <- weekdays(as.Date(tmp2$date))

*Question 1: *Apologies for clipping the data and not showing all the hour
blocks. I have included a full 8-hour block now. For each year, I have
3-hour blocks for every day.

*Question 2: *I have two time periods for each GCM; 2026-2045 and
2081-2100. There are occasions when days would have 7 hour blocks instead
of 8, it could be a data reporting issue from the models.

*Reply to Comment 1: *The data has been downscaled to the US from gridded
data; Resolution: T42 in atm. 1/3~1ºlat. x 1ºlon. tripolar grids in ocn. So
Daylight Savings Time could well be an issue. I will look into it.

*Reply to Comment 2: * Thank you for the suggestion. I realize that using
rep() is risky, however, I have assign week numbers to each FIPS code (ID)
for each year. I was thinking of something similar to this:

weeks<-rep(seq(1,52,1),each=(unique(tmp2$FIPS)**8*)

Any alternative code will be highly appreciated. My first goal is to
subset/clip the data to begin on the first Monday and end on the last
Sunday of each year.

On Fri, Dec 19, 2014 at 9:37 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
wrote:
>
> Thank you for attempting to convey your problem clearly using example
> code... but your use of HTML email has very nearly undone all your
> efforts.  Also, use of "dput" to make an R-readable block of data is more
> reliable than read.table to get the data into our R sessions quickly.
>
> First question: you say these are three-hour results, but there are only
> six per day in your example.
>
> Second question: you say the "number of blocks vary between 7 and 8", but
> your example data does not illustrate that problem. (If there were 8 blocks
> per day the three-hour statement would make more sense.)
>
> Comment: You have not mentioned timezone information... this information
> looks ripe for GMT, but if that is a bad assumption then daylight savings
> might account for some variations in blocks per day.
>
> Comment: I think your plan of using rep to identify the week numbers is
> risky. I would recommend using a Date or POSIXt type to make the timestamps
> computable, and then find the date corresponding to the beginning of the
> week that the timestamp falls into. Then aggregate grouping on those time
> values. Unfortunately, the specific way you go about identifying the
> beginning of week may depend on the timezone information.
>
>
> On Thu, 18 Dec 2014, Shouro Dasgupta wrote:
>
>  I am trying to compute max, min, and mean from Global Circulation Models
>> (GCM) for the US. The data is in 3-hour blocks for 2026-2045 and
>> 2081-2100.
>> Sample Data:
>>
>> tmp1 <- structure(list(FIPS = c(1001L, 1003L, 1005L), X2026.01.01.1 =
>> c(285.5533142,
>>  285.5533142, 286.2481079), X2026.01.01.2 = c(283.4977112, 283.4977112,
>>  285.0860291), X2026.01.01.3 = c(281.9733887, 281.9733887, 284.1548767
>>  ), X2026.01.01.4 = c(280.0234985, 280.0234985, 282.6075745),
>>      X2026.01.01.5 = c(278.7125854, 278.7125854, 281.2553711),
>>      X2026.01.01.6 = c(278.5204773, 278.5204773, 280.6148071)),
>> .Names = c("FIPS",
>>  "X2026.01.01.1", "X2026.01.01.2", "X2026.01.01.3", "X2026.01.01.4",
>>  "X2026.01.01.5", "X2026.01.01.6"), class = "data.frame", row.names =
>> c(NA,
>>  -3L))
>>
>> I have extracted the data by FIPS code and reshaped the yearly data files
>> using melt();
>>
>> for (i in filelist) {
>>  tmp1 <- as.data.table(read.csv(i,header=T, sep=","))
>>  tmp2 <- melt(tmp1, id="FIPS")
>>  tmp2$year <- as.numeric(substr(tmp2$variable,2,5))
>>  tmp2$month <- as.numeric(substr(tmp2$variable,7,8))
>>  tmp2$day <- as.numeric(substr(tmp2$variable,10,11))}
>>
>> I have added datestring and weekdays using the following code:
>> Inserting Date Variable
>>
>> tmp2$date <- with(tmp2, ymd(sprintf('%04d%02d%02d', year, month, day)))
>>
>> Inserting Day Variable
>>
>> tmp2$day <- weekdays(as.Date(tmp2$date))
>>
>> sample.tmp2 <- "FIPS         xdate     temp year month      day
>> date      dates weekdays
>> + 5599311  1003 X2045.08.14.2 304.5995 2045     8   Monday 2045-08-14
>> 2036-01-29        2
>> + 468406 39093 X2045.01.19.7 267.8483 2045     1 Thursday 2045-01-19
>> 2028-06-04        0
>> + 5022078 21167 X2045.07.21.8 314.6772 2045     7   Friday 2045-07-21
>> 2035-09-13        4
>> + 186822   9005 X2045.01.08.5 269.0803 2045     1   Sunday 2045-01-08
>> 2037-06-28        0
>> + 3998678 13295 X2045.06.10.7 307.2408 2045     6 Saturday 2045-06-10
>> 2033-10-13        4"
>>
>> Data <- read.table(text=sample.tmp2, header = TRUE)
>>
>> My goal is to aggregate these 3-hourly blocks into weekly data, however,
>> GCM data is not consistent and the blocks vary between 7 and 8. I want to
>> clip the data to start on the first Monday of 2026 and end on the last
>> Sunday of 2045 and then use rep() to assign week numbers for the whole
>> epoch.
>>
>> I know I can count the number of each day using something like this;
>>
>> length(which(weekdays == '0'))
>>
>> Where 0, 1, 2..., 6 represent Sunday, Monday,...
>>
>> My question is am I doing anything wrong in trying to aggregate the data
>> to
>> begin with? But importantly, I would be grateful for any help to clip the
>> dataset to begin on the first Monday and end on the last Sunday. Thank you
>> very much!
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/
>> posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
> ------------------------------------------------------------
> ---------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live
> Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ------------------------------------------------------------
> ---------------
>

	[[alternative HTML version deleted]]