[R] R - Aggregate 3-Hourly Block Data into Weekly (Melt)
Shouro Dasgupta
shouro at gmail.com
Fri Dec 19 11:01:52 CET 2014
Thank you very much for your reply. I really appreciate it. I apologize for
the HTML version, I have made modifications and replied to your
questions/comments below. Thanks again
tmp1 <- structure(list(FIPS = c(1001L, 1003L, 1005L), X2026.01.01.1 = c(
285.5533142,
285.5533142, 286.2481079), X2026.01.01.2 = c(283.4977112, 283.4977112,
285.0860291), X2026.01.01.3 = c(281.9733887, 281.9733887, 284.1548767
), X2026.01.01.4 = c(280.0234985, 280.0234985, 282.6075745),
X2026.01.01.5 = c(278.7125854, 278.7125854, 281.2553711),
X2026.01.01.6 = c(278.5204773, 278.5204773, 280.6148071),
X2026.01.01.7 = c(282.3938, 282.3938, 283.1096), X2026.01.01.8 = c(285.9133,
285.9133, 286.1951) .Names = c("FIPS", "X2026.01.01.1", "X2026.01.01.2",
"X2026.01.01.3", "X2026.01.01.4",
"X2026.01.01.5", "X2026.01.01.6", "X2026.01.01.7", "X2026.01.01.8" ),
class = "data.frame", row.names = c(NA, -3L))
*Looks like this*
FIPS
X2026.01.01.1
X2026.01.01.2
X2026.01.01.3
X2026.01.01.4
X2026.01.01.5
X2026.01.01.6
X2026.01.01.7
X2026.01.01.8
1001
285.5533
283.4977
281.9734
280.0235
278.7126
278.5205
282.3938
285.9133
1003
285.5533
283.4977
281.9734
280.0235
278.7126
278.5205
282.3938
285.9133
1005
286.2481
285.086
284.1549
282.6076
281.2554
280.6148
283.1096
286.1951
For X2026.01.01.1 represents Year=2026, Month=01, Day=01, Hour block=1.
I have extracted the data by FIPS code and reshaped the yearly data files
using melt();
for (i in filelist) {
tmp1 <- as.data.table(read.csv(i,header=T, sep=","))
tmp2 <- melt(tmp1, id="FIPS")
tmp2$year <- as.numeric(substr(tmp2$variable,2,5))
tmp2$month <- as.numeric(substr(tmp2$variable,7,8))
tmp2$day <- as.numeric(substr(tmp2$variable,10,11))
}
I have added date string and weekdays using the following codes:
*Date Variable*
tmp2$date <- with(tmp2, ymd(sprintf('%04d%02d%02d', year, month,
day)))
*Day Variable*
tmp2$day <- weekdays(as.Date(tmp2$date))
*Question 1: *Apologies for clipping the data and not showing all the hour
blocks. I have included a full 8-hour block now. For each year, I have
3-hour blocks for every day.
*Question 2: *I have two time periods for each GCM; 2026-2045 and
2081-2100. There are occasions when days would have 7 hour blocks instead
of 8, it could be a data reporting issue from the models.
*Reply to Comment 1: *The data has been downscaled to the US from gridded
data; Resolution: T42 in atm. 1/3~1ºlat. x 1ºlon. tripolar grids in ocn. So
Daylight Savings Time could well be an issue. I will look into it.
*Reply to Comment 2: * Thank you for the suggestion. I realize that using
rep() is risky, however, I have assign week numbers to each FIPS code (ID)
for each year. I was thinking of something similar to this:
weeks<-rep(seq(1,52,1),each=(unique(tmp2$FIPS)**8*)
Any alternative code will be highly appreciated. My first goal is to
subset/clip the data to begin on the first Monday and end on the last
Sunday of each year.
On Fri, Dec 19, 2014 at 9:37 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
wrote:
>
> Thank you for attempting to convey your problem clearly using example
> code... but your use of HTML email has very nearly undone all your
> efforts. Also, use of "dput" to make an R-readable block of data is more
> reliable than read.table to get the data into our R sessions quickly.
>
> First question: you say these are three-hour results, but there are only
> six per day in your example.
>
> Second question: you say the "number of blocks vary between 7 and 8", but
> your example data does not illustrate that problem. (If there were 8 blocks
> per day the three-hour statement would make more sense.)
>
> Comment: You have not mentioned timezone information... this information
> looks ripe for GMT, but if that is a bad assumption then daylight savings
> might account for some variations in blocks per day.
>
> Comment: I think your plan of using rep to identify the week numbers is
> risky. I would recommend using a Date or POSIXt type to make the timestamps
> computable, and then find the date corresponding to the beginning of the
> week that the timestamp falls into. Then aggregate grouping on those time
> values. Unfortunately, the specific way you go about identifying the
> beginning of week may depend on the timezone information.
>
>
> On Thu, 18 Dec 2014, Shouro Dasgupta wrote:
>
> I am trying to compute max, min, and mean from Global Circulation Models
>> (GCM) for the US. The data is in 3-hour blocks for 2026-2045 and
>> 2081-2100.
>> Sample Data:
>>
>> tmp1 <- structure(list(FIPS = c(1001L, 1003L, 1005L), X2026.01.01.1 =
>> c(285.5533142,
>> 285.5533142, 286.2481079), X2026.01.01.2 = c(283.4977112, 283.4977112,
>> 285.0860291), X2026.01.01.3 = c(281.9733887, 281.9733887, 284.1548767
>> ), X2026.01.01.4 = c(280.0234985, 280.0234985, 282.6075745),
>> X2026.01.01.5 = c(278.7125854, 278.7125854, 281.2553711),
>> X2026.01.01.6 = c(278.5204773, 278.5204773, 280.6148071)),
>> .Names = c("FIPS",
>> "X2026.01.01.1", "X2026.01.01.2", "X2026.01.01.3", "X2026.01.01.4",
>> "X2026.01.01.5", "X2026.01.01.6"), class = "data.frame", row.names =
>> c(NA,
>> -3L))
>>
>> I have extracted the data by FIPS code and reshaped the yearly data files
>> using melt();
>>
>> for (i in filelist) {
>> tmp1 <- as.data.table(read.csv(i,header=T, sep=","))
>> tmp2 <- melt(tmp1, id="FIPS")
>> tmp2$year <- as.numeric(substr(tmp2$variable,2,5))
>> tmp2$month <- as.numeric(substr(tmp2$variable,7,8))
>> tmp2$day <- as.numeric(substr(tmp2$variable,10,11))}
>>
>> I have added datestring and weekdays using the following code:
>> Inserting Date Variable
>>
>> tmp2$date <- with(tmp2, ymd(sprintf('%04d%02d%02d', year, month, day)))
>>
>> Inserting Day Variable
>>
>> tmp2$day <- weekdays(as.Date(tmp2$date))
>>
>> sample.tmp2 <- "FIPS xdate temp year month day
>> date dates weekdays
>> + 5599311 1003 X2045.08.14.2 304.5995 2045 8 Monday 2045-08-14
>> 2036-01-29 2
>> + 468406 39093 X2045.01.19.7 267.8483 2045 1 Thursday 2045-01-19
>> 2028-06-04 0
>> + 5022078 21167 X2045.07.21.8 314.6772 2045 7 Friday 2045-07-21
>> 2035-09-13 4
>> + 186822 9005 X2045.01.08.5 269.0803 2045 1 Sunday 2045-01-08
>> 2037-06-28 0
>> + 3998678 13295 X2045.06.10.7 307.2408 2045 6 Saturday 2045-06-10
>> 2033-10-13 4"
>>
>> Data <- read.table(text=sample.tmp2, header = TRUE)
>>
>> My goal is to aggregate these 3-hourly blocks into weekly data, however,
>> GCM data is not consistent and the blocks vary between 7 and 8. I want to
>> clip the data to start on the first Monday of 2026 and end on the last
>> Sunday of 2045 and then use rep() to assign week numbers for the whole
>> epoch.
>>
>> I know I can count the number of each day using something like this;
>>
>> length(which(weekdays == '0'))
>>
>> Where 0, 1, 2..., 6 represent Sunday, Monday,...
>>
>> My question is am I doing anything wrong in trying to aggregate the data
>> to
>> begin with? But importantly, I would be grateful for any help to clip the
>> dataset to begin on the first Monday and end on the last Sunday. Thank you
>> very much!
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/
>> posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
> ------------------------------------------------------------
> ---------------
> Jeff Newmiller The ..... ..... Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live
> Go...
> Live: OO#.. Dead: OO#.. Playing
> Research Engineer (Solar/Batteries O.O#. #.O#. with
> /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
> ------------------------------------------------------------
> ---------------
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list