[R] Melt and Rbind/Rbindlist

Shouro Dasgupta shouro at gmail.com
Sun Feb 1 10:14:10 CET 2015


Hello Mr. Holtman,

Thank you very much for your reply and suggestion. This is what each year's
data looks like:

tmp1 <- structure(list(FIPS = c(1001L, 1003L, 1005L),
    X2026.01.01.1 = c(285.5533142, 285.5533142, 286.2481079),
    X2026.01.01.2 = c(283.4977112, 283.4977112, 285.0860291),
    X2026.01.01.3 = c(281.9733887, 281.9733887, 284.1548767),
    X2026.01.01.4 = c(280.0234985, 280.0234985, 282.6075745),
    X2026.01.01.5 = c(278.7125854, 278.7125854, 281.2553711),
    X2026.01.01.6 = c(278.5204773, 278.5204773, 280.6148071)),
  .Names = c("FIPS", "X2026.01.01.1", "X2026.01.01.2", "X2026.01.01.3",
    "X2026.01.01.4", "X2026.01.01.5", "X2026.01.01.6"),
  class = "data.frame", row.names = c(NA, -3L))
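
For context, here is a minimal sketch of how several yearly files in this wide layout could be read, melted, and stacked once at the end with rbindlist(). The two files, their values, and the names tmpdir, block, and temp are made up for illustration; the real files and column names may differ.

```r
library(data.table)

# Two hypothetical yearly files written to a temporary directory
tmpdir <- tempfile("climate")
dir.create(tmpdir)
fwrite(data.table(FIPS = c(1001L, 1003L),
                  X2026.01.01.1 = c(285.5, 286.2),
                  X2026.01.01.2 = c(283.5, 285.1)),
       file.path(tmpdir, "y2026.csv"))
fwrite(data.table(FIPS = c(1001L, 1003L),
                  X2027.01.01.1 = c(284.0, 285.0),
                  X2027.01.01.2 = c(282.0, 283.0)),
       file.path(tmpdir, "y2027.csv"))
filelist <- list.files(tmpdir, pattern = "\\.csv$", full.names = TRUE)

# Read and melt each file into long format (one row per FIPS and 3-hour block)
myDTs <- lapply(filelist, function(f) {
  melt(fread(f), id.vars = "FIPS",
       variable.name = "block", value.name = "temp")
})

# Stack the long tables in a single call instead of rbind-ing inside a loop
bigDT <- rbindlist(myDTs)
print(bigDT)
```

Building the list first and binding once avoids growing a table inside the loop, which copies the accumulated data on every iteration.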


The data is in 3-hour blocks for every day by US FIPS code from 2026-2045;
each year's data is in a different csv file. My goal is to compute the max,
min, and mean by week and month. I used the following code to assign week
numbers to the observations:

nweek <- function(x, format="%Y-%m-%d", origin){
    if(missing(origin)){
        # "%W" is the week of the year, with Monday as the first day
        as.integer(format(strptime(x, format=format), "%W"))
    }else{
        x <- as.Date(x, format=format)
        o <- as.Date(origin, format=format)
        # "%w" is the weekday number (0 = Sunday), so x - w is the
        # Sunday on or before x
        w <- as.integer(format(strptime(x, format=format), "%w"))
        2 + as.integer(x - o - w) %/% 7
    }
}
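
As a quick check of what the "%W" branch returns, here is a small sketch on a few made-up dates (they are not taken from the real files). It calls format() directly on Date objects, which is an assumption about how the dates are stored.

```r
# Hypothetical sample dates; 2026-01-01 falls on a Thursday
dates <- as.Date(c("2026-01-01", "2026-01-04", "2026-01-05", "2026-02-01"))

# Week of the year with Monday as the first day of the week;
# days before the first Monday of the year belong to week 0
week <- as.integer(format(dates, "%W"))
month <- as.integer(format(dates, "%m"))

print(data.frame(date = dates, week = week, month = month))
```

Because format() is vectorized, the whole date column can be converted in one call without a loop.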

Then I ran the following:

for (i in filelist) {
  nweek(tmp2$date)
}
for (i in filelist) {
  nweek(dates, origin="2026-01-01")
}
for (i in filelist) {
  wkn <- nweek(tmp2$date)
}
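
For comparison, here is a minimal sketch of a loop-free alternative, assuming the melted data sits in one long data.table. The sample values and the names wide, long, block, and temp are hypothetical; the week number uses "%W" as in nweek() without an origin.

```r
library(data.table)

# Hypothetical two-county sample in the same wide layout as the yearly files
wide <- data.table(
  FIPS = c(1001L, 1003L),
  X2026.01.01.1 = c(285.5, 286.2),
  X2026.01.01.2 = c(283.5, 285.1),
  X2026.01.05.1 = c(281.9, 284.2),
  X2026.02.01.1 = c(280.0, 282.6)
)

# Wide to long: one row per FIPS and 3-hour block
long <- melt(wide, id.vars = "FIPS",
             variable.name = "block", value.name = "temp")

# Characters 2-11 of the column name hold the date, e.g. "2026.01.01"
long[, date := as.Date(substr(as.character(block), 2, 11), format = "%Y.%m.%d")]
long[, `:=`(year  = as.integer(format(date, "%Y")),
            month = as.integer(format(date, "%m")),
            week  = as.integer(format(date, "%W")))]

# Max, min, and mean for every group in one grouped pass
weekly <- long[, .(temp.max = max(temp), temp.min = min(temp),
                   temp.mean = mean(temp)),
               by = .(FIPS, year, week)]
monthly <- long[, .(temp.max = max(temp), temp.min = min(temp),
                    temp.mean = mean(temp)),
                by = .(FIPS, year, month)]
print(weekly)
print(monthly)
```

The date columns are added by reference with `:=`, so no copy of the long table is made, and the week numbers are computed once for the whole column rather than once per file.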


Is this efficient? Thank you so much again. I really appreciate it.

Sincerely,

Shouro

On Sun, Feb 1, 2015 at 1:22 AM, jim holtman <jholtman at gmail.com> wrote:

> It would have been nice if you had at least supplied a subset (~10 lines)
> from a couple of files so we could see what the data looks like and test
> out any solution. Since you are using 'data.table', you should probably
> also use 'fread' for reading in the data.  Here is a possible approach of
> reading the data into a list and then creating a single, large data.table:
>
> -------
> myDTs <- lapply(filelist, function(.file) {
>   tmp1 <- fread(.file, sep=",")
>   tmp2 <- melt(tmp1, id="FIPS", value.name="temp")  # name the value column 'temp'
>   tmp2$year <- as.numeric(substr(tmp2$variable,2,5))
>   tmp2$month <- as.numeric(substr(tmp2$variable,7,8))
>   tmp2$day <- as.numeric(substr(tmp2$variable,10,11))
>   tmp2  # return value
> })
>
> bigDT <- rbindlist(myDTs)  # rbind all the data.tables together
>
> # then you should be able to do:
>
> mean.temp <- bigDT[, list(temp.mean=mean(temp)),
>        by=c("FIPS","year","month")]
>
>
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Sat, Jan 31, 2015 at 5:57 PM, Shouro Dasgupta <shouro at gmail.com> wrote:
>
>> I have climate data for 20 years for US counties (FIPS) in csv format,
>> each
>> file represents one year of data. I have extracted the data and reshaped
>> the yearly data files using melt();
>>
>> for (i in filelist) {
>> >   tmp1 <- as.data.table(read.csv(i,header=T, sep=","))
>> >   tmp2 <- melt(tmp1, id="FIPS")
>> >   tmp2$year <- as.numeric(substr(tmp2$variable,2,5))
>> >   tmp2$month <- as.numeric(substr(tmp2$variable,7,8))
>> >   tmp2$day <- as.numeric(substr(tmp2$variable,10,11))
>> > }
>>
>>
>> Should I *rbind *in the loop here as I have the memory?
>> So, the file (i) tmp2 looks like this:
>>
>> FIPS  temp year month  date
>> > 1001 276.7936 2045 1 1/1/2045
>> > 1003 276.7936 2045 1 1/1/2045
>> > 1005 279.6452 2045 1 1/1/2045
>> > 1007 276.7936 2045 1 1/1/2045
>> > 1009 272.3748 2045 1 1/1/2045
>> > 1011 279.6452 2045 1 1/1/2045
>>
>>
>> My goal is to calculate the mean by FIPS code by month/week; however, when I
>> use the following code, I get a NULL value.
>>
>> mean.temp<- for (i in filelist) {tmp2[, list(temp.mean=lapply(.SD, mean),
>> > by=c("FIPS","year","month"), .SDcols=c("temp")]}
>>
>>
>> This works fine for individual years but not with *for (i in filelist)*. What
>> am I doing wrong? Can I include an rbind/rbindlist in the loop to make a big
>> data.frame? Any suggestions will be highly appreciated. Thank you.
>>
>> Sincerely,
>>
>> Shouro
>>
