[R] Arrange data

Rui Barradas ruipbarradas at sapo.pt
Tue Aug 4 23:45:00 CEST 2020


Hello,

Please keep cc-ing the list; R-help is threaded, and questions and 
answers might be of help to others in the future.

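As for the question, see if the following code does what you want.
Your subset() call failed because no month is both >= 7 and <= 3, so the 
two conditions must be combined with "or", not "and". First, create a 
logical index i of the months from July (7) to December (12) or from 
January (1) to March (3), and use that index to subset the original 
data.frame. Then, a cumsum trick gives a vector M defining the data 
grouping: a new group starts wherever the row numbers of the subset jump 
by more than 1. Group and compute the Value means with aggregate. 
Finally, since each group spans a year border, create a more meaningful 
Years column and put everything together.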
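
To see the cumsum trick on its own, here is a minimal sketch with a 
made-up vector of row numbers. The name kept and its values are 
illustrative assumptions, not taken from mddat.csv. It assumes, as the 
code below does, that the data.frame has the default row names 1, 2, 3, 
..., so that dropped rows show up as jumps in the numbering.

```r
# Toy stand-in for as.integer(row.names(df2)): the original row numbers
# that survive the month filter (made-up values).
kept <- c(1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, 19, 20)

# diff() > 1 is TRUE wherever rows were dropped between two kept rows;
# the leading FALSE keeps the result as long as kept.
gap <- c(FALSE, diff(kept) > 1)

# cumsum() counts the gaps seen so far, which numbers the groups from 0.
M <- cumsum(gap)
M
# M is 0 0 0 1 1 1 1 1 1 1 1 1 2 2
```

Each run of consecutive row numbers gets its own group number, which is 
what aggregate then uses as the grouping variable.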

df1 <- read.csv("mddat.csv")

# TRUE for July-December or January-March
i <- with(df1, (Month >= 7 & Month <= 12) | (Month >= 1 & Month <= 3))
df2 <- df1[i, ]
# a jump of more than 1 in the original row numbers starts a new group
M <- cumsum(c(FALSE, diff(as.integer(row.names(df2))) > 1))

agg <- aggregate(Value ~ M, df2, mean)
# label each group with its first and last year
Years <- sapply(split(df2$Year, M), function(x) {
  paste(x[1], x[length(x)], sep = "-")
})
final <- cbind.data.frame(Years, Value = agg[["Value"]])

head(final)
#      Years    Value
#0 1975-1975 87.00000
#1 1975-1976 89.44444
#2 1976-1977 85.77778
#3 1977-1978 81.55556
#4 1978-1979 71.55556
#5 1979-1980 75.77778
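
If a single label per group is preferred, for example recording the 
July 1975 to March 1976 mean under 1975 as asked, one possible 
alternative is to compute a season year directly instead of using M. 
This is only a sketch on made-up data; the names toy, toy2, Season and 
res are hypothetical and not part of the code above.

```r
# Made-up data in the same Year/Month/Value layout (not mddat.csv).
set.seed(1)
toy <- expand.grid(Month = 1:12, Year = 1975:1977)
toy$Value <- sample(60:100, nrow(toy), replace = TRUE)

# Keep July-December and January-March.
keep <- toy$Month >= 7 | toy$Month <= 3
toy2 <- toy[keep, ]

# Hypothetical helper column: January-March rows are credited to the
# previous calendar year, so July 1975 to March 1976 is labelled 1975.
toy2$Season <- ifelse(toy2$Month >= 7, toy2$Year, toy2$Year - 1)

# One mean per season year.
res <- aggregate(Value ~ Season, data = toy2, FUN = mean)
res
```

Note that the first and last seasons may be incomplete (here 1974 holds 
only January to March 1975), just like the 1975-1975 row in the output 
above.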


Hope this helps,

Rui Barradas



At 20:44 on 04/08/20, Md. Moyazzem Hossain wrote:
> Dear Rui,
> 
> Thanks a lot for your help.
> 
> It is working. Now I am also trying to find the average of values for 
> *July 1975 to March 1976* and record as the value of the year 1975. 
> Moreover, I want to continue it up to the year 2017. You may check the 
> attached file for data (mddat.csv).
> 
> I used the following function but got an error:
> aggregate(Value ~ Year, data = subset(df1, Month >= 7 & Month <= 3),
>           FUN = mean)
> 
> Please help me again. Thanks in advance.
> 
> Best Regards,
> Md
> 
> On Mon, Aug 3, 2020 at 11:28 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:
> 
>     Hello,
> 
>     And here is another way, with aggregate.
> 
>     Make up test data.
> 
>     set.seed(2020)
>     df1 <- expand.grid(Year = 2000:2018, Month = 1:12)
>     df1 <- df1[order(df1$Year),]
>     df1$Value <- sample(20:30, nrow(df1), TRUE)
>     head(df1)
> 
> 
>     #Use subset to keep only the relevant months
>     aggregate(Value ~ Year, data = subset(df1, Month <= 7), FUN = mean)
> 
> 
>     Hope this helps,
> 
>     Rui Barradas
> 
>     At 12:33 on 03/08/2020, Rasmus Liland wrote:
>      > On 2020-08-03 21:11 +1000, Jim Lemon wrote:
>      >> On Mon, Aug 3, 2020 at 8:52 PM Md. Moyazzem Hossain <hossainmm at juniv.edu> wrote:
>      >>> Hi,
>      >>>
>      >>> I have a dataset having monthly
>      >>> observations (from January to
>      >>> December) over a period of time like
>      >>> (2000 to 2018). Now, I am trying to
>      >>> take the average of the values from
>      >>> January to July of each year.
>      >>>
>      >>> The data looks like
>      >>> Year    Month  Value
>      >>> 2000    1         25
>      >>> 2000    2         28
>      >>> 2000    3         22
>      >>> ....    ......      .....
>      >>> 2000    12       26
>      >>> 2001     1       27
>      >>> .......         ........
>      >>> 2018    11       30
>      >>> 2018    12       29
>      >>>
>      >>> Can someone help me in this regard?
>      >>>
>      >>> Many thanks in advance.
>      >> Hi Md,
>      >> One way is to form a subset of your
>      >> data, then calculate the means by
>      >> year:
>      >>
>      >> # assume your data is named mddat
>      >> mddat2<-mddat[mddat$Month <= 7,]
>      >> jan2jul<-by(mddat2$Value,mddat2$Year,mean)
>      >>
>      >> Jim
>      > Hi Md,
>      >
>      > you can also define the period in a new
>      > column, and use aggregate like this:
>      >
>      >       Md <- structure(list(
>      >       Year = c(2000L, 2000L, 2000L,
>      >       2000L, 2001L, 2018L, 2018L),
>      >       Month = c(1L, 2L, 3L, 12L, 1L,
>      >       11L, 12L),
>      >       Value = c(25L, 28L, 22L, 26L,
>      >       27L, 30L, 29L)),
>      >       class = "data.frame",
>      >       row.names = c(NA, -7L))
>      >
>      >       Md[Md$Month %in% 1:6,"Period"] <- "first six months of the year"
>      >       Md[Md$Month %in% 7:12,"Period"] <- "last six months of the year"
>      >
>      >       aggregate(
>      >         formula=Value~Year+Period,
>      >         data=Md,
>      >         FUN=mean)
>      >
>      > Rasmus
>      >
>      > ______________________________________________
>      > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>      > https://stat.ethz.ch/mailman/listinfo/r-help
>      > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>      > and provide commented, minimal, self-contained, reproducible code.