[R] Query about calculating the monthly average of daily data columns

Fri Sep 13 12:54:20 CEST 2019

Hi Subhamitra,
I'll try to write my answers adjacent to your questions below.

On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra <subhamitra.patra using gmail.com>
wrote:

> Dear Sir,
>
> Thank you very much for your suggestion.
>
> Yes, your suggested code worked. But, actually, I have data from 3rd
> January 1994 to 3rd August 2017 for very large (i.e. for 84 countries)
> sample. From this, I have given the example of the years up to 2000. Before
> applying the same code for the long 24 years, I want to learn the logic
> behind the code. Actually, some part of the code is not understandable to
> me which I mentioned in the bold letter as follows.
>
> "spdat<-data.frame(
>   dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
>   returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
>
> A. Here, I need to define the no. of days in a month, and the no. of
> countries name separately, right? But, what is meant by 15:50, and 10:45 in
> return A, and B respectively?
>

To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the real
values of return are, so I made them up using the "sample" function.
However, this is not meant to mislead anyone, just to show how whatever
numbers are in your data can be used in calculations. The colon (":")
operator creates a sequence of numbers starting with the one to the left
and ending with the one to the right.

>
> "# if you only have to get the monthly averages, it can be done this way
> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
>
> B. Here, I need to define the no. of months, and years separately, right?
> or else what 2, and 3 (in bold) indicates?
>

To get the grouping variable of sequential months that you want, you only
need the month and year values of the dates in the first column. First I
used the "strsplit" function to split the date field at the hyphens, then
used "sapply" to extract ("[") the second (month) and third (year) parts as
two new columns. Because you have more than one year of data, you need the
year values or you will group all Januarys, all Februarys and so on. Notice
how I pass both of the new columns as a list (a data frame is a type of
list) in the call to get the mean of each month.

>
> "# get the averages by month and year - is this correct?
> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean)
> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)"
>
> C. From this part, I got the exact average values of both January and
> February of 1994 for country A, and B. But, in code, I have a query that I
> need to define  spdat$returnA, and  spdat$returnB separately before writing
> this code, right? Like this, I need to define for each 84 countries
> separately with their respective number of months, and years before writing
> this code, right?
>

I don't think so. Because I don't know what your data looks like, I am
guessing that for each row, it has columns for each of the 84 countries. I
don't know what these columns are named, either. Maybe:

date             Australia   Belarus   ...    Zambia
01/01/1994   20             21                 22
...

> Yes, after obtaining the monthly average for each country's data, I need
> to use them for further calculations. So, I want to export the result to
> excel. But, until understanding the code, I think I willn't able to apply
> for the entire sample, and cannot be able to discuss the format of the
> resulted column to export to excel.
>

Say that we perform the grouped mean calculation for the first two country
columns like this:
monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
monmeans
    Australia  Belarus
[1,]  29.70000 30.43333
[2,]  34.17857 27.39286

We are presented with a 2x2 matrix of monthly means in just the format
someone might use for importing into Excel. The first row is January 1994,
the second February 1994 and so on. By expanding the columns to include all
the countries in your data, You should have the result you want.

Jim

	[[alternative HTML version deleted]]