[R] Converting monthly data to quarterly data

Gavin gavin.simpson at ucl.ac.uk
Mon Aug 18 20:38:29 CEST 2008


On Mon, 2008-08-18 at 14:31 +0100, Denise Xifara wrote:
> Thank you very much Stephen, but how will aggregate deal with months that
> fall outside annual quarters? eg, one extra month at the end of the dataset?

[Without your data I'm kind of guessing at the exact format and problem,
but the example below shows one way to deal with quarters that span
years so should be adaptable to your problem. You also have a different
idea of quarters to my ecological one...]

## first some dummy data
set.seed(12345)
dat <- data.frame(price = cumsum(rnorm(450)),
                  date = seq(from = as.Date("2008-01-01"),
                             length = 450, by = "days"))

## get the months of observations
dat$month <- factor(format(dat$date, format = "%b"), levels = month.abb)

## and then format this for the quarters
dat$quarter <- character(length = NROW(dat))
## I'm sure these 4 steps can be simplified but just
## how escapes me at the moment
dat$quarter[dat$month %in% month.abb[c(12,1,2)]] <- "Winter"
dat$quarter[dat$month %in% month.abb[c(3:5)]] <- "Spring"
dat$quarter[dat$month %in% month.abb[c(6:8)]] <- "Summer"
dat$quarter[dat$month %in% month.abb[c(9:11)]] <- "Autumn"
dat$quarter <- factor(dat$quarter, 
                      levels = c("Spring","Summer","Autumn","Winter"))

## look at the fruits of our labour
head(dat)

## create period
runs <- rle(as.numeric(dat$quarter))$lengths
dat$period <- factor(rep(seq_along(runs), times = runs))

## aggregate
with(dat, aggregate(price, list(quarter = quarter, period = period), 
                    FUN = mean))

I use the rle() function (run length encoding) calculate the number of
observations where the 'quarter' remains the same:

> rle(as.numeric(dat$quarter))
Run Length Encoding
  lengths: int [1:6] 60 92 92 91 90 25
  values : num [1:6] 4 1 2 3 4 1

The 'values' here are the numeric representation of the quarter factor.
The most interesting for us is the second 4 - this is the winter 2008/9.
I use the lengths to replicate a period number (1,2,...,n) the correct
number of times. Now we have the period correctly calculated, we just
aggregate by quarter and period to give the averages you want.

If you are working on months 1-3 as quarter 1, 4-6 as quarter 2 etc,
then it is much easier, just aggregate by quarter and year:

## copy the data above
dat2 <- dat
## change meaning of quarter
dat2$quarter <- character(length = NROW(dat2))
dat2$quarter[dat2$month %in% month.abb[c(1:3)]] <- "Q1"
dat2$quarter[dat2$month %in% month.abb[c(4:6)]] <- "Q2"
dat2$quarter[dat2$month %in% month.abb[c(7:9)]] <- "Q3"
dat2$quarter[dat2$month %in% month.abb[c(10:12)]] <- "Q4"
dat2$quarter <- factor(dat2$quarter, levels = c("Q1","Q2","Q3","Q4"))
## year variable
dat2$year <- factor(format(dat2$date, format = "%Y"))

## drop the first 40 days to simulate a late starting record
## and aggregate
with(dat2[-(1:40), ], aggregate(price, list(quarter = quarter, year =
year), FUN = mean))

Which gives:
  quarter year        x
1      Q1 2008 13.58644
2      Q2 2008 24.16523
3      Q3 2008 28.56004
4      Q4 2008 32.60900
5      Q1 2009 44.86594

Do these examples help solve your problem?

G
> 
> 2008/8/18 stephen sefick <ssefick at gmail.com>
> 
> > ?aggregate
> > may do what you want
> >
> > On Mon, Aug 18, 2008 at 8:19 AM, Denise Xifara
> > <dionysia-kiara.xifaras at st-hildas.ox.ac.uk> wrote:
> > > Dear R users,
> > >
> > > I have a dataframe where column is has countries, column 2 is dates
> > > (monthly) for each countrly, the next 10 columns are my factors where I
> > have
> > > measurements for each country and  for each date.  I have attached a
> > sample
> > > of the data in csv format with the data for 3 countries.
> > >
> > > I would like to convert my monthly data into quarterly data, finding the
> > > mean over 3 month periods for factors a-i, and the sum for factor j.  My
> > > problem is that not all countries have starting date at the beginning of
> > a
> > > quarter for a particular year, ie some countries start in May or
> > September,
> > > and also some countries have one extra month, some have two extra months
> > so
> > > there's no way of deleting some rows with a simple command (I want to get
> > > rid of all extra data that does not fall into the quarters for each
> > > country), since the amount of data to get rid of for each country varies.
> > >
> > > I tried for example:
> > > i=1
> > > denise<-data[((data$country)==unique(data$country[i]),]
> > > denise[,2]<- as.Date(denise$date, "%Y-%m-%d")
> > > denise2<-denise[order(denise[,2],decreasing=FALSE),]
> > > len<-length(denise[,1])
> > > limit<-floor(len/3)+1
> > > splitter<-rep(1:limit,each=3)
> > > spl.dat<-split(denise2,splitter)
> > > new.data<-as.matrix(lapply(spl.dat,FUN="mean"))
> > >
> > > This finds the mean every 3 rows but this doesnt consider the data
> > quarterly
> > > in a calendar sense.  ie if the data starts in november, it doesnt
> > discard
> > > the data for november, december and start calculating the means from
> > january
> > > onwards, until the month where the last quarter finishes, discarding any
> > > extra month, or two months at the end.
> > >
> > > I tried converting my data frame/matrix to a time series but the dates
> > are
> > > not kept.  I got:
> > >
> > >>tser<-as.ts(denise)
> > > Warning message:
> > > In data.matrix(data) : class information lost from one or more columns
> > >
> > > and column 2 has become a list of numbers rather than dates.
> > >
> > > I tried:
> > >
> > >> library(fCalendar)
> > >> den.tseries<-as.timeSeries(denise)
> > > Warning messages:
> > > 1: In .whichFormat(charvec, ...) : Could not determine time(date) format
> > > 2: In .whichFormat(charvec, ...) : Could not determine time(date) format
> > >> is.timeSeries(den.tseries)
> > > [1] TRUE
> > >> apply.quarterly(den.tseries,FUN="mean")
> > >                   data
> > > 1970-01-01 -2.425000000
> > > 1970-04-01 -0.557961111
> > > 1970-04-28  0.009814815
> > >
> > > Here, it calculates things quarterly but the the as.timeSeries command
> > has
> > > assigned its own daily dates to the data, instead of keeping my monthly
> > > dates.  Also, I don't understand how it deals with the extra dates.
> > >
> > > Sorry for the long email,
> > > Any help would be very much appreciated,
> > > Kind regards,
> > > Denise
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> >
> >
> >
> > --
> > Let's not spend our time and resources thinking about things that are
> > so little or so large that all they really do for us is puff us up and
> > make us feel like gods. We are mammals, and have not exhausted the
> > annoying little problems of being mammals.
> >
> >        -K. Mullis
> >
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list