[R] dividing ts objects of different frequencies
Jeffrey J. Hallman
jhallman at frb.gov
Thu Mar 5 15:54:25 CET 2009
"Stephen J. Barr" <stephenjbarr at gmail.com> writes:
> I have two time series (ts) objects, 1 is yearly (population) and the
> other is quarterly (bankruptcy statistics). I would like to produce a
> quarterly time series object that consists of bankruptcy/population.
> Is there a pre-built function to intelligently divide these time
> series:
What you need to do is create a quarterly population series, then divide your
bankruptcy series by it. The only "nice" way I know to do this is to use the
convert() function from my "tis" package. Here is its help document:
convert package:tis R Documentation
Time scale conversions for time series
Description:
Convert 'tis' series from one frequency to another using a variety
of algorithms.
Usage:
convert(x, tif, method = "constant", observed. = observed(x),
basis. = basis(x), ignore = F)
Arguments:
x: a univariate or multivariate 'tis' series. Missing values
(NAs) are ignored.
tif: a number or a string indicating the desired ti frequency of
the return series. See 'help(ti)' for details.
method: method by which the conversion is done: one of "discrete",
"constant", "linear", or "cubic". Note that this argument is
effectively ignored if 'observed.' is "high" or "low", as the
"discrete" method is the only one supported for that setting.
observed.: "observed" attribute of the input series: one of
"beginning", "end", "high", "low", "summed", "annualized", or
"averaged". If this argument is not supplied and
observed('x') != NULL it will be used. The output series
will also have this "observed" attribute.
basis.: "daily" or "business". If this argument is not supplied and
basis('x') != NULL it will be used. The output series will
also have this "basis" attribute.
ignore: governs how missing (partial period) values at the beginning
and/or end of the series are handled. For method ==
"discrete" or "constant" and ignore == T, input values that
cover only part of the first and/or last output time intervals
will still result in output values for those intervals. This
can be problematic, especially for observed == "summed", as
it can lead to atypical values for the first and/or last
periods of the output series.
Details:
This function is a close imitation of the way FAME handles time
scale conversions. See the chapter on "Time Scale Conversion" in
the Users Guide to Fame if the explanation given here is not
detailed enough.
Start with some definitions. Combining values of a higher
frequency input series to create a lower frequency output series
is known as 'aggregation'. Doing the opposite is known as
'disaggregation'.
If observed == "high" or "low", the "discrete" method is always
used.
Disaggregation for "discrete" series: (i) for observed ==
"beginning" ("end"), the first (last) output period that begins
(ends) in a particular input period is assigned the value of that
input period. All other output periods that begin (end) in that
input period are NA. (ii) for observed == "high", "low", "summed"
or "averaged", all output periods that end in a particular input
period are assigned the same value. For "summed", that value is
the input period value divided by the number of output periods
that end in the input period, while for "high", "low" and
"averaged" series, the output period values are the same as the
corresponding input period values.
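The disaggregation rules above can be sketched in base R for the annual-to-quarterly case. This only reproduces the arithmetic described in the text; it is not the code convert() uses, and the values and variable names are made up for illustration.

```r
# Hypothetical annual values for two years
annual <- c(120, 240)
nPer <- 4  # quarters per year

# observed == "summed": spread each annual total evenly over its quarters
summedQ <- rep(annual / nPer, each = nPer)

# observed == "averaged", "high" or "low": repeat the annual value
averagedQ <- rep(annual, each = nPer)

# observed == "end": only the last quarter of each year gets the value
endQ <- rep(NA_real_, length(annual) * nPer)
endQ[seq(nPer, by = nPer, length.out = length(annual))] <- annual

# observed == "beginning": only the first quarter of each year gets the value
begQ <- rep(NA_real_, length(annual) * nPer)
begQ[seq(1, by = nPer, length.out = length(annual))] <- annual
```

Summing summedQ over each year, or averaging averagedQ over each year, gives back the annual values.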
Aggregation for "discrete" series: (i) for observed == "beginning"
("end"), the output period is assigned the value of the first
(last) input period that begins (ends) in the output period. (ii)
for observed == "high" ("low"), the output period is assigned the
value of the maximum (minimum) of all the input values for periods
that end in the output period. (iii) for observed == "summed"
("averaged"), the output value is the sum (average) of all the
input values for periods that end in the output period.
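For regular series whose periods nest exactly (months within quarters), the same aggregation rules can be imitated with base R's aggregate() on a plain 'ts' object. This is a rough analogue under that assumption, not the tis implementation; the monthly data are hypothetical.

```r
# Hypothetical monthly series covering one full year
m <- ts(1:12, start = c(2008, 1), frequency = 12)

qSum  <- aggregate(m, nfrequency = 4, FUN = sum)   # observed == "summed"
qAvg  <- aggregate(m, nfrequency = 4, FUN = mean)  # observed == "averaged"
qHigh <- aggregate(m, nfrequency = 4, FUN = max)   # observed == "high"
qLow  <- aggregate(m, nfrequency = 4, FUN = min)   # observed == "low"

# observed == "beginning" / "end": first / last month of each quarter
qBeg <- aggregate(m, nfrequency = 4, FUN = function(v) v[1])
qEnd <- aggregate(m, nfrequency = 4, FUN = function(v) v[length(v)])

cbind(qSum, qAvg, qHigh, qLow, qBeg, qEnd)
```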
Methods "constant", "linear", and "cubic" all work by constructing
a continuous function F(t) and then reading off the appropriate
point-in-time values if observed == "beginning" or "end", or by
integrating F(t) over the output intervals when observed ==
"summed", or by integrating F(t) over the output intervals and
dividing by the lengths of those intervals when observed ==
"averaged". The unit of time itself is given by the 'basis'
argument.
The form of F(t) is determined by the conversion method. For
"constant" conversions, F(t) is a step function with jumps at the
boundaries of the input periods. If the first and/or last input
periods only partly cover an output period, F is linearly extended
to cover the first and last output periods as well. The heights
of the steps are set such that F(t) aggregates over the input
periods to the original input series.
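To make the step-function idea concrete, here is a base R sketch of a "constant" conversion of weekly averages to monthly averages. It assumes calendar days as the unit of time (roughly what basis "daily" means), samples F(t) once per day, and averages by month. The dates and values are hypothetical, and convert() itself may handle the details differently.

```r
# Nine hypothetical weekly averages for weeks ending on Saturdays
weekEnds <- seq(as.Date("2008-01-05"), by = "week", length.out = 9)
weekly <- seq_along(weekEnds)  # values 1, 2, ..., 9

# F(t) as a step function sampled daily: each week's value on each of its 7 days
days <- seq(weekEnds[1] - 6, weekEnds[length(weekEnds)], by = "day")
daily <- rep(weekly, each = 7)

# observed == "averaged": integrate F over each month and divide by its length,
# which for daily sampling is just the mean of the daily values
monthAvg <- tapply(daily, format(days, "%Y-%m"), mean)
monthAvg
```

December 2007 and March 2008 are only partly covered by the input weeks here, so only the January and February values are based on complete data.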
For "linear" ("cubic") conversions, F(t) is a linear (cubic)
spline. The x-coordinates of the spline knots are the beginnings
or ends of the input periods if observed == "beginning" or "end",
else they are the centers of the input periods. The y-coordinates
of the splines are chosen such that aggregating the resulting F(t)
over the input periods yields the original input series.
For "constant" conversions, if 'ignore' == F, the first (last)
output period is the first (last) one for which complete input
data is available. For observed == "beginning", for example, this
means that data for the first input period that begins in the
first output period is available, while for observed == "summed",
this means that the first output period is completely contained
within the available input periods. If 'ignore' == T, data for
only a single input period is sufficient to create an output
period value. For example, if converting weekly data to monthly
data, and the last observation is June 14, the output series will
end in June if 'ignore' == T, or May if it is F.
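The weekly-to-monthly example can be checked with a little date arithmetic in base R. The year 2008 is an assumption chosen so that June 14 falls on a Saturday; the sketch only works out which months are partly or completely covered and does not call convert().

```r
# Hypothetical weekly observations for weeks ending on Saturdays, last one June 14
weekEnds <- seq(as.Date("2008-01-05"), as.Date("2008-06-14"), by = "week")

# Months that contain at least one week-ending date
months <- unique(format(weekEnds, "%Y-%m"))
monthStart <- as.Date(paste0(months, "-01"))

# Last day of each month: first day of the following month minus one day
monthEnd <- seq(monthStart[1], by = "month", length.out = length(months) + 1)[-1] - 1

# A month is completely covered at the end only if the data reach its last day
complete <- monthEnd <= max(weekEnds)

lastAny      <- months[length(months)]      # where the output ends if ignore is T
lastComplete <- months[max(which(complete))]  # where the output ends if ignore is F
c(lastAny, lastComplete)
```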
Unlike the "constant" method, the domain of F(t) for "linear" and
"cubic" conversions is NOT extended beyond the input periods, even
if the ignore option is T. The first (last) output period is
therefore the first (last) one that is completely covered by input
periods.
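For a point-in-time series with observed == "end", the "linear" case reduces to ordinary linear interpolation between the ends of the input periods. A minimal base R sketch with approx(), using made-up annual values and time measured in years, shows both the interpolation and the fact that nothing is produced outside the range of the knots:

```r
# Hypothetical end-of-year values for 2005, 2006 and 2007
vals <- c(100, 120, 160)
yearEnd <- c(2006, 2007, 2008)  # the year 2005 ends at t = 2006, and so on

# Ends of the quarters 2005Q1 through 2007Q4
qEnd <- seq(2005.25, 2008, by = 0.25)

# Linear F(t) through the knots; approx() returns NA outside their range
lin <- approx(x = yearEnd, y = vals, xout = qEnd)$y
lin
```

The first three quarters of 2005 come out as NA because they end before the first knot, which matches the statement that F(t) is not extended beyond the input periods.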
Series with observed == "annualized" are handled the same as
observed == "averaged".
Value:
a 'tis' time series covering approximately the same time span as
'x', but with the frequency specified by 'tif'.
BUGS:
Method "cubic" is not currently implemented for observed "summed",
"annualized", and "averaged".
References:
Users Guide to Fame
See Also:
'aggregate', 'tif', 'ti'
Examples:
library(tis)
wSeries <- tis(1:105, start = ti(19950107, tif = "wsaturday"))
observed(wSeries) <- "ending" ## end of week values
mDiscrete <- convert(wSeries, "monthly", method = "discrete")
mConstant <- convert(wSeries, "monthly", method = "constant")
mLinear <- convert(wSeries, "monthly", method = "linear")
mCubic <- convert(wSeries, "monthly", method = "cubic")
## linear and cubic are identical because wSeries is a pure linear trend
cbind(mDiscrete, mConstant, mLinear, mCubic)
observed(wSeries) <- "averaged" ## weekly averages
mDiscrete <- convert(wSeries, "monthly", method = "discrete")
mConstant <- convert(wSeries, "monthly", method = "constant")
mLinear <- convert(wSeries, "monthly", method = "linear")
cbind(mDiscrete, mConstant, mLinear)
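Applied to the original question, the idea is to turn the annual population into a quarterly series and then divide. The sketch below does this in base R by repeating each annual value over its four quarters, which corresponds roughly to a "constant" conversion of an averaged series. The data and names (pop, bankr, popQ, rate) are hypothetical. With tis the quarterly series would presumably come from something like convert(as.tis(pop), "quarterly"), but that call is not run or tested here.

```r
# Hypothetical annual population and quarterly bankruptcy counts
pop <- ts(c(1000, 1100, 1200), start = 2005, frequency = 1)
bankr <- ts(c(10, 12, 11, 13, 14, 15, 13, 12, 16, 17, 15, 14),
            start = c(2005, 1), frequency = 4)

# Quarterly population: each quarter takes the value of its year
popQ <- ts(rep(as.numeric(pop), each = 4),
           start = c(start(pop)[1], 1), frequency = 4)

# Quarterly bankruptcies per head of population
rate <- bankr / popQ
rate
```

Since population is closer to a point-in-time measure than a flow, a "linear" conversion may be the more natural choice if the jumps at year boundaries matter.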
--
Jeff