[R-SIG-Finance] Imputing Missing Values

Sun Jun 26 15:16:41 CEST 2016

On Sun, 2016-06-26 at 12:53 +0000, Pankaj K Agarwal via R-SIG-Finance
wrote:
> This might be a very basic query for this erudite group. However, I am
> hopeful some help will be forthcoming nevertheless. I have a monthly
> time series of annualized t-bill rates on Indian markets. For some
> months, the values are missing randomly. I need to convert the
> annualized yields into daily as well as monthly yields. I have three
> questions:
> 1. I am using package zoo. Which of the methods of NA imputation will
>    be advisable for this series, viz., na.aggregate, na.locf,
>    na.spline or na.approx etc.?
> 2. Should the imputation be done on the monthly annualized yields and
>    then the conversion to daily and monthly yields be performed, or
>    should the imputation be done afterwards?
> 3. Are there better methods than the above for this task?
> I will be extremely grateful for comments. Thanks a ton.
> Regards, Pankaj

The short answer is: Don't do it.  This is a bad idea.

You need to find a better source of data.  Daily data on Indian 3-month
bill yields is widely available from free sources.  See, e.g.,

http://www.investing.com/rates-bonds/india-3-month-bond-yield

There are many other sources of this data as well.  I have no opinion on
the data quality of one over the other, but *any* of them would likely
be orders of magnitude better than what you're asking to do.

You can't impute from lower-frequency data to higher-frequency data with
any confidence.

These NA imputation methods are designed to fill occasional gaps in
otherwise complete data, or to do something like Last Observation
Carried Forward on Bid/Ask spreads (which is not imputation at all,
since the last quote is still the prevailing market).
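
As a rough sketch of what these functions are meant for, assuming the
zoo package is installed: the series below is hypothetical (the dates
and values are made up), with two isolated gaps in otherwise daily
data.

```r
library(zoo)

# Hypothetical daily quotes with two isolated gaps (values are made up)
dates <- as.Date("2016-06-01") + 0:5
bid <- zoo(c(6.80, NA, 6.82, 6.81, NA, 6.79), dates)

# Last Observation Carried Forward: each gap takes the previous quote
bid_locf <- na.locf(bid)

# Linear interpolation between the neighbouring observations, for comparison
bid_approx <- na.approx(bid)
```

Both calls fill one missing day from its immediate neighbours.  Neither
creates observations at a frequency the series never had.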

Basically, it always makes sense to start with the highest frequency
data available, and aggregate to lower frequencies.  In the case of your
query, start with the spot yield and do whatever adjustments you need.
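
A minimal sketch of that workflow in zoo, under some loud assumptions:
the daily series is simulated rather than real data, the yields are
treated as effective annual rates in percent, and a 365-day year is
used.  The helper to_period() is a hypothetical name, not part of zoo,
and your market's quoting convention may call for a different
conversion.

```r
library(zoo)

# Simulated daily annualized yields in percent (NOT real data)
dates <- seq(as.Date("2016-04-01"), as.Date("2016-06-30"), by = "day")
set.seed(1)
yield_ann <- zoo(6.8 + cumsum(rnorm(length(dates), sd = 0.01)), dates)

# Aggregate down to monthly by taking the last observation of each month
yield_monthly <- aggregate(yield_ann, as.yearmon, tail, 1)

# Hypothetical helper: convert an annualized yield (percent) to a
# per-period yield (decimal), assuming effective annual compounding
to_period <- function(y_pct, periods_per_year) {
  (1 + y_pct / 100)^(1 / periods_per_year) - 1
}

daily_yield   <- to_period(yield_ann, 365)     # assumes a 365-day year
monthly_yield <- to_period(yield_monthly, 12)
```

The monthly series here is derived from the daily one, never the other
way around.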

Regards,

Brian