[RsR] Seasonal adjustment; What's a good `blackbox' robust lm() function?

Ajay Shah @j@y@h@h @end|ng |rom m@y|n@org
Sun Jun 8 16:19:44 CEST 2008


There is no support for x12arima or Tramo/Seats in R. I am doing a
poor man's seasonal adjustment procedure: For monthly data, I do a
regression with no intercept and 12 dummy variables (i.e. a month
factor), and then calculating the residual.

If I did this using lm() the procedure is always feasible as long as
there are atleast 12 data points.

When I started using this, I noticed that quite a few time series have
extreme values and these were making the function generate wonky
values. So I thought I should switch to a robust lm(). I was surprised
to find that MASS:rlm() does not always converge. For quite a few
series I find that it breaks. I wonder what the gurus on this list
think. I can come back with concrete numerical examples of time-series
where MASS:rlm() does not work, if that's interesting.

More generally, if you have any suggestions on how to do seasonal
adjustment, they would be most interesting.

My code is:

# A vector of data "x" and a vector of dates "dates" is supplied.
# You must choose whether you want to focus on monthly or quarterly
# seasonality. The function will then do the work of mapping from
# "dates" into the relevant vector of dummy variables.
#
# x is assumed to be in levels.
# A dummy variable regression is done.
# NA values for x are correctly handled, but NA values for the
# dates are not tolerated.
# It is typical for you to send in c(NA,diff(log(levels))) in which case
#  the dummy variables regression is additive seasonality in log
#  differences.
# You get back a vector which has the same length as what came in.
# In addition, you also get back the results of the dummy variable
# regression.
library(MASS)

date2month <- function(dates) {
  factor(as.POSIXlt(dates)$mon, levels=0:11,
         labels=c("Jan", "Feb", "Mar", "Apr",
           "May", "Jun", "Jul", "Aug",
           "Sep", "Oct", "Nov", "Dec"))
}
date2quarter <- function(dates) {
  factor(c(1,1,1,2,2,2,3,3,3,4,4,4)[as.numeric(date2month(dates))],
         levels=1:4, labels=c("Q1","Q2","Q3","Q4"))
}

sa.dummyvariableregression <- function(x, dates, seasonality="monthly") {
  d <- data.frame(x=x, date=dates)
  if (seasonality == "monthly") {
    d$pfactor <- date2month(d$date)
  } else {
    d$pfactor <- date2quarter(d$date)
  }
  non.na.locations <- !is.na(d$x)
  d <- d[non.na.locations,]
  # At this point, we're holding the clean non-NA data.
  # Dummy variable regression: do rlm() if you can but use lm() otherwise.
  m <- rlm(x ~ -1 + pfactor, data=d)
  if (! m$converged) {
     cat("rlm failed to converge; switching to lm().\n")
     m <- lm(x ~ -1 + pfactor, data=d)
  }

  # Extract residuals
  sa <- m$residuals               # m$residuals is one short
  sa <- sa + mean(coef(m))        # Intercepts have soaked up the mean
  # Place back into the original layout with NAs in the right places
  sa.full <- rep(NA, length(non.na.locations))
  sa.full[non.na.locations] <- sa

  # Send back the goodies.
  list(sa=sa.full, m=m)
}

Here's some data to experiment with this. It's a data frame "n"
containing n$date and n$cpi.

n <- structure(list(date = structure(c(7395, 7425, 7456, 7486, 7517, 
    7548, 7578, 7609, 7639, 7670, 7701, 7729, 7760, 7790, 7821, 7851, 
    7882, 7913, 7943, 7974, 8004, 8035, 8066, 8095, 8126, 8156, 8187, 
    8217, 8248, 8279, 8309, 8340, 8370, 8401, 8432, 8460, 8491, 8521, 
    8552, 8582, 8613, 8644, 8674, 8705, 8735, 8766, 8797, 8825, 8856, 
    8886, 8917, 8947, 8978, 9009, 9039, 9070, 9100, 9131, 9162, 9190, 
    9221, 9251, 9282, 9312, 9343, 9374, 9404, 9435, 9465, 9496, 9527, 
    9556, 9587, 9617, 9648, 9678, 9709, 9740, 9770, 9801, 9831, 9862, 
    9893, 9921, 9952, 9982, 10013, 10043, 10074, 10105, 10135, 10166, 
    10196, 10227, 10258, 10286, 10317, 10347, 10378, 10408, 10439, 
    10470, 10500, 10531, 10561, 10592, 10623, 10651, 10682, 10712, 
    10743, 10773, 10804, 10835, 10865, 10896, 10926, 10957, 10988, 
    11017, 11048, 11078, 11109, 11139, 11170, 11201, 11231, 11262, 
    11292, 11323, 11354, 11382, 11413, 11443, 11474, 11504, 11535, 
    11566, 11596, 11627, 11657, 11688, 11719, 11747, 11778, 11808, 
    11839, 11869, 11900, 11931, 11961, 11992, 12022, 12053, 12084, 
    12112, 12143, 12173, 12204, 12234, 12265, 12296, 12326, 12357, 
    12387, 12418, 12449, 12478, 12509, 12539, 12570, 12600, 12631, 
    12662, 12692, 12723, 12753, 12784, 12815, 12843, 12874, 12904, 
    12935, 12965, 12996, 13027, 13057, 13088, 13118, 13149, 13180, 
    13208, 13239, 13269, 13300, 13330, 13361, 13392, 13422, 13453, 
    13483, 13514, 13545, 13573, 13604, 13634, 13665, 13695, 13726, 
    13757, 13787, 13818, 13848, 13879, 13910, 13939, 13970), class = "Date"), 
        cpi = c(38.88, 39.31, 39.96, 40.82, 41.04, 41.25, 42.12, 
        42.76, 42.98, 43.63, 43.63, 43.41, 43.63, 44.06, 45.14, 46.22, 
        46.87, 47.73, 48.16, 48.6, 48.6, 49.24, 49.46, 49.46, 49.89, 
        50.54, 50.97, 52.27, 52.27, 52.48, 52.7, 52.7, 52.48, 52.05, 
        52.27, 52.48, 52.92, 53.13, 54, 54.64, 55.29, 55.94, 56.59, 
        57.24, 57.02, 56.8, 57.24, 57.67, 58.1, 58.75, 59.83, 60.69, 
        61.34, 62.2, 62.42, 62.85, 62.42, 62.42, 62.85, 63.28, 63.71, 
        64.79, 66.09, 67.6, 68.03, 68.47, 68.9, 69.33, 68.47, 68.03, 
        68.25, 68.9, 69.98, 70.84, 71.92, 73.22, 74.08, 74.3, 74.73, 
        75.38, 75.59, 75.59, 75.59, 75.81, 76.46, 76.03, 76.67, 77.32, 
        77.54, 77.97, 78.83, 79.05, 80.35, 82.94, 82.51, 82.07, 82.72, 
        84.02, 86.18, 88.77, 89.2, 90.71, 93.52, 94.6, 92.66, 90.71, 
        89.63, 89.42, 89.63, 90.5, 90.71, 91.58, 92.01, 92.66, 94.38, 
        94.6, 93.09, 93.09, 92.87, 93.74, 94.6, 95.03, 95.46, 96.11, 
        95.68, 95.9, 96.98, 97.19, 96.33, 96.11, 95.68, 96.11, 96.76, 
        97.41, 98.7, 100, 100.65, 100.43, 101.08, 101.94, 101.3, 
        100.86, 100.65, 101.08, 101.3, 101.94, 102.81, 103.89, 104.54, 
        104.75, 105.18, 105.62, 104.54, 104.32, 104.54, 105.18, 106.48, 
        106.7, 107.34, 108.21, 107.78, 107.78, 108.64, 108.86, 108.42, 
        108.86, 108.86, 108.86, 108.86, 109.72, 110.58, 111.66, 112.74, 
        112.96, 113.61, 113.39, 112.53, 113.61, 113.39, 113.39, 114.25, 
        113.82, 114.25, 116.2, 116.63, 117.06, 118.36, 119.44, 118.79, 
        119, 119, 119, 120, 121, 123, 124, 124, 125, 127, 127, 127, 
        127, 128, 127, 128, 129, 130, 132, 133, 133, 134, 134, 134, 
        134, 135, 137, 138)), .Names = c("date", "cpi"), row.names = 4:220, class = "data.frame", na.action = structure(c(1L, 
    2L, 3L, 221L), .Names = c("1", "2", "3", "221"), class = "omit"))
    
-- 
Ajay Shah                                      http://www.mayin.org/ajayshah  
ajayshah using mayin.org                             http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.




More information about the R-SIG-Robust mailing list