[R] Missing data?

R. Michael Weylandt michael.weylandt at gmail.com
Sun Nov 27 23:04:31 CET 2011


Backward compatibility with other time series classes, as best I can
tell, but to be honest, I'm not even sure how it plays into that.
Perhaps it's just an artifact in the signature.

It doesn't seem to have a role in the xts constructor. E.g.,

library(xts)
identical(xts(1:5, Sys.Date() + 1:5, frequency = 1),
          xts(1:5, Sys.Date() + 1:5, frequency = 3))
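
For the original problem further down the thread (one value per week, NA
where nothing was observed), a frequency attribute should not be needed
at all. Below is a rough sketch in base R only. The helper name
week_start() is made up for illustration, weeks are assumed to start on
Monday, and observations that fall in the same week are summed; all three
are assumptions, not something settled in the thread.

```r
# Observation dates and values from the original question
obs_dates  <- as.Date(c("1970-01-01", "1970-01-03", "1980-10-10", "2007-08-19"))
obs_values <- 1:4

# Hypothetical helper: map each Date to the Monday that starts its week.
# format(d, "%u") returns the weekday as 1 (Monday) .. 7 (Sunday).
week_start <- function(d) {
  d - (as.integer(format(d, "%u")) - 1L)
}

# Regular weekly index from the week containing the first observation
# to the week containing the last one
weeks <- seq(from = week_start(min(obs_dates)),
             to   = week_start(max(obs_dates)),
             by   = "week")

# 'Blank' series: one NA per week
weekly <- rep(NA_real_, length(weeks))

# Sum the observations that share a week, then fill those weeks in
sums <- tapply(obs_values, week_start(obs_dates), sum)
weekly[match(as.Date(names(sums)), weeks)] <- sums
```

If an xts or zoo object is wanted, the result can then be wrapped as
xts(weekly, weeks) or zoo(weekly, weeks).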

Michael

On Sun, Nov 27, 2011 at 4:51 PM, Kevin Burton <rkevinburton at charter.net> wrote:
> I was just trying to be complete. Why is the frequency argument and
> attribute available?
>
> -----Original Message-----
> From: R. Michael Weylandt [mailto:michael.weylandt at gmail.com]
> Sent: Saturday, November 26, 2011 2:40 PM
> To: Kevin Burton
> Cc: r-help at r-project.org
> Subject: Re: [R] Missing data?
>
> Why do you need to use a frequency attribute for these data? The point of
> the zoo/xts line of time series implementations is that the time stamps are
> carried through for each observation (unlike ts) and can be irregular. Both
> classes exist precisely to avoid being forced into a frequency attribute.
>
> As far as setting up the time elements, wouldn't this work? Change the start
> date to get weeks on any desired day:
>
> d <- seq.Date(from = as.Date("2011-11-26"), by = -7, length.out = 100)
> xts(rep(NA, length(d)), d)
>
> You can avoid the OHLC formatting of to.weekly, if you want, with the OHLC =
> FALSE argument. And if you want to index it by the first of the week rather
> than the last, just try this:
>
> time(x) <- time(x) - 6
>
> Michael
>
> On Tue, Nov 22, 2011 at 6:50 PM, Kevin Burton <rkevinburton at charter.net>
> wrote:
>> Absent any other suggestions, this approach makes sense, but for my
>> case I think I need to use zoo objects rather than xts. If I sequence
>> the data in general, I don't know whether there will be 365 or 366
>> days in the year. So I have to sequence the dates as:
>>
>> seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day")
>>
>> If I use this sequence with xts I get:
>>
>>> ds <- xts(NA, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"))
>> Error in xts(NA, seq(from = as.Date("2011-01-01"), to =
>> as.Date("2011-12-31"),  :
>>  NROW(x) must match length(order.by)
>>
>> If I leave the 'data' empty I don't get the error, but if I then try
>> to assign an individual item (fill as appropriate), printing fails:
>>
>>> ds <- xts(, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"))
>>> ds["2011-12-24"] <- 10
>>> ds
>> Error in structure(coredata(x), names = x.attr$dimnames[[1]]) :
>>  'names' attribute [365] must be the same length as the vector [358]
>>
>> So now I need to remember that I have not filled in all of the data.
>> Also simple dereferencing gives:
>>
>>> ds[1]
>> Error in `[.xts`(ds, 1) : subscript out of bounds
>>
>> With zoo I am able to create a time-series where all of the data is
>> initially NA:
>>
>>> ds <- zoo(NA, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"))
>>
>> So I can fill the data as appropriate and the remaining slots will have
>> NA.
>> I may be new with xts but I cannot see a way of creating a useable 'blank'
>> time-series.
>>
>> Also with xts it seems like the frequency is ignored.
>>
>>> ds <- xts(1:365, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"), frequency=52)
>>> frequency(ds)
>> [1] 1
>>
>> Whereas zoo remembers the frequency setting
>>
>>> ds <- zoo(1:365, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"), frequency=52)
>>> frequency(ds)
>> [1] 52
>>
>> But since the ultimate goal is to get the time-series in a 'ts' format
>> (as many functions require 'ts') it seems like even zoo has problems:
>>
>>> as.ts(ds)
>>
>> Time Series:
>> Start = c(14975, 1)
>> End = c(15339, 1)
>> Frequency = 52
>>   [1]   1  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
>>  [42]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA   2  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
>>  [83]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA   3  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
>> [124]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA   4  NA  NA  NA  NA  NA  NA  NA
>> [165]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
>>  [206] . . . . . .
>> So the conversion from zoo to ts kept the frequency, but I am not sure
>> how it decided on the start and end values. The conversion also seems
>> to have changed the data. Notice that the original data appears only
>> once every period (52 entries). In other words, if ds is the original
>> zoo time series, then ds[1] is 1, ds[2] is 2, etc. The converted time
>> series keeps ds[1] but inserts 51 NAs, then adds ds[2], and so on to
>> the end of the series. That is not what the initial data was; the
>> conversion is inserting data of its own.
>>
>> The conversion to ts from xts seems better behaved:
>>
>> ds <- xts(1:365, seq(from=as.Date("2011-01-01"),
>> to=as.Date("2011-12-31"), by="day"), frequency=52)
>>> as.ts(ds)
>> Time Series:
>> Start = 1
>> End = 365
>> Frequency = 1
>>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42
>>  [43]  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84
>>  [85]  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
>> [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168
>> [169] 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210
>> [211] 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252
>> [253] 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294
>> [295] 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336
>> [337] 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365
>>
>> But alas the frequency is ignored.
>>
>> So this is what I have found out using these two packages. If I want
>> to create a 'blank' data set, zoo seems 'better' since I can create a
>> time series initialized with NA irrespective of the length of the
>> series. However, I must be unfamiliar with the conversion, because zoo
>> doesn't convert to a regular 'ts' very well. On the other hand, zoo
>> remembers the frequency setting whereas xts just ignores it.
>>
>> It seems like there is still considerable work to solve the original
>> problem. If I create a time series and fill in the values that are
>> appropriate, I could still have NA in the series, and it seems
>> to.weekly has a problem with NA in the time series:
>>> ds <- xts(rep(NA,365), seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"), frequency=52)
>>> to.weekly(ds, sum)
>> Error in if (drop.time) x <- .drop.time(x) :
>>  argument is not interpretable as logical
>> In addition: Warning message:
>> In to.period(x, "weeks", name = name, ...) :
>>  missing values removed from data
>>
>>
>> -----Original Message-----
>> From: R. Michael Weylandt <michael.weylandt at gmail.com>
>> [mailto:michael.weylandt at gmail.com]
>> Sent: Tuesday, November 22, 2011 3:10 PM
>> To: Kevin Burton
>> Cc: <r-help at r-project.org>
>> Subject: Re: [R] Missing data?
>>
>> Couldn't you use seq.Date() to set up the time index and then just
>> fill as appropriate?
>>
>> Alternatively, to.weekly if you are starting with a daily series.
>>
>> Michael
>>
>> On Nov 22, 2011, at 4:00 PM, "Kevin Burton" <rkevinburton at charter.net>
>> wrote:
>>
>>> I was wondering what the best approach is for missing data in a time
>>> series.
>>> I give an example using xts but I would like to know what seems to be
>>> the "best" method. Say I have
>>>
>>>
>>>
>>> library(xts)
>>>
>>> xts.ts <- xts(1:4,as.Date(c("1970-01-01", "1970-1-3", "1980-10-10",
>>> "2007-8-19")), frequency=52)
>>>
>>>
>>>
>>> I would like to turn this into a time series (still could be xts, or
>>> converted to ts) that has values for every week starting with the
>>> week that includes the start date and ending with the week that
>>> includes the end date.
>>> If there is data for the week then use it; otherwise set it to NA or 0.
>>> Remember some years have 52, 53, or rarely 54 full or partial weeks.
>>> What to do with the partials at the beginning and ending of the year?
>>> This seems to be a fairly common problem and doing it myself is very
>>> cumbersome. Does a solution to this kind of problem exist? Once the
>>> approach to a weekly period is found I am sure that adjustment to
>>> daily, monthly, or quarterly would be relatively straightforward.
>>>
>>>
>>>
>>> Thank you.
>>>
>>>
>>>
>>> Kevin


