[R] finding and describing missing data runs in a time series
R. Michael Weylandt <michael.weylandt@gmail.com>
Mon Feb 13 03:40:31 CET 2012
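I'm not at a computer to test this, but perhaps
    rle(is.na(x))
might help: the runs where the value is TRUE give the lengths of consecutive NA's, and cumulating the run lengths tells you where each run starts.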
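
To make the suggestion concrete, here is a minimal sketch of how the rle() output could be turned into "start position and length" for each run of NA's. The vector x below is made-up illustration data (an assumption, not the openair values), and the names run_start and na_runs are hypothetical.

```r
# Made-up example vector standing in for a column such as mydata$pm25
x <- c(NA, NA, NA, 4.2, 5.1, NA, 3.3, NA, NA, 2.0)

# Run-length encode the missingness indicator:
# r$values is TRUE for runs of NA, r$lengths is the length of each run
r <- rle(is.na(x))

# Index at which every run (missing or not) begins
run_start <- cumsum(c(1, head(r$lengths, -1)))

# Keep only the runs of NA
na_runs <- data.frame(start  = run_start[r$values],
                      length = r$lengths[r$values])
na_runs
```

Assuming x is taken from a data frame with a matching date column (for instance x <- mydata$pm25), the starting timestamps should then be available as mydata$date[na_runs$start].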
Michael
On Feb 12, 2012, at 7:36 PM, "Durant, James T. (ATSDR/DTEM/PRMSB)" <hzd3 at cdc.gov> wrote:
> Hi -
>
> I am trying to find and describe missing data in a time series. For instance, in the library openair, there is a data frame called "mydata":
> library(openair)
> head(mydata)
>
>                  date   ws  wd nox no2 o3 pm10    so2      co pm25
> 1 1998-01-01 00:00:00 0.60 280 285  39  1   29 4.7225  3.3725   NA
> 2 1998-01-01 01:00:00 2.16 230  NA  NA NA   37     NA      NA   NA
> 3 1998-01-01 02:00:00 2.76 190  NA  NA  3   34 6.8300  9.6025   NA
> 4 1998-01-01 03:00:00 2.16 170 493  52  3   35 7.6625 10.2175   NA
> 5 1998-01-01 04:00:00 2.40 180 468  78  2   34 8.0700  8.9125   NA
> 6 1998-01-01 05:00:00 3.00 190 264  42  0   16 5.5050  3.0525   NA
>
>
> So, for example, for pm25 I would like to be able to detect that there is a run of NA's starting at 1998-01-01 00:00:00 that lasts for 2887 hourly observations. Then I would be able to know that there is an NA at 2910, and so on. The key information I am looking for is when the NA's start and how long each run is. The closest thing I know of is timePlot in the openair package with statistic="frequency", but it only gives monthly summary data and does not tell me whether the missing data are clumped together or dispersed.
>
> VR
>
> Jim
>
>
> James T. Durant, MSPH CIH
> Emergency Response Coordinator
> US Agency for Toxic Substances and Disease Registry
> Atlanta, GA 30341
> 770-378-1695