[R] find data (date) gaps in time series
Marc Schwartz
marc_schwartz at me.com
Fri Nov 20 15:21:54 CET 2009
On Nov 20, 2009, at 8:04 AM, David Winsemius wrote:
>
> On Nov 20, 2009, at 6:26 AM, Stefan Strohmeier wrote:
>
>> Dear R users,
>>
>> I have a time series of precipitation data. The time series
>> comprises ~ 20 years and it is supposed to be constant (one value
>> per day), but due to some failure of the measuring device some days
>> or periods are missing. I would like to find these missing days or
>> periods just to get a first idea about the reliability of the
>> measurements. The only function I could find was is.constant(), but
>> of course I only get a true or false statement instead of the dates
>> missing.
>> Google searches and a look at the R-help mailing list did not reveal an
>> answer.
>>
>> Please find attached a few dates of the time series with missing
>> values from February to April. I would like R to detect those
>> missing dates.
>>
> > dtdta <- read.table(textConnection("2916 2002-02-17 0.0
> + 2917 2002-02-18 0.3
> + 2918 2002-02-19 3.8
> + 2919 2002-02-20 43.6
> + 2920 2002-02-21 1.0
> + 2921 2002-02-22 5.6
> + 2922 2002-02-23 10.6
> + 2923 2002-02-24 2.8
> + 2924 2002-02-25 19.1
> + 2925 2002-02-26 20.5
> + 2926 2002-03-06 0.0
> + 2927 2002-05-06 0.0
> + 2928 2002-05-07 0.0
> + 2929 2002-05-08 0.0
> + 2930 2002-05-09 0.0") )
>
> > dtdta[dtdta$V3 == 0, ]
>
> V1 V2 V3
> 1 2916 2002-02-17 0
> 11 2926 2002-03-06 0
> 12 2927 2002-05-06 0
> 13 2928 2002-05-07 0
> 14 2929 2002-05-08 0
> 15 2930 2002-05-09 0
>
> You seem to be using "0" as a missing marker. That's bad practice,
> but I suppose it's possible you cannot change how your instruments
> report. You should be using NA and the functions that support proper
> treatment of "missingness".
David,
I think that he is actually looking for dates where there is no
measurement as opposed to dates where the measurement is 0.
Thus, with the same data held in a data frame named DF:
> DF
V1 V2 V3
1 2916 2002-02-17 0.0
2 2917 2002-02-18 0.3
3 2918 2002-02-19 3.8
4 2919 2002-02-20 43.6
5 2920 2002-02-21 1.0
6 2921 2002-02-22 5.6
7 2922 2002-02-23 10.6
8 2923 2002-02-24 2.8
9 2924 2002-02-25 19.1
10 2925 2002-02-26 20.5
11 2926 2002-03-06 0.0
12 2927 2002-05-06 0.0
13 2928 2002-05-07 0.0
14 2929 2002-05-08 0.0
15 2930 2002-05-09 0.0
# Convert V2 to dates
# Default format is "%Y-%m-%d"
# See ?as.Date
DF$V2 <- as.Date(DF$V2)
# Get the range of dates covered
DateRange <- seq(min(DF$V2), max(DF$V2), by = 1)
# Get the dates in DateRange that are not in DF$V2
# See ?"%in%"
> DateRange[!DateRange %in% DF$V2]
[1] "2002-02-27" "2002-02-28" "2002-03-01" "2002-03-02" "2002-03-03"
[6] "2002-03-04" "2002-03-05" "2002-03-07" "2002-03-08" "2002-03-09"
[11] "2002-03-10" "2002-03-11" "2002-03-12" "2002-03-13" "2002-03-14"
[16] "2002-03-15" "2002-03-16" "2002-03-17" "2002-03-18" "2002-03-19"
[21] "2002-03-20" "2002-03-21" "2002-03-22" "2002-03-23" "2002-03-24"
[26] "2002-03-25" "2002-03-26" "2002-03-27" "2002-03-28" "2002-03-29"
[31] "2002-03-30" "2002-03-31" "2002-04-01" "2002-04-02" "2002-04-03"
[36] "2002-04-04" "2002-04-05" "2002-04-06" "2002-04-07" "2002-04-08"
[41] "2002-04-09" "2002-04-10" "2002-04-11" "2002-04-12" "2002-04-13"
[46] "2002-04-14" "2002-04-15" "2002-04-16" "2002-04-17" "2002-04-18"
[51] "2002-04-19" "2002-04-20" "2002-04-21" "2002-04-22" "2002-04-23"
[56] "2002-04-24" "2002-04-25" "2002-04-26" "2002-04-27" "2002-04-28"
[61] "2002-04-29" "2002-04-30" "2002-05-01" "2002-05-02" "2002-05-03"
[66] "2002-05-04" "2002-05-05"
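
If you would rather see the gaps as periods than as a list of individual dates, something along the following lines might work. It is only a sketch: it rebuilds the date column from the sample above so that it runs on its own, and the names dates, step, idx and gaps are mine, not anything from the original data.

```r
# The observation dates from the sample above
dates <- as.Date(c("2002-02-17", "2002-02-18", "2002-02-19", "2002-02-20",
                   "2002-02-21", "2002-02-22", "2002-02-23", "2002-02-24",
                   "2002-02-25", "2002-02-26", "2002-03-06", "2002-05-06",
                   "2002-05-07", "2002-05-08", "2002-05-09"))

# Make sure the dates are in order and not duplicated
dates <- sort(unique(dates))

# Days between consecutive observations; a step of 1 means no gap
step <- as.numeric(diff(dates))

# Positions where the next observation is more than one day later
idx <- which(step > 1)

# One row per gap: first and last missing date, and the number of missing days
gaps <- data.frame(first_missing = dates[idx] + 1,
                   last_missing  = dates[idx + 1] - 1,
                   n_days        = step[idx] - 1)
gaps
```

For the sample data this should give two periods, 2002-02-27 to 2002-03-05 (7 days) and 2002-03-07 to 2002-05-05 (60 days), which together account for the 67 dates listed above. See ?diff and ?which.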
HTH,
Marc Schwartz