[R] merge.zoo returns unmatched dates

Achim Zeileis Achim.Zeileis at uibk.ac.at
Mon Oct 1 09:41:35 CEST 2012


On Mon, 1 Oct 2012, Vindoggy ! wrote:

>
> Sorry for the lack of reproducible data, but this seems to be a problem inherent to my dataset and I can't figure out where the issue is.
>
> I have several data frames set up as time series with identical POSIXct date formats. If I keep the original data in data frame format and merge them using base merge, everything is perfect and everyone is happy.
>
> If I transform the data frames to zoo objects and then do a merge.zoo, the data seem to become uncoupled from the original data. Even more unusual is that some dates in the new merged data set are prior to the original data set. I've attempted below to show what this looks like, and I hope someone has a suggestion as to what may be causing the problem.
>
> Here is one set of data in data.frame format
>
> head(Vup)
>                 Date Velocity_m/s
> 1 2010-01-21 07:42:00     1.217943
> 2 2010-01-21 07:43:00     1.624395
> 3 2010-01-21 07:44:00     1.526379
> 4 2010-01-21 07:45:00     1.456831
> 5 2010-01-21 07:46:00     1.245390
> 6 2010-01-21 07:47:00     1.374330
>
> str(Vup)
> 'data.frame':    7168 obs. of  2 variables:
> $ Date        : POSIXct, format: "2010-01-21 07:42:00" "2010-01-21 07:43:00" ...
> $ Velocity_m/s: num  1.22 1.62 1.53 1.46 1.25 ...
>
> And here is a second in data.frame format:
>
> head(PAS)
>                 Date               PAS
> 1 2010-01-21 05:01:00   0.0013938
> 2 2010-01-21 05:02:00   0.0015331
> 3 2010-01-21 05:03:00   0.0016725
> 4 2010-01-21 05:04:00   0.0016725
> 5 2010-01-21 05:05:00   0.0012265
> 6 2010-01-21 05:06:00   0.0015889
>
> str(PAS)
> 'data.frame':    5520 obs. of  2 variables:
> $ Date       : POSIXct, format: "2010-01-21 05:01:00" "2010-01-21 05:02:00" ...
> $ PAS: num  0.00139 0.00153 0.00167 0.00167 0.00123 ...
>
>
>
> Using zoo:
>
> PASmin<-zoo(as.matrix(PAS[,2]),as.POSIXct(PAS[,1],format="%Y-%m-%d %H:%M:%S",tz="UTC"))
>
> str(PASmin)
> ‘zoo’ series from 2010-01-21 05:01:00 to 2010-01-27 13:01:00
>  Data: num [1:5520, 1] 0.00139 0.00153 0.00167 0.00167 0.00123 ...
> - attr(*, "dimnames")=List of 2
>  ..$ : NULL
>  ..$ : chr "PAS"
>  Index:  POSIXct[1:5520], format: "2010-01-21 05:01:00" "2010-01-21 05:02:00" "2010-01-21 05:03:00" ...
>
>
>
>
> ADP_UPmin<-zoo(as.matrix(Vup[,2]),as.POSIXct(Vup[,1], format="%Y-%m-%d %H:%M",tz="UTC"))
>
> str(ADP_UPmin)
> ‘zoo’ series from 2010-01-21 07:42:00 to 2010-01-26 20:12:00
>  Data: num [1:7168, 1] 1.22 1.62 1.53 1.46 1.25 ...
> - attr(*, "dimnames")=List of 2
>  ..$ : NULL
>  ..$ : chr "UP_Velocity_m/s"
>  Index:  POSIXct[1:7168], format: "2010-01-21 07:42:00" "2010-01-21 07:43:00" "2010-01-21 07:44:00" ...
>
>
> And if I merge the two zoo objects I get this:
>
> M<-merge(ADP_UPmin,PASmin)
> head(M)
>
>                    UP_Velocity_m/s       PAS
> 2010-01-20 21:01:00              NA 0.0013938
> 2010-01-20 21:02:00              NA 0.0015331
> 2010-01-20 21:03:00              NA 0.0016725
> 2010-01-20 21:04:00              NA 0.0016725
> 2010-01-20 21:05:00              NA 0.0012265
> 2010-01-20 21:06:00              NA 0.0015889
>
>
> ‘zoo’ series from 2010-01-20 21:01:00 to 2010-01-27 05:01:00
>  Data: num [1:8499, 1:2] NA NA NA NA NA NA NA NA NA NA ...
> - attr(*, "dimnames")=List of 2
>  ..$ : NULL
>  ..$ : chr [1:2] "UP_Velocity_m/s" "PAR"
>  Index:  POSIXct[1:8499], format: "2010-01-20 21:01:00" "2010-01-20 21:02:00" "2010-01-20 21:03:00" ...
>
>
> For some reason I cannot figure out, even though both the PAS data frame and the PAS zoo object start at 2010-01-21 05:01:00, once merged the PAS data starts a day earlier at 2010-01-20 21:01:00. The actual numeric data looks good, but both variables have now become uncoupled from the time series dates (the Velocity data is similarly uncoupled). And as stated before, doing a non-zoo merge on the data.frame data works fine.
>
> Anyone got any ideas what's going on?

My guess is that you create both zoo series with time zone UTC but that 
the time zone attribute gets lost upon the merge. The times are then 
displayed in your system's time zone (which you haven't told us), which 
apparently is eight hours behind UTC: 05:01 UTC is shown as 21:01 on the 
previous day.
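
A minimal base-R sketch of this effect, for illustration only: the same 
instant prints differently depending on the "tzone" attribute. The zone 
"America/Los_Angeles" is an assumption (the poster's zone is not stated); 
it is used because it is eight hours behind UTC in January and so 
reproduces the shift seen in the merged output.

```r
# Hypothetical illustration: "America/Los_Angeles" is an assumed zone,
# chosen only because it is UTC-8 in January; the thread does not say
# which zone the poster's system uses.
t_utc <- as.POSIXct("2010-01-21 05:01:00", tz = "UTC")

# Copy the value and change only the time zone used for display.
t_local <- t_utc
attr(t_local, "tzone") <- "America/Los_Angeles"

fmt <- "%Y-%m-%d %H:%M:%S"
shown_utc   <- format(t_utc, fmt)    # "2010-01-21 05:01:00"
shown_local <- format(t_local, fmt)  # "2010-01-20 21:01:00"

# The underlying instant (seconds since the epoch) is identical.
same_instant <- as.numeric(t_utc) == as.numeric(t_local)
```

If this is what happens in the merge, the data are still attached to the 
right instants and only the printed labels have shifted.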

On my system (which is in CET) I can create a series with UTC times:

R> x <- zoo(1:2, as.POSIXct(c("2012-01-01 00:00:00",
+    "2012-01-01 01:00:00"), format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))
R> x
2012-01-01 00:00:00 2012-01-01 01:00:00
                   1                   2

The times are in UTC as requested, but the c() method drops the time zone 
attribute. See ?c.POSIXct.

R> time(x)
[1] "2012-01-01 00:00:00 UTC" "2012-01-01 01:00:00 UTC"
R> c(time(x))
[1] "2012-01-01 01:00:00 CET" "2012-01-01 02:00:00 CET"
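
Whether c() keeps the attribute may depend on the R version in use, so it 
can be worth checking directly. A small sketch (the variable names are 
only illustrative):

```r
# Build a UTC index like the one in the example above.
idx <- as.POSIXct(c("2012-01-01 00:00:00", "2012-01-01 01:00:00"),
                  tz = "UTC")

tz_before <- attr(idx, "tzone")     # "UTC"
tz_after  <- attr(c(idx), "tzone")  # NULL if c() dropped the attribute

# TRUE if this R version keeps the time zone through c(), FALSE otherwise.
tz_kept <- identical(tz_before, tz_after)

# Either way, the instants themselves are not altered by c().
instants_equal <- all(as.numeric(idx) == as.numeric(c(idx)))
```

If tz_kept is FALSE, anything that combines indexes via c() will be 
displayed in the session time zone.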

Hence:

R> merge(x, x)
                     x x
2012-01-01 01:00:00 1 1
2012-01-01 02:00:00 2 2

But you can set the time zone of your R session to UTC, which gives the 
desired result:

R> Sys.setenv(TZ = "UTC")
R> merge(x, x)
                     x x
2012-01-01 00:00:00 1 1
2012-01-01 01:00:00 2 2
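
If changing the session time zone is not desirable, another option that 
should work is to put the "tzone" attribute back on the index after 
combining. This is a sketch under that assumption, not something tested 
on the original data; the zoo part only runs if the package is installed.

```r
idx <- as.POSIXct(c("2012-01-01 00:00:00", "2012-01-01 01:00:00"),
                  tz = "UTC")

# Combining may drop the time zone; re-attaching it is harmless either way.
merged_idx <- c(idx, idx)
attr(merged_idx, "tzone") <- "UTC"
shown <- format(merged_idx, "%Y-%m-%d %H:%M:%S")

# The same idea applied to a merged zoo series (skipped without zoo).
if (requireNamespace("zoo", quietly = TRUE)) {
  library(zoo)
  x <- zoo(1:2, idx)
  m <- merge(x, x)
  mi <- index(m)               # extract the merged index
  attr(mi, "tzone") <- "UTC"   # restore the display time zone
  index(m) <- mi               # write the index back
  zoo_shown <- format(index(m), "%Y-%m-%d %H:%M:%S")
}
```

This changes only how the times are displayed; the instants stored in the 
index stay the same.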

hth,
Z
