[R] Data frame reordering to time series
Gabor Grothendieck
ggrothendieck at gmail.com
Sun Aug 8 02:04:32 CEST 2010
On Sat, Aug 7, 2010 at 4:49 PM, steven mosher <moshersteven at gmail.com> wrote:
> Given a data frame, or it could be a matrix if I choose to.
> The data consists of an ID, a year, and data for all 12 months.
> Missing values are a factor AND missing years.
>
> Id<-c(rep(67543,4),rep(12345,3),rep(89765,5))
> Years<-c(seq(1989,1992,by =1),1991,1993,1994,seq(1991,1995,by=1))
> Values2<-c(12,NA,34,21,NA,65,23,NA,13,NA,13,14)
> Values<-c(12,14,34,21,54,65,23,12,13,13,13,14)
> Data<-data.frame(Index=Id,Year=Years,Jan=Values,Feb=Values/2,Mar=Values2,Apr=Values2,Jun=Values,July=Values/3,Aug=Values2,Sep=Values,
> + Oct=Values,Nov=Values,Dec=Values2)
> Data
> Index Year Jan Feb Mar Apr Jun July Aug Sep Oct Nov Dec
> 1 67543 1989 12 6.0 12 12 12 4.000000 12 12 12 12 12
> 2 67543 1990 14 7.0 NA NA 14 4.666667 NA 14 14 14 NA
> 3 67543 1991 34 17.0 34 34 34 11.333333 34 34 34 34 34
> 4 67543 1992 21 10.5 21 21 21 7.000000 21 21 21 21 21
> 5 12345 1991 54 27.0 NA NA 54 18.000000 NA 54 54 54 NA
> 6 12345 1993 65 32.5 65 65 65 21.666667 65 65 65 65 65
> 7 12345 1994 23 11.5 23 23 23 7.666667 23 23 23 23 23
> 8 89765 1991 12 6.0 NA NA 12 4.000000 NA 12 12 12 NA
> 9 89765 1992 13 6.5 13 13 13 4.333333 13 13 13 13 13
> 10 89765 1993 13 6.5 NA NA 13 4.333333 NA 13 13 13 NA
> 11 89765 1994 13 6.5 13 13 13 4.333333 13 13 13 13 13
> 12 89765 1995 14 7.0 14 14 14 4.666667 14 14 14 14 14
>
>
> The Goal is to return a Time series object for each ID. Alternatively one
> could return a matrix that I can turn into a Time series.
> The final structure would be something like this ( done in matrix form for
> illustration)
> 1989.0 1989.083
> 1991 ......1992....1993..... 1994 .... 1995
> 67543 12 6.0 12 12 12 4.000000 12 12 12 12 12...
> .34...........21.. NA.........NA........NA
> 12345 NA, NA,
> NA,.............................................................54 27
>
> Basically the time series will have patches at the front, middle and end
> where you may have years of NA
> They must be column-ordered by time and aligned so that averages for all
> series can be computed per month.
>
> Now I have looping code to do this, where I loop through all the IDs and map
> the row of data into the correct
> column. and create column names based on the data and row names based on the
> ID, but it's painfully
> slow. Any wizardry would help.
Your email came out a bit garbled, so it's not clear what output you
want, but this code will produce a multivariate time series, i.e. an
mts object, with one column for each series:
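# Flatten each ID's monthly columns (everything except Index and Year),
# row by row, into one monthly ts starting at that ID's first year
f <- function(x) ts(c(t(x[-(1:2)])), frequency = 12, start = x$Year[1])
# Build one series per ID and bind them into a single time-aligned mts
do.call(cbind, by(Data, Data$Index, f))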
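Note that the one-liner above treats each ID's rows as consecutive years and expects 12 monthly values per row. The posted Data has gaps (ID 12345 has no 1992 row) and no May column, so the months would drift out of alignment. Below is a rough sketch of one way to pad missing years with NA before flattening. It is not tested against the real data; the helper name to_ts and the toy data frame are made up for illustration, and it assumes all 12 month columns exist and are named as in month.abb.

```r
# Toy data (hypothetical): two IDs, 12 monthly columns named Jan..Dec,
# and a missing year (1992) for ID 12345.
vals <- matrix(1:48, nrow = 4, byrow = TRUE, dimnames = list(NULL, month.abb))
Data <- data.frame(Index = c(67543, 67543, 12345, 12345),
                   Year  = c(1989, 1990, 1991, 1993),
                   vals)

# Hypothetical helper: pad one ID's rows to a full run of years, then
# flatten row by row into a monthly ts.
to_ts <- function(x, months = month.abb) {
  years <- seq(min(x$Year), max(x$Year))
  # One row per year in the ID's range, all NA to begin with
  full <- matrix(NA_real_, nrow = length(years), ncol = length(months),
                 dimnames = list(years, months))
  # Fill in only the years that are actually present
  full[as.character(x$Year), ] <- as.matrix(x[months])
  ts(c(t(full)), frequency = 12, start = c(years[1], 1))
}

# One column per ID; cbind on ts objects aligns them on a common time axis
series <- do.call(cbind, by(Data, Data$Index, to_ts))

# Per-month average across all IDs, ignoring NA
# (months where every series is NA come out as NaN)
monthly_mean <- rowMeans(series, na.rm = TRUE)
```

Since cbind pads the series to the union of their time ranges, leading and trailing NA patches are filled in automatically; only the gaps inside an ID's own range need the explicit padding.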