[R] Dataset Transformation
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Oct 11 16:08:16 CEST 2010
On Mon, Oct 11, 2010 at 9:35 AM, Santosh Srinivas
<santosh.srinivas at gmail.com> wrote:
> Repost .. since the previous msg had problems
>
> I need to transpose the following input dataset into an output dataset like
> below
>
> Input
> Date TICKER Price
> 11/10/2010 A 0.991642
> 11/10/2010 B 0.475023
> 11/10/2010 C 0.218642
> 11/10/2010 D 0.365135
> 12/10/2010 A 0.687873
> 12/10/2010 B 0.47006
> 12/10/2010 C 0.533542
> 12/10/2010 D 0.812439
> 13/10/2010 A 0.210848
> 13/10/2010 B 0.699799
> 13/10/2010 C 0.546003
> 13/10/2010 D 0.152316
>
> Output needed
>
> Date A B C D
> 11/10/2010 0.991642 0.475023 0.218642 0.365135
> 12/10/2010 0.687873 0.47006 0.533542 0.812439
> 13/10/2010 0.210848 0.699799 0.546003 0.152316
>
> I tried using the aggregate function but not quite getting the method.
>
1. Try this:
Lines <- " Date TICKER Price
11/10/2010 A 0.991642
11/10/2010 B 0.475023
11/10/2010 C 0.218642
11/10/2010 D 0.365135
12/10/2010 A 0.687873
12/10/2010 B 0.47006
12/10/2010 C 0.533542
12/10/2010 D 0.812439
13/10/2010 A 0.210848
13/10/2010 B 0.699799
13/10/2010 C 0.546003
13/10/2010 D 0.152316"
DF <- read.table(textConnection(Lines), header = TRUE)
DF$Date <- as.Date(DF$Date,"%d/%m/%Y")
DFout <- reshape(DF, dir = "wide", timevar = "TICKER", idvar = "Date")
names(DFout) <- sub("Price.", "", names(DFout))
2. or using read.zoo in the zoo package we can read it in and reshape
it all at once:
library(zoo)
z <- read.zoo(textConnection(Lines), header = TRUE,
split = 2, format = "%d/%m/%Y")
At this point z is a zoo object in wide format:
> z
A B C D
2010-10-11 0.991642 0.475023 0.218642 0.365135
2010-10-12 0.687873 0.470060 0.533542 0.812439
2010-10-13 0.210848 0.699799 0.546003 0.152316
Since this is a multivariate time series you might want to just leave
it as a zoo object since you then get all of the facilities of zoo,
e.g.
plot(z) # multi-panel
plot(z, screen = 1) # all in one panel
but if you want it as a data frame then convert it like this:
> data.frame(Index = index(z), coredata(z))
Index A B C D
1 2010-10-11 0.991642 0.475023 0.218642 0.365135
2 2010-10-12 0.687873 0.470060 0.533542 0.812439
3 2010-10-13 0.210848 0.699799 0.546003 0.152316
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
More information about the R-help
mailing list