[R-SIG-Finance] Return calculation for panel data structure
Jeffrey Ryan
jeffrey.ryan at lemnica.com
Wed Mar 7 17:14:31 CET 2012
Built into R is ?split or ?aggregate and ?lapply which should serve you well.
Using base R is often the fastest way to skin the cat, if not the prettiest ;-)
> library(quantmod) # for Delt
> cbind(z,do.call(rbind,lapply(split(z,z$ticker), function(x) Delt(x$prc))))
ticker date_f date prc iticker Delt.1.arithmetic
1 A 20-Jun-03 15876 19.06 IXT NA
2 A 23-Jun-03 15879 18.89 IXT -0.008919203
3 A 24-Jun-03 15880 18.75 IXT -0.007411329
4 A 25-Jun-03 15881 18.92 IXT 0.009066667
5 A 26-Jun-03 15882 19.35 IXT 0.022727273
6 A 27-Jun-03 15883 19.55 IXT 0.010335917
7 AA 20-Jun-03 15876 26.24 IXB NA
8 AA 23-Jun-03 15879 25.07 IXB -0.044588415
9 AA 24-Jun-03 15880 24.96 IXB -0.004387714
10 AA 25-Jun-03 15881 24.51 IXB -0.018028846
11 AA 26-Jun-03 15882 25.24 IXB 0.029783762
12 AA 27-Jun-03 15883 25.09 IXB -0.005942948
13 AA 30-Jun-03 15886 25.50 IXB 0.016341172
See ?split for details on multiple identifiers ( just a list, but your
example didn't include it so I won't either)
The do.call(rbind/cbind stuff is just to get the original data.frame
back - there are more verbose ways to do the same thing.
Jeff
On Tue, Mar 6, 2012 at 7:38 PM, Matthew Clegg <matthewcleggphd at gmail.com> wrote:
> Here is another approach that might work for you.
>
> apply_by_groups <- function (T, cols, F, ...) {
> # On input, T is a dataframe, and cols is a list of column names.
> # Groups the dataframe T into groups according to the specified
> # column names, e.g., all rows in a given group will have identical
> # values in the specified columns. Within each group, applies the
> # function F to the dataframe representing that group. Returns the
> # list of results.
>
> NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),]
> change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) }
> starts <- which(c(TRUE, sapply(2:nrow(NT), change)))
> ends <- c(starts[2:length(starts)]-1, nrow(NT))
> lapply(1:length(starts), function(k) {
> F(NT[starts[k]:ends[k],,drop=FALSE], ...) })
> }
> ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)}
>
>> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <-
> ret(df$prc); df} ))
> ticker date_f date prc iticker ret
> 1 A 20-Jun-03 15876 19.06 IXT NA
> 2 A 23-Jun-03 15879 18.89 IXT -0.008919203
> 3 A 24-Jun-03 15880 18.75 IXT -0.007411329
> 4 A 25-Jun-03 15881 18.92 IXT 0.009066667
> 5 A 26-Jun-03 15882 19.35 IXT 0.022727273
> 6 A 27-Jun-03 15883 19.55 IXT 0.010335917
> 7 AA 20-Jun-03 15876 26.24 IXB NA
> 8 AA 23-Jun-03 15879 25.07 IXB -0.044588415
> 9 AA 24-Jun-03 15880 24.96 IXB -0.004387714
> 10 AA 25-Jun-03 15881 24.51 IXB -0.018028846
> 11 AA 26-Jun-03 15882 25.24 IXB 0.029783762
> 12 AA 27-Jun-03 15883 25.09 IXB -0.005942948
> 13 AA 30-Jun-03 15886 25.50 IXB 0.016341172
>
> In the call to apply_by_groups, you can replace "ticker" with a list of
> names, e.g., c("ticker","fund","manager"), etc.
>
> Matthew Clegg
>
> On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <arsenio.star at gmail.com> wrote:
>
>> Robert,
>>
>> Thanks for the reply. I tried plm approach but apparently it only accepts
>> one individual id and one time id, because this fails
>>
>> > zz=pdata.frame(z, c("Stock","Fund","Date"))
>> Error in match(x, table, nomatch = 0L) :
>> 'match' requires vector arguments
>>
>> Any approach that accepts several individual observation identifiers? I
>> will take a look at the econometrics view on CRAN
>>
>> Btw, sorry if the post came out mangled due to HTML, didn't check.
>>
>> Thanks,
>>
>> Arsenio
>>
>> Tuesday, March 6, 2012, 5:24:35 AM, you wrote:
>>
>>
>> Use diff and lag from plm:
>>
>> Lines="ticker date_f date prc iticker
>> A 20-Jun-03 15876 19.06 IXT
>> A 23-Jun-03 15879 18.89 IXT
>> A 24-Jun-03 15880 18.75 IXT
>> A 25-Jun-03 15881 18.92 IXT
>> A 26-Jun-03 15882 19.35 IXT
>> A 27-Jun-03 15883 19.55 IXT
>> AA 20-Jun-03 15876 26.24 IXB
>> AA 23-Jun-03 15879 25.07 IXB
>> AA 24-Jun-03 15880 24.96 IXB
>> AA 25-Jun-03 15881 24.51 IXB
>> AA 26-Jun-03 15882 25.24 IXB
>> AA 27-Jun-03 15883 25.09 IXB
>> AA 30-Jun-03 15886 25.5 IXB
>> "
>> z <- read.table(textConnection(Lines), header = TRUE)
>> library(plm)
>> zz=pdata.frame(z, c("ticker","date"))
>> pr=zz$prc; str(pr)
>> zz$return=diff(pr)/lag(pr)
>> zz
>>
>> Good luck!
>>
>> Robert
>>
>> From: Arsenio
>> Sent: Tuesday, March 06, 2012 2:47 AM
>> To: r-sig-finance at r-project.org
>> Subject: [R-SIG-Finance] Return calculation for panel data structure
>>
>> Dear all,
>>
>> I have a very simple question and i haven't been able to code it out. It's
>> a simple return calculation of a form:
>> R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one.
>> However it's in the panel data, where stocks belong to say a fund or some
>> other grouping variable.
>>
>>
>> ticker
>> date_f
>>
>> date
>> prc
>> iticker
>> A
>> 20-Jun-03
>>
>> 15876
>> 19.06
>> IXT
>> A
>> 23-Jun-03
>>
>> 15879
>> 18.89
>> IXT
>> A
>> 24-Jun-03
>>
>> 15880
>> 18.75
>> IXT
>> A
>> 25-Jun-03
>>
>> 15881
>> 18.92
>> IXT
>> A
>> 26-Jun-03
>>
>> 15882
>> 19.35
>> IXT
>> A
>> 27-Jun-03
>>
>> 15883
>> 19.55
>> IXT
>> AA
>> 20-Jun-03
>>
>> 15876
>> 26.24
>> IXB
>> AA
>> 23-Jun-03
>>
>> 15879
>> 25.07
>> IXB
>> AA
>> 24-Jun-03
>>
>> 15880
>> 24.96
>> IXB
>> AA
>> 25-Jun-03
>>
>> 15881
>> 24.51
>> IXB
>> AA
>> 26-Jun-03
>>
>> 15882
>> 25.24
>> IXB
>> AA
>> 27-Jun-03
>>
>> 15883
>> 25.09
>> IXB
>> AA
>> 30-Jun-03
>>
>> 15886
>> 25.5
>> IXB
>>
>>
>>
>> In SAS, I would do it this way:
>>
>> data work.1; /* This Calcs the excluded index return for INDEX constructed
>> with PRICES (PR)!!!! */
>> set work.1;
>> by ticker iticker date_f ; /*sorting by stock, fund and date */
>>
>> lag_prc=lag(prc); /* creating the lagged price variable
>>
>> if first.iticker then /* if the it's a first date for the grouped
>> unit then.. */
>> lag_prc= . ; /* setting the first lagged value to missing if
>> the first date for the grouped unit */
>> return = (prc - lag_prc)/ lag_prc;
>> drop lag_prc;
>> run;
>>
>> The end result would be the same dataset with just one more column for
>> returns for each stock, fund, date combination with the first return for
>> the date set to missing.
>> I tried Return.calculate type of funcs from performance analytics and
>> other solutions using plyr or loops, but they are all too bulky.
>>
>> Any ideas how to accomplish this? Would appreciate any hints,
>>
>> Arsenio
>>
>> _______________________________________________
>> R-SIG-Finance at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions
>> should go.
>>
>>
>>
>>
>> --
>> Best regards,
>> Arsenio mailto:arsenio.star at gmail.com
>>
>> _______________________________________________
>> R-SIG-Finance at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions
>> should go.
>>
>
>
>
> --
> Matthew Clegg
> matthewcleggphd at gmail.com
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-SIG-Finance at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions should go.
--
Jeffrey Ryan
jeffrey.ryan at lemnica.com
www.lemnica.com
www.esotericR.com
R/Finance 2012: Applied Finance with R
www.RinFinance.com
See you in Chicago!!!!
More information about the R-SIG-Finance
mailing list