[R-SIG-Finance] Return calculation for panel data structure

Wed Mar 7 17:14:31 CET 2012

Built into R is ?split or ?aggregate and ?lapply which should serve you well.

Using base R is often the fastest way to skin the cat, if not the prettiest ;-)

> library(quantmod)   # for Delt
> cbind(z,do.call(rbind,lapply(split(z,z$ticker), function(x) Delt(x$prc))))

 ticker    date_f  date   prc iticker Delt.1.arithmetic
1       A 20-Jun-03 15876 19.06     IXT                NA
2       A 23-Jun-03 15879 18.89     IXT      -0.008919203
3       A 24-Jun-03 15880 18.75     IXT      -0.007411329
4       A 25-Jun-03 15881 18.92     IXT       0.009066667
5       A 26-Jun-03 15882 19.35     IXT       0.022727273
6       A 27-Jun-03 15883 19.55     IXT       0.010335917
7      AA 20-Jun-03 15876 26.24     IXB                NA
8      AA 23-Jun-03 15879 25.07     IXB      -0.044588415
9      AA 24-Jun-03 15880 24.96     IXB      -0.004387714
10     AA 25-Jun-03 15881 24.51     IXB      -0.018028846
11     AA 26-Jun-03 15882 25.24     IXB       0.029783762
12     AA 27-Jun-03 15883 25.09     IXB      -0.005942948
13     AA 30-Jun-03 15886 25.50     IXB       0.016341172

See ?split for details on multiple identifiers ( just a list, but your
example didn't include it so I won't either)

The do.call(rbind/cbind stuff is just to get the original data.frame
back - there are more verbose ways to do the same thing.

Jeff

On Tue, Mar 6, 2012 at 7:38 PM, Matthew Clegg <matthewcleggphd at gmail.com> wrote:
> Here is another approach that might work for you.
>
> apply_by_groups <- function (T, cols, F, ...) {
>  # On input, T is a dataframe, and cols is a list of column names.
>  # Groups the dataframe T into groups according to the specified
>  # column names, e.g., all rows in a given group will have identical
>  # values in the specified columns.  Within each group, applies the
>  # function F to the dataframe representing that group.  Returns the
>  # list of results.
>
>  NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),]
>  change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) }
>  starts <- which(c(TRUE, sapply(2:nrow(NT), change)))
>  ends <- c(starts[2:length(starts)]-1, nrow(NT))
>  lapply(1:length(starts), function(k) {
> F(NT[starts[k]:ends[k],,drop=FALSE], ...) })
> }
> ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)}
>
>> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <-
> ret(df$prc); df} ))
>   ticker    date_f  date   prc iticker          ret
> 1       A 20-Jun-03 15876 19.06     IXT           NA
> 2       A 23-Jun-03 15879 18.89     IXT -0.008919203
> 3       A 24-Jun-03 15880 18.75     IXT -0.007411329
> 4       A 25-Jun-03 15881 18.92     IXT  0.009066667
> 5       A 26-Jun-03 15882 19.35     IXT  0.022727273
> 6       A 27-Jun-03 15883 19.55     IXT  0.010335917
> 7      AA 20-Jun-03 15876 26.24     IXB           NA
> 8      AA 23-Jun-03 15879 25.07     IXB -0.044588415
> 9      AA 24-Jun-03 15880 24.96     IXB -0.004387714
> 10     AA 25-Jun-03 15881 24.51     IXB -0.018028846
> 11     AA 26-Jun-03 15882 25.24     IXB  0.029783762
> 12     AA 27-Jun-03 15883 25.09     IXB -0.005942948
> 13     AA 30-Jun-03 15886 25.50     IXB  0.016341172
>
> In the call to apply_by_groups, you can replace "ticker" with a list of
> names, e.g., c("ticker","fund","manager"), etc.
>
> Matthew Clegg
>
> On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <arsenio.star at gmail.com> wrote:
>
>> Robert,
>>
>> Thanks for the reply. I tried plm approach but apparently it only accepts
>> one individual id and one time id, because this fails
>>
>> > zz=pdata.frame(z, c("Stock","Fund","Date"))
>> Error in match(x, table, nomatch = 0L) :
>>  'match' requires vector arguments
>>
>> Any approach that accepts several individual observation identifiers? I
>> will take a look at the econometrics view on CRAN
>>
>> Btw, sorry if the post came out mangled due to HTML, didn't check.
>>
>> Thanks,
>>
>> Arsenio
>>
>> Tuesday, March 6, 2012, 5:24:35 AM, you wrote:
>>
>>
>> Use diff and lag from plm:
>>
>> Lines="ticker date_f  date prc iticker
>> A 20-Jun-03  15876 19.06 IXT
>> A 23-Jun-03  15879 18.89 IXT
>> A 24-Jun-03  15880 18.75 IXT
>> A 25-Jun-03  15881 18.92 IXT
>> A 26-Jun-03  15882 19.35 IXT
>> A 27-Jun-03  15883 19.55 IXT
>> AA 20-Jun-03 15876 26.24 IXB
>> AA 23-Jun-03 15879 25.07 IXB
>> AA 24-Jun-03 15880 24.96 IXB
>> AA 25-Jun-03 15881 24.51 IXB
>> AA 26-Jun-03 15882 25.24 IXB
>> AA 27-Jun-03 15883 25.09 IXB
>> AA 30-Jun-03 15886 25.5 IXB
>> "
>> z <- read.table(textConnection(Lines), header = TRUE)
>> library(plm)
>> zz=pdata.frame(z, c("ticker","date"))
>> pr=zz$prc; str(pr)
>> zz$return=diff(pr)/lag(pr)
>> zz
>>
>> Good luck!
>>
>> Robert
>>
>> From: Arsenio
>> Sent: Tuesday, March 06, 2012 2:47 AM
>> To: r-sig-finance at r-project.org
>> Subject: [R-SIG-Finance] Return calculation for panel data structure
>>
>> Dear all,
>>
>> I have a very simple question and i haven't been able to code it out. It's
>> a simple return calculation of a form:
>> R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one.
>> However it's in the panel data, where stocks belong to say a fund or some
>> other grouping variable.
>>
>>
>> ticker
>> date_f
>>
>> date
>> prc
>> iticker
>> A
>> 20-Jun-03
>>
>> 15876
>> 19.06
>> IXT
>> A
>> 23-Jun-03
>>
>> 15879
>> 18.89
>> IXT
>> A
>> 24-Jun-03
>>
>> 15880
>> 18.75
>> IXT
>> A
>> 25-Jun-03
>>
>> 15881
>> 18.92
>> IXT
>> A
>> 26-Jun-03
>>
>> 15882
>> 19.35
>> IXT
>> A
>> 27-Jun-03
>>
>> 15883
>> 19.55
>> IXT
>> AA
>> 20-Jun-03
>>
>> 15876
>> 26.24
>> IXB
>> AA
>> 23-Jun-03
>>
>> 15879
>> 25.07
>> IXB
>> AA
>> 24-Jun-03
>>
>> 15880
>> 24.96
>> IXB
>> AA
>> 25-Jun-03
>>
>> 15881
>> 24.51
>> IXB
>> AA
>> 26-Jun-03
>>
>> 15882
>> 25.24
>> IXB
>> AA
>> 27-Jun-03
>>
>> 15883
>> 25.09
>> IXB
>> AA
>> 30-Jun-03
>>
>> 15886
>> 25.5
>> IXB
>>
>>
>>
>> In  SAS, I would do it this way:
>>
>> data work.1; /* This Calcs the excluded index return for INDEX constructed
>> with PRICES (PR)!!!! */
>>       set work.1;
>>       by ticker iticker date_f ; /*sorting by stock, fund and date */
>>
>>       lag_prc=lag(prc); /* creating the lagged price variable
>>
>>       if first.iticker then /* if the it's a first date for the grouped
>> unit then.. */
>>               lag_prc= . ; /* setting the first lagged value to missing if
>> the first date for the grouped unit */
>>       return = (prc - lag_prc)/ lag_prc;
>>       drop lag_prc;
>> run;
>>
>> The end result would be the same dataset with just one more column for
>> returns for each stock, fund, date combination with the first return for
>> the date set to missing.
>> I tried Return.calculate type of funcs from performance analytics and
>> other solutions using plyr or loops, but they are all too bulky.
>>
>> Any ideas how to accomplish this? Would appreciate any hints,
>>
>> Arsenio
>>
>> _______________________________________________
>> R-SIG-Finance at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions
>> should go.
>>
>>
>>
>>
>> --
>> Best regards,
>>  Arsenio                            mailto:arsenio.star at gmail.com
>>
>> _______________________________________________
>> R-SIG-Finance at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions
>> should go.
>>
>
>
>
> --
> Matthew Clegg
> matthewcleggphd at gmail.com
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> R-SIG-Finance at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions should go.

-- 
Jeffrey Ryan
jeffrey.ryan at lemnica.com

www.lemnica.com
www.esotericR.com

R/Finance 2012: Applied Finance with R
www.RinFinance.com

See you in Chicago!!!!