[R-SIG-Finance] Return calculation for panel data structure

Fri Mar 9 05:43:19 CET 2012

Jeffrey,

Thanks a lot!
I tried this code worked for the single grouping factor variable, now testing it with two which means apparently they have to cover the whole dataset, maybe NA in one of the variable not allowed by the ?split ?

Warning message:
In split.default(seq_len(nrow(x)), f, drop = drop, ...) :
  data length is not a multiple of split variable

I was looking for something simpler that would also help to understand the code for peers working in SAS, so base R would be a bit too complicated. 

Thanks for the Delt function that's quite a shortcut!

Arsenio

Wednesday, March 7, 2012, 8:14:31 AM, you wrote:

> Built into R is ?split or ?aggregate and ?lapply which should serve you well.

> Using base R is often the fastest way to skin the cat, if not the prettiest ;-)

>> library(quantmod)   # for Delt
>> cbind(z,do.call(rbind,lapply(split(z,z$ticker), function(x) Delt(x$prc))))

>  ticker    date_f  date   prc iticker Delt.1.arithmetic
> 1       A 20-Jun-03 15876 19.06     IXT                NA
> 2       A 23-Jun-03 15879 18.89     IXT      -0.008919203
> 3       A 24-Jun-03 15880 18.75     IXT      -0.007411329
> 4       A 25-Jun-03 15881 18.92     IXT       0.009066667
> 5       A 26-Jun-03 15882 19.35     IXT       0.022727273
> 6       A 27-Jun-03 15883 19.55     IXT       0.010335917
> 7      AA 20-Jun-03 15876 26.24     IXB                NA
> 8      AA 23-Jun-03 15879 25.07     IXB      -0.044588415
> 9      AA 24-Jun-03 15880 24.96     IXB      -0.004387714
> 10     AA 25-Jun-03 15881 24.51     IXB      -0.018028846
> 11     AA 26-Jun-03 15882 25.24     IXB       0.029783762
> 12     AA 27-Jun-03 15883 25.09     IXB      -0.005942948
> 13     AA 30-Jun-03 15886 25.50     IXB       0.016341172

> See ?split for details on multiple identifiers ( just a list, but your
> example didn't include it so I won't either)

> The do.call(rbind/cbind stuff is just to get the original data.frame
> back - there are more verbose ways to do the same thing.

> Jeff

> On Tue, Mar 6, 2012 at 7:38 PM, Matthew Clegg
> <matthewcleggphd at gmail.com> wrote:
>> Here is another approach that might work for you.

>> apply_by_groups <- function (T, cols, F, ...) {
>>  # On input, T is a dataframe, and cols is a list of column names.
>>  # Groups the dataframe T into groups according to the specified
>>  # column names, e.g., all rows in a given group will have identical
>>  # values in the specified columns.  Within each group, applies the
>>  # function F to the dataframe representing that group.  Returns the
>>  # list of results.

>>  NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),]
>>  change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) }
>>  starts <- which(c(TRUE, sapply(2:nrow(NT), change)))
>>  ends <- c(starts[2:length(starts)]-1, nrow(NT))
>>  lapply(1:length(starts), function(k) {
>> F(NT[starts[k]:ends[k],,drop=FALSE], ...) })
>> }
>> ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)}

>>> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <-
>> ret(df$prc); df} ))
>>   ticker    date_f  date   prc iticker          ret
>> 1       A 20-Jun-03 15876 19.06     IXT           NA
>> 2       A 23-Jun-03 15879 18.89     IXT -0.008919203
>> 3       A 24-Jun-03 15880 18.75     IXT -0.007411329
>> 4       A 25-Jun-03 15881 18.92     IXT  0.009066667
>> 5       A 26-Jun-03 15882 19.35     IXT  0.022727273
>> 6       A 27-Jun-03 15883 19.55     IXT  0.010335917
>> 7      AA 20-Jun-03 15876 26.24     IXB           NA
>> 8      AA 23-Jun-03 15879 25.07     IXB -0.044588415
>> 9      AA 24-Jun-03 15880 24.96     IXB -0.004387714
>> 10     AA 25-Jun-03 15881 24.51     IXB -0.018028846
>> 11     AA 26-Jun-03 15882 25.24     IXB  0.029783762
>> 12     AA 27-Jun-03 15883 25.09     IXB -0.005942948
>> 13     AA 30-Jun-03 15886 25.50     IXB  0.016341172

>> In the call to apply_by_groups, you can replace "ticker" with a list of
>> names, e.g., c("ticker","fund","manager"), etc.

>> Matthew Clegg

>> On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <arsenio.star at gmail.com> wrote:

>>> Robert,

>>> Thanks for the reply. I tried plm approach but apparently it only accepts
>>> one individual id and one time id, because this fails

>>> > zz=pdata.frame(z, c("Stock","Fund","Date"))
>>> Error in match(x, table, nomatch = 0L) :
>>>  'match' requires vector arguments

>>> Any approach that accepts several individual observation identifiers? I
>>> will take a look at the econometrics view on CRAN

>>> Btw, sorry if the post came out mangled due to HTML, didn't check.

>>> Thanks,

>>> Arsenio

>>> Tuesday, March 6, 2012, 5:24:35 AM, you wrote:

>>> Use diff and lag from plm:

>>> Lines="ticker date_f  date prc iticker
>>> A 20-Jun-03  15876 19.06 IXT
>>> A 23-Jun-03  15879 18.89 IXT
>>> A 24-Jun-03  15880 18.75 IXT
>>> A 25-Jun-03  15881 18.92 IXT
>>> A 26-Jun-03  15882 19.35 IXT
>>> A 27-Jun-03  15883 19.55 IXT
>>> AA 20-Jun-03 15876 26.24 IXB
>>> AA 23-Jun-03 15879 25.07 IXB
>>> AA 24-Jun-03 15880 24.96 IXB
>>> AA 25-Jun-03 15881 24.51 IXB
>>> AA 26-Jun-03 15882 25.24 IXB
>>> AA 27-Jun-03 15883 25.09 IXB
>>> AA 30-Jun-03 15886 25.5 IXB
>>> "
>>> z <- read.table(textConnection(Lines), header = TRUE)
>>> library(plm)
>>> zz=pdata.frame(z, c("ticker","date"))
>>> pr=zz$prc; str(pr)
>>> zz$return=diff(pr)/lag(pr)
>>> zz

>>> Good luck!

>>> Robert

>>> From: Arsenio
>>> Sent: Tuesday, March 06, 2012 2:47 AM
>>> To: r-sig-finance at r-project.org
>>> Subject: [R-SIG-Finance] Return calculation for panel data structure

>>> Dear all,

>>> I have a very simple question and i haven't been able to code it out. It's
>>> a simple return calculation of a form:
>>> R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one.
>>> However it's in the panel data, where stocks belong to say a fund or some
>>> other grouping variable.

>>> ticker
>>> date_f

>>> date
>>> prc
>>> iticker
>>> A
>>> 20-Jun-03

>>> 15876
>>> 19.06
>>> IXT
>>> A
>>> 23-Jun-03

>>> 15879
>>> 18.89
>>> IXT
>>> A
>>> 24-Jun-03

>>> 15880
>>> 18.75
>>> IXT
>>> A
>>> 25-Jun-03

>>> 15881
>>> 18.92
>>> IXT
>>> A
>>> 26-Jun-03

>>> 15882
>>> 19.35
>>> IXT
>>> A
>>> 27-Jun-03

>>> 15883
>>> 19.55
>>> IXT
>>> AA
>>> 20-Jun-03

>>> 15876
>>> 26.24
>>> IXB
>>> AA
>>> 23-Jun-03

>>> 15879
>>> 25.07
>>> IXB
>>> AA
>>> 24-Jun-03

>>> 15880
>>> 24.96
>>> IXB
>>> AA
>>> 25-Jun-03

>>> 15881
>>> 24.51
>>> IXB
>>> AA
>>> 26-Jun-03

>>> 15882
>>> 25.24
>>> IXB
>>> AA
>>> 27-Jun-03

>>> 15883
>>> 25.09
>>> IXB
>>> AA
>>> 30-Jun-03

>>> 15886
>>> 25.5
>>> IXB

>>> In  SAS, I would do it this way:

>>> data work.1; /* This Calcs the excluded index return for INDEX constructed
>>> with PRICES (PR)!!!! */
>>>       set work.1;
>>>       by ticker iticker date_f ; /*sorting by stock, fund and date */

>>>       lag_prc=lag(prc); /* creating the lagged price variable

>>>       if first.iticker then /* if the it's a first date for the grouped
>>> unit then.. */
>>>               lag_prc= . ; /* setting the first lagged value to missing if
>>> the first date for the grouped unit */
>>>       return = (prc - lag_prc)/ lag_prc;
>>>       drop lag_prc;
>>> run;

>>> The end result would be the same dataset with just one more column for
>>> returns for each stock, fund, date combination with the first return for
>>> the date set to missing.
>>> I tried Return.calculate type of funcs from performance analytics and
>>> other solutions using plyr or loops, but they are all too bulky.

>>> Any ideas how to accomplish this? Would appreciate any hints,

>>> Arsenio

>>> _______________________________________________
>>> R-SIG-Finance at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>> -- Subscriber-posting only. If you want to post, subscribe first.
>>> -- Also note that this is not the r-help list where general R questions
>>> should go.

>>> --
>>> Best regards,
>>>  Arsenio                            mailto:arsenio.star at gmail.com

>>> _______________________________________________
>>> R-SIG-Finance at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>> -- Subscriber-posting only. If you want to post, subscribe first.
>>> -- Also note that this is not the r-help list where general R questions
>>> should go.

>> --
>> Matthew Clegg
>> matthewcleggphd at gmail.com

>>        [[alternative HTML version deleted]]

>> _______________________________________________
>> R-SIG-Finance at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions should go.

-- 
Best regards,
 Arsenio                            mailto:arsenio.star at gmail.com