[R-SIG-Finance] Return calculation for panel data structure

Jeffrey Ryan jeffrey.ryan at lemnica.com
Fri Mar 9 17:16:39 CET 2012


It should work, but you didn't provide the data that it is breaking on, so...

Hijacking the iticker column, we can make that our second identifier
to group with:

z <- rbind(z,transform(z,iticker=paste(iticker,2,sep="")))

#... some z to see what this is doing
#
#11     AA 26-Jun-03 15882 25.24     IXB
#12     AA 27-Jun-03 15883 25.09     IXB
#13     AA 30-Jun-03 15886 25.50     IXB
#14      A 20-Jun-03 15876 19.06    IXT2
#15      A 23-Jun-03 15879 18.89    IXT2
#16      A 24-Jun-03 15880 18.75    IXT2
#17      A 25-Jun-03 15881 18.92    IXT2
#...

And now to get return calculations by ticker, iticker do this:

> cbind(z,do.call(rbind,lapply(split(z,list(z$ticker,z$iticker)), function(x) Delt(x$prc))))

   ticker    date_f  date   prc iticker Delt.1.arithmetic
1       A 20-Jun-03 15876 19.06     IXT                NA
2       A 23-Jun-03 15879 18.89     IXT      -0.044588415
3       A 24-Jun-03 15880 18.75     IXT      -0.004387714
4       A 25-Jun-03 15881 18.92     IXT      -0.018028846
5       A 26-Jun-03 15882 19.35     IXT       0.029783762
6       A 27-Jun-03 15883 19.55     IXT      -0.005942948
7      AA 20-Jun-03 15876 26.24     IXB       0.016341172
8      AA 23-Jun-03 15879 25.07     IXB                NA
9      AA 24-Jun-03 15880 24.96     IXB      -0.008919203
10     AA 25-Jun-03 15881 24.51     IXB      -0.007411329
11     AA 26-Jun-03 15882 25.24     IXB       0.009066667
12     AA 27-Jun-03 15883 25.09     IXB       0.022727273
13     AA 30-Jun-03 15886 25.50     IXB       0.010335917
14      A 20-Jun-03 15876 19.06    IXT2                NA
15      A 23-Jun-03 15879 18.89    IXT2      -0.008919203
16      A 24-Jun-03 15880 18.75    IXT2      -0.007411329
17      A 25-Jun-03 15881 18.92    IXT2       0.009066667
18      A 26-Jun-03 15882 19.35    IXT2       0.022727273
19      A 27-Jun-03 15883 19.55    IXT2       0.010335917
20     AA 20-Jun-03 15876 26.24    IXB2                NA
21     AA 23-Jun-03 15879 25.07    IXB2      -0.044588415
22     AA 24-Jun-03 15880 24.96    IXB2      -0.004387714
23     AA 25-Jun-03 15881 24.51    IXB2      -0.018028846
24     AA 26-Jun-03 15882 25.24    IXB2       0.029783762
25     AA 27-Jun-03 15883 25.09    IXB2      -0.005942948
26     AA 30-Jun-03 15886 25.50    IXB2       0.016341172

If you have a missing value in one of your identifiers (a factor
here), you will get an error, but na.omit() called on 'z' will remove
it before processing. If you don't have the value I don't see how you
could properly group it anyway though.

As far as 'base R being too complicated for SAS users' ;-) ... I still
think you are much better off sticking as close to the core language
as you can - as that gives you the most understanding and flexibility.
 You could also functionalize the scary parts:

make_it_happen <- function(z) {
   require("quantmod")
   cbind(z,do.call(rbind,lapply(split(z,list(z$ticker,z$iticker)),
function(x) Delt(x$prc))))
}
make_it_happen(z)


Joking aside, the code is actually not bad if you take it apart as
Matthew did with his.

1) split(z, list(z$ticker, z$iticker))
returns a list of data.frames that are grouped by your identifiers
(factors in R-speak)

2) lapply uses each element in the list and applies the function(x) ... to it.

3) do.call(rbind takes the list returned by lapply (columns of
returns) and row binds them

4) cbind just attached the new column from (3) to the original 'z'

With that, we've diverged pretty hard from "finance" - so further
questions should probably be sent to R-help.

HTH
Jeff

On Thu, Mar 8, 2012 at 10:43 PM, Arsenio <arsenio.star at gmail.com> wrote:
> Jeffrey,
>
> Thanks a lot!
> I tried this code worked for the single grouping factor variable, now testing it with two which means apparently they have to cover the whole dataset, maybe NA in one of the variable not allowed by the ?split ?
>
> Warning message:
> In split.default(seq_len(nrow(x)), f, drop = drop, ...) :
>  data length is not a multiple of split variable
>
> I was looking for something simpler that would also help to understand the code for peers working in SAS, so base R would be a bit too complicated.
>
> Thanks for the Delt function that's quite a shortcut!
>
> Arsenio
>
>
>
> Wednesday, March 7, 2012, 8:14:31 AM, you wrote:
>
>> Built into R is ?split or ?aggregate and ?lapply which should serve you well.
>
>> Using base R is often the fastest way to skin the cat, if not the prettiest ;-)
>
>
>>> library(quantmod)   # for Delt
>>> cbind(z,do.call(rbind,lapply(split(z,z$ticker), function(x) Delt(x$prc))))
>
>>  ticker    date_f  date   prc iticker Delt.1.arithmetic
>> 1       A 20-Jun-03 15876 19.06     IXT                NA
>> 2       A 23-Jun-03 15879 18.89     IXT      -0.008919203
>> 3       A 24-Jun-03 15880 18.75     IXT      -0.007411329
>> 4       A 25-Jun-03 15881 18.92     IXT       0.009066667
>> 5       A 26-Jun-03 15882 19.35     IXT       0.022727273
>> 6       A 27-Jun-03 15883 19.55     IXT       0.010335917
>> 7      AA 20-Jun-03 15876 26.24     IXB                NA
>> 8      AA 23-Jun-03 15879 25.07     IXB      -0.044588415
>> 9      AA 24-Jun-03 15880 24.96     IXB      -0.004387714
>> 10     AA 25-Jun-03 15881 24.51     IXB      -0.018028846
>> 11     AA 26-Jun-03 15882 25.24     IXB       0.029783762
>> 12     AA 27-Jun-03 15883 25.09     IXB      -0.005942948
>> 13     AA 30-Jun-03 15886 25.50     IXB       0.016341172
>
>> See ?split for details on multiple identifiers ( just a list, but your
>> example didn't include it so I won't either)
>
>> The do.call(rbind/cbind stuff is just to get the original data.frame
>> back - there are more verbose ways to do the same thing.
>
>> Jeff
>
>
>> On Tue, Mar 6, 2012 at 7:38 PM, Matthew Clegg
>> <matthewcleggphd at gmail.com> wrote:
>>> Here is another approach that might work for you.
>
>>> apply_by_groups <- function (T, cols, F, ...) {
>>>  # On input, T is a dataframe, and cols is a list of column names.
>>>  # Groups the dataframe T into groups according to the specified
>>>  # column names, e.g., all rows in a given group will have identical
>>>  # values in the specified columns.  Within each group, applies the
>>>  # function F to the dataframe representing that group.  Returns the
>>>  # list of results.
>
>>>  NT <- T[do.call("order", lapply(cols, function(c) { T[,c]})),]
>>>  change <- function(k) { !all(NT[k-1,cols] == NT[k, cols]) }
>>>  starts <- which(c(TRUE, sapply(2:nrow(NT), change)))
>>>  ends <- c(starts[2:length(starts)]-1, nrow(NT))
>>>  lapply(1:length(starts), function(k) {
>>> F(NT[starts[k]:ends[k],,drop=FALSE], ...) })
>>> }
>>> ret <- function(X) { c(NA, X[2:length(X)]/X[1:(length(X)-1)]-1)}
>
>>>> do.call("rbind", apply_by_groups(z, "ticker", function(df) { df$ret <-
>>> ret(df$prc); df} ))
>>>   ticker    date_f  date   prc iticker          ret
>>> 1       A 20-Jun-03 15876 19.06     IXT           NA
>>> 2       A 23-Jun-03 15879 18.89     IXT -0.008919203
>>> 3       A 24-Jun-03 15880 18.75     IXT -0.007411329
>>> 4       A 25-Jun-03 15881 18.92     IXT  0.009066667
>>> 5       A 26-Jun-03 15882 19.35     IXT  0.022727273
>>> 6       A 27-Jun-03 15883 19.55     IXT  0.010335917
>>> 7      AA 20-Jun-03 15876 26.24     IXB           NA
>>> 8      AA 23-Jun-03 15879 25.07     IXB -0.044588415
>>> 9      AA 24-Jun-03 15880 24.96     IXB -0.004387714
>>> 10     AA 25-Jun-03 15881 24.51     IXB -0.018028846
>>> 11     AA 26-Jun-03 15882 25.24     IXB  0.029783762
>>> 12     AA 27-Jun-03 15883 25.09     IXB -0.005942948
>>> 13     AA 30-Jun-03 15886 25.50     IXB  0.016341172
>
>>> In the call to apply_by_groups, you can replace "ticker" with a list of
>>> names, e.g., c("ticker","fund","manager"), etc.
>
>>> Matthew Clegg
>
>>> On Tue, Mar 6, 2012 at 7:03 PM, Arsenio <arsenio.star at gmail.com> wrote:
>
>>>> Robert,
>
>>>> Thanks for the reply. I tried plm approach but apparently it only accepts
>>>> one individual id and one time id, because this fails
>
>>>> > zz=pdata.frame(z, c("Stock","Fund","Date"))
>>>> Error in match(x, table, nomatch = 0L) :
>>>>  'match' requires vector arguments
>
>>>> Any approach that accepts several individual observation identifiers? I
>>>> will take a look at the econometrics view on CRAN
>
>>>> Btw, sorry if the post came out mangled due to HTML, didn't check.
>
>>>> Thanks,
>
>>>> Arsenio
>
>>>> Tuesday, March 6, 2012, 5:24:35 AM, you wrote:
>
>
>>>> Use diff and lag from plm:
>
>>>> Lines="ticker date_f  date prc iticker
>>>> A 20-Jun-03  15876 19.06 IXT
>>>> A 23-Jun-03  15879 18.89 IXT
>>>> A 24-Jun-03  15880 18.75 IXT
>>>> A 25-Jun-03  15881 18.92 IXT
>>>> A 26-Jun-03  15882 19.35 IXT
>>>> A 27-Jun-03  15883 19.55 IXT
>>>> AA 20-Jun-03 15876 26.24 IXB
>>>> AA 23-Jun-03 15879 25.07 IXB
>>>> AA 24-Jun-03 15880 24.96 IXB
>>>> AA 25-Jun-03 15881 24.51 IXB
>>>> AA 26-Jun-03 15882 25.24 IXB
>>>> AA 27-Jun-03 15883 25.09 IXB
>>>> AA 30-Jun-03 15886 25.5 IXB
>>>> "
>>>> z <- read.table(textConnection(Lines), header = TRUE)
>>>> library(plm)
>>>> zz=pdata.frame(z, c("ticker","date"))
>>>> pr=zz$prc; str(pr)
>>>> zz$return=diff(pr)/lag(pr)
>>>> zz
>
>>>> Good luck!
>
>>>> Robert
>
>>>> From: Arsenio
>>>> Sent: Tuesday, March 06, 2012 2:47 AM
>>>> To: r-sig-finance at r-project.org
>>>> Subject: [R-SIG-Finance] Return calculation for panel data structure
>
>>>> Dear all,
>
>>>> I have a very simple question and i haven't been able to code it out. It's
>>>> a simple return calculation of a form:
>>>> R(t)=(P(t)-P(t-1))/P(t-1) or the same as ratio of prices minus one.
>>>> However it's in the panel data, where stocks belong to say a fund or some
>>>> other grouping variable.
>
>
>>>> ticker
>>>> date_f
>
>>>> date
>>>> prc
>>>> iticker
>>>> A
>>>> 20-Jun-03
>
>>>> 15876
>>>> 19.06
>>>> IXT
>>>> A
>>>> 23-Jun-03
>
>>>> 15879
>>>> 18.89
>>>> IXT
>>>> A
>>>> 24-Jun-03
>
>>>> 15880
>>>> 18.75
>>>> IXT
>>>> A
>>>> 25-Jun-03
>
>>>> 15881
>>>> 18.92
>>>> IXT
>>>> A
>>>> 26-Jun-03
>
>>>> 15882
>>>> 19.35
>>>> IXT
>>>> A
>>>> 27-Jun-03
>
>>>> 15883
>>>> 19.55
>>>> IXT
>>>> AA
>>>> 20-Jun-03
>
>>>> 15876
>>>> 26.24
>>>> IXB
>>>> AA
>>>> 23-Jun-03
>
>>>> 15879
>>>> 25.07
>>>> IXB
>>>> AA
>>>> 24-Jun-03
>
>>>> 15880
>>>> 24.96
>>>> IXB
>>>> AA
>>>> 25-Jun-03
>
>>>> 15881
>>>> 24.51
>>>> IXB
>>>> AA
>>>> 26-Jun-03
>
>>>> 15882
>>>> 25.24
>>>> IXB
>>>> AA
>>>> 27-Jun-03
>
>>>> 15883
>>>> 25.09
>>>> IXB
>>>> AA
>>>> 30-Jun-03
>
>>>> 15886
>>>> 25.5
>>>> IXB
>
>
>
>>>> In  SAS, I would do it this way:
>
>>>> data work.1; /* This Calcs the excluded index return for INDEX constructed
>>>> with PRICES (PR)!!!! */
>>>>       set work.1;
>>>>       by ticker iticker date_f ; /*sorting by stock, fund and date */
>
>>>>       lag_prc=lag(prc); /* creating the lagged price variable
>
>>>>       if first.iticker then /* if the it's a first date for the grouped
>>>> unit then.. */
>>>>               lag_prc= . ; /* setting the first lagged value to missing if
>>>> the first date for the grouped unit */
>>>>       return = (prc - lag_prc)/ lag_prc;
>>>>       drop lag_prc;
>>>> run;
>
>>>> The end result would be the same dataset with just one more column for
>>>> returns for each stock, fund, date combination with the first return for
>>>> the date set to missing.
>>>> I tried Return.calculate type of funcs from performance analytics and
>>>> other solutions using plyr or loops, but they are all too bulky.
>
>>>> Any ideas how to accomplish this? Would appreciate any hints,
>
>>>> Arsenio
>
>>>> _______________________________________________
>>>> R-SIG-Finance at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>>> -- Subscriber-posting only. If you want to post, subscribe first.
>>>> -- Also note that this is not the r-help list where general R questions
>>>> should go.
>
>
>
>
>>>> --
>>>> Best regards,
>>>>  Arsenio                            mailto:arsenio.star at gmail.com
>
>>>> _______________________________________________
>>>> R-SIG-Finance at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>>> -- Subscriber-posting only. If you want to post, subscribe first.
>>>> -- Also note that this is not the r-help list where general R questions
>>>> should go.
>
>
>
>
>>> --
>>> Matthew Clegg
>>> matthewcleggphd at gmail.com
>
>>>        [[alternative HTML version deleted]]
>
>>> _______________________________________________
>>> R-SIG-Finance at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>> -- Subscriber-posting only. If you want to post, subscribe first.
>>> -- Also note that this is not the r-help list where general R questions should go.
>
>
>
>
>
>
> --
> Best regards,
>  Arsenio                            mailto:arsenio.star at gmail.com
>
> _______________________________________________
> R-SIG-Finance at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions should go.



-- 
Jeffrey Ryan
jeffrey.ryan at lemnica.com

www.lemnica.com
www.esotericR.com

R/Finance 2012: Applied Finance with R
www.RinFinance.com

See you in Chicago!!!!



More information about the R-SIG-Finance mailing list