[R-SIG-Finance] panel data in R
Richard Herron
richard.c.herron at gmail.com
Mon May 7 00:01:15 CEST 2012
I think the splitting is fairly expensive computationally (especially
with 14 million observations and 30,000 firms), so you probably want
to minimize the number of splits and recombines that you do.
If you can wrap everything (i.e., the conversion to xts/zoo and all
the transformations) in one function that you call from -ddply-, then
the two techniques should be roughly equivalent. If you would
otherwise call -ddply- over and over again, you're better off
splitting into a list once and then applying your functions to that
list.
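
Something along these lines (an untested sketch; the data frame dt and
its columns firm, date, and ret are placeholders):

library(plyr)
library(zoo)

## One pass: convert to zoo and do all the transformations inside a
## single function, so ddply splits and recombines only once.
transform_firm <- function(df) {
    z <- zoo(df$ret, order.by = df$date)
    data.frame(date = index(z),
               ret = coredata(z),
               ret_lag = coredata(lag(z, k = -1, na.pad = TRUE)))
}
out <- ddply(dt, "firm", transform_firm)

## Alternative: split into a list once, then apply repeatedly to it.
by_firm <- split(dt, dt$firm)
out_list <- lapply(by_firm, transform_firm)
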
Richard Herron
On Sat, May 5, 2012 at 7:06 PM, Alexander Chernyakov
<alexander.chernyakov at gmail.com> wrote:
> Interesting, thank you! Do you find it faster to work with lists of
> xts objects, or to use plyr on data frames (converting to zoo, doing
> what you want, then converting back to a data frame and returning it,
> which is what I currently do)?
>
> Thanks,
> Alex
>
>
> On Sat, May 5, 2012 at 6:47 PM, Richard Herron <richard.c.herron at gmail.com>
> wrote:
>>
>> I think I would work with the daily data in lists of xts objects (or
>> one wide xts if you only need the return series), but once I
>> aggregated to the month/year level I would use plm. I don't know of a
>> width limit for xts or data.frame, but I never get close to 30,000
>> columns; I put each security in a list as its own xts object.
>>
>> I typically see people aggregate daily data to monthly data, requiring
>> at least 15 or so daily observations per firm-month (e.g., when
>> generating idiosyncratic vol from a daily return series). This should
>> give you an (unbalanced) panel that you can feed to plm. You can
>> specify formulas with lags on the fly; if you try to lag a variable
>> and the lag isn't there, plm drops the observation.
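>>
>> For example, something like this (untested; dt with columns firm,
>> date, and ret is a placeholder):
>>
>> library(plyr)
>> library(zoo)
>>
>> ## Aggregate daily returns to a monthly panel, dropping firm-months
>> ## with fewer than 15 daily observations (monthly vol as an example).
>> dt$month <- as.yearmon(dt$date)
>> monthly <- ddply(dt, c("firm", "month"), function(df) {
>>     if (nrow(df) < 15) return(NULL)
>>     data.frame(vol = sd(df$ret), ret = prod(1 + df$ret) - 1)
>> })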
>>
>> To use the time series operators in plm estimators, you just have to
>> format your data properly (either put the individual and time indexes
>> in the first two columns or use -plm.data-).
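>>
>> Roughly (untested, continuing from the monthly panel above):
>>
>> library(plm)
>>
>> ## pdata.frame sets the individual and time indexes; plm then takes
>> ## the lag on the fly and drops firm-months where the lag is missing.
>> pd <- pdata.frame(monthly, index = c("firm", "month"))
>> fe <- plm(ret ~ lag(vol, 1), data = pd, model = "within")
>> summary(fe)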
>>
>> Richard Herron
>>
>>
>> On Sat, May 5, 2012 at 3:14 PM, Alexander Chernyakov
>> <alexander.chernyakov at gmail.com> wrote:
>> > Sure. I will be using fixed effects for some things. I will mostly
>> > be running regressions (sometimes fixed effects, but a lot of the
>> > time they will be Fama-MacBeth type), but the key thing I am looking
>> > for is the ability to lag things on the fly without having to run an
>> > apply statement that splits everything into a list, lags each firm
>> > individually, recombines the list, and then runs a regression.
>> >
>> > I currently use zoo, but I was under the impression that there is
>> > some limit to the number of columns one can have, no? With 30,000
>> > firms it might not be possible to have such a wide zoo object... am
>> > I incorrect about this?
>> >
>> >
>> > On Sat, May 5, 2012 at 3:08 PM, Richard Herron
>> > <richard.c.herron at gmail.com>
>> > wrote:
>> >>
>> >> What kind of models are you estimating? I would use plm if I were
>> >> doing models with firm fixed effects (FE), but I rarely see firm FE
>> >> with daily observations; I usually see firm FE at the annual level.
>> >>
>> >> If you're either estimating time series models or aggregating daily
>> >> observations to the month level for cross-sectional models, then a
>> >> list of firm-level time series would be best (or, if you're only
>> >> using the return series, you could put everything in one wide xts
>> >> or zoo object).
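>> >>
>> >> For the wide version, something like this (untested; uses reshape2,
>> >> and dt with columns firm, date, and ret is a placeholder):
>> >>
>> >> library(reshape2)
>> >> library(xts)
>> >>
>> >> ## One wide xts of returns: firms in columns, dates in rows.
>> >> wide <- dcast(dt, date ~ firm, value.var = "ret")
>> >> wide_xts <- xts(wide[, -1], order.by = wide$date)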
>> >>
>> >> Re: missing data. xts has -na.locf- for carrying forward the last
>> >> non-missing observation. I tend to leave missing observations as
>> >> missing.
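>> >>
>> >> A tiny illustration of -na.locf- on an irregular series:
>> >>
>> >> library(xts)
>> >>
>> >> x <- xts(c(1, NA, NA, 4),
>> >>          order.by = as.Date("2012-01-01") + c(0, 1, 4, 7))
>> >> na.locf(x)  # the two NAs are filled with the last value, 1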
>> >>
>> >> Could you provide an example of what you would like to estimate?
>> >>
>> >> Richard Herron
>> >>
>> >>
>> >> On Sat, May 5, 2012 at 11:30 AM, Alexander Chernyakov
>> >> <alexander.chernyakov at gmail.com> wrote:
>> >> > Hi Richard,
>> >> > Thanks for your response. One issue I have run into with plm is
>> >> > that it seems to be fairly slow with large data sets (14 million
>> >> > date-firm points). Any tricks for this? Also, it seems not to
>> >> > handle irregularly spaced time points... it fills in the missing
>> >> > ones with NA, so lagging and differencing don't work correctly. Do
>> >> > you have any advice on fixing this?
>> >> >
>> >> > Thanks,
>> >> > Alex
>> >> >
>> >> > On Sat, May 5, 2012 at 8:43 AM, Richard Herron
>> >> > <richard.c.herron at gmail.com> wrote:
>> >> >> What kind of models do you plan on using?
>> >> >>
>> >> >> If you plan on using time series models, then I suggest generating a
>> >> >> list where each entry is one firm. This will make it easy to fit
>> >> >> models with lapply.
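>> >> >>
>> >> >> For example (untested; dt with columns firm, date, and ret is a
>> >> >> placeholder):
>> >> >>
>> >> >> library(xts)
>> >> >>
>> >> >> ## One xts per firm, then fit a model to each with lapply
>> >> >> ## (an AR(1) here, purely for illustration).
>> >> >> firm_list <- lapply(split(dt, dt$firm),
>> >> >>                     function(df) xts(df$ret, order.by = df$date))
>> >> >> fits <- lapply(firm_list,
>> >> >>                function(x) arima(as.numeric(x), order = c(1, 0, 0)))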
>> >> >>
>> >> >> If you plan on using panel models, then I suggest using plm. It
>> >> >> is easy enough to manually code within and between estimators,
>> >> >> but if you use clustered standard errors or dynamic panel models,
>> >> >> then plm will make your life a lot easier.
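>> >> >>
>> >> >> For example, clustered standard errors are one extra line on top
>> >> >> of the fit (untested; the data frame panel and its variables are
>> >> >> placeholders):
>> >> >>
>> >> >> library(plm)
>> >> >> library(lmtest)
>> >> >>
>> >> >> ## Firm fixed effects with standard errors clustered by firm.
>> >> >> fe <- plm(ret ~ lag(vol, 1), data = panel,
>> >> >>           index = c("firm", "month"), model = "within")
>> >> >> coeftest(fe, vcov = vcovHC(fe, type = "HC1", cluster = "group"))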
>> >> >>
>> >> >> Richard Herron
>> >> >>
>> >> >>
>> >> >> On Fri, May 4, 2012 at 6:30 PM, Alexander Chernyakov
>> >> >> <alexander.chernyakov at gmail.com> wrote:
>> >> >>>
>> >> >>> Hi,
>> >> >>> This question is of a general nature: how do people handle panel
>> >> >>> data in R? For example, I have returns of firms, and each firm
>> >> >>> has daily observations. One way is to use the plm package;
>> >> >>> another is to use plyr and just do the operations on (date,
>> >> >>> firmid) units, using something like zoo as a container for each
>> >> >>> firm so that lagging and differencing can be done. For regression
>> >> >>> it seems that plm might be the better option? Just curious
>> >> >>> whether somebody has a well-worked-out system for this.
>> >> >>>
>> >> >>> Thanks
>> >> >>> Alex
>> >> >>>