[R-SIG-Finance] panel data in R

Sun May 6 00:47:52 CEST 2012

I think I would work with the daily data in lists of xts objects (or
one wide xts if you only need the return series), but once I aggregated
to the month/year level I would use plm. I don't know of a width limit
for xts or data.frame, but I have never gone anywhere near 30,000
columns. I store each security as an xts object in a list.
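
A minimal sketch of that layout, in case it helps. The tickers, the
column names (ret, vol), and the simulated numbers are made up for
illustration; only xts(), lag(), and merge() come from the xts package.

```r
library(xts)

# Hypothetical data: two securities observed on different sets of dates
set.seed(1)
dates_a <- as.Date("2012-01-02") + 0:9
dates_b <- as.Date("2012-01-04") + 0:9
securities <- list(
  AAA = xts(cbind(ret = rnorm(10, 0, 0.01), vol = runif(10)), order.by = dates_a),
  BBB = xts(cbind(ret = rnorm(10, 0, 0.01), vol = runif(10)), order.by = dates_b)
)

# Add a one-period lag of the return to a single security.
# With lag.xts, k = 1 means each row holds the previous observation.
add_lag <- function(x) {
  l <- lag(x[, "ret"], k = 1)
  colnames(l) <- "ret_lag1"
  merge(x, l)
}
lagged <- lapply(securities, add_lag)

# If only returns are needed, merge them into one wide xts.
# merge() does an outer join on the dates and fills gaps with NA.
wide_ret <- do.call(merge, unname(lapply(securities, function(x) x[, "ret"])))
colnames(wide_ret) <- names(securities)
```

Keeping the list means each security keeps its own dates, so a lag
never reaches across firms.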

I typically see people aggregate daily data to monthly data, requiring
that there are at least 15 or so daily observations (e.g., generating
idiosyncratic vol from a daily return series). This should generate a
(unbalanced) panel that you can feed to plm. You can specify formulas
with lags on the fly. If you try to lag a variable, but the lag isn't
there, plm drops the observation.
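
A rough sketch of the aggregation step, under several assumptions: the
firms and returns are simulated, the monthly measure is just the
standard deviation of daily returns (a stand-in for idiosyncratic vol,
which would normally come from factor-model residuals), and the
15-observation cutoff and the integer month index ym are my own choices
for the example.

```r
library(plm)

set.seed(42)
# Hypothetical daily panel: 3 firms over six months
days <- seq(as.Date("2012-01-01"), as.Date("2012-06-30"), by = "day")
daily <- expand.grid(firm = c("AAA", "BBB", "CCC"), date = days,
                     stringsAsFactors = FALSE)
daily$ret <- rnorm(nrow(daily), 0, 0.02)

# Give firm CCC only 10 observations in March so that firm-month is too thin
thin <- daily$firm == "CCC" & format(daily$date, "%Y-%m") == "2012-03" &
  as.integer(format(daily$date, "%d")) > 10
daily <- daily[!thin, ]

# Running integer month index (year * 12 + month) to serve as the time index
daily$ym <- as.integer(format(daily$date, "%Y")) * 12L +
  as.integer(format(daily$date, "%m"))

# Monthly volatility and observation count per firm-month
vol <- aggregate(ret ~ firm + ym, data = daily, FUN = sd)
n   <- aggregate(ret ~ firm + ym, data = daily, FUN = length)
monthly <- data.frame(firm = vol$firm, ym = vol$ym, vol = vol$ret, n = n$ret)

# Keep firm-months with at least 15 daily observations (unbalanced panel)
monthly <- monthly[monthly$n >= 15, ]

# Lag specified on the fly in the formula; rows without a lag are dropped
fit <- plm(vol ~ lag(vol, 1), data = monthly,
           index = c("firm", "ym"), model = "within")
summary(fit)
```

How the gap in CCC's series is treated when lagging (by time period or
by row position) depends on the plm version, so it is worth checking
the number of observations the model actually used.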

To use the time series operators in plm estimators you just have to
format your data properly (either put the individual and time indices,
i and t, in the first two columns or use -plm.data-).
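
A small sketch of the formatting step, with a made-up two-firm panel.
I use pdata.frame() here on the assumption that you have a plm version
that provides it; it declares the (i, t) index explicitly, which is
also what -plm.data- is for. The column names firm, year, and x are
hypothetical.

```r
library(plm)

# Hypothetical long-format panel: firm (i) and year (t) in the first two columns
panel <- data.frame(
  firm = rep(c("AAA", "BBB"), each = 4),
  year = rep(2008:2011, times = 2),
  x    = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Declare which columns identify the individual and the time period
pdata <- pdata.frame(panel, index = c("firm", "year"))

# lag() and diff() on a panel series work within each firm,
# so the first year of every firm is NA rather than borrowing
# a value from the previous firm
pdata$x_lag1 <- lag(pdata$x, 1)
pdata$x_diff <- diff(pdata$x, 1)
pdata
```

Once the index is declared, the same lag() and diff() calls can be
written directly inside a plm formula.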

Richard Herron

On Sat, May 5, 2012 at 3:14 PM, Alexander Chernyakov
<alexander.chernyakov at gmail.com> wrote:
> Sure. I will be using fixed effects for some things. I will mostly be
> running regressions (sometimes with fixed effects, but often of the
> Fama-MacBeth type), but the key thing I am looking for is the ability to
> lag things on the fly without having to run an apply statement to split
> everything up into a list, lag each firm individually, recombine the list,
> and then run a regression.
>
> I currently use zoo but I was under the impression that there is some limit
> to the number of columns one can have, no?  With 30k firms it might not be
> possible to have such a wide zoo object... am I incorrect about this?
>
>
> On Sat, May 5, 2012 at 3:08 PM, Richard Herron <richard.c.herron at gmail.com>
> wrote:
>>
>> What kind of models are you estimating? I would use PLM if I were
>> doing models with firm fixed effects (FE). But I don't think I see
>> firm FE with daily observations. I usually see firm FE at the annual
>> level.
>>
>> If you're either estimating time series models or aggregating daily
>> observations to the month-level for cross-sectional models, then a
>> list of firm-level time series would be best (or if you're only using
>> the return series you could put this in one wide xts or zoo object).
>>
>> Re: missing data. xts has -na.locf- for carrying forward the last
>> non-missing observation. I tend to leave missing observations as
>> missing.
>>
>> Could you provide an example of what you would like to estimate?
>>
>> Richard Herron
>>
>>
>> On Sat, May 5, 2012 at 11:30 AM, Alexander Chernyakov
>> <alexander.chernyakov at gmail.com> wrote:
>> > Hi Richard,
>> > Thanks for your response.  One issue I have run into with PLM is that
>> > it seems to be fairly slow with large data sets (14 million date-firm
>> > points).  Any tricks for this? Also, it does not seem to handle
>> > irregularly spaced time points: it fills in the missing ones with NA,
>> > so lagging and differencing don't work correctly.  Do
>> > you have any advice on fixing this?
>> >
>> > Thanks,
>> > Alex
>> >
>> > On Sat, May 5, 2012 at 8:43 AM, Richard Herron
>> > <richard.c.herron at gmail.com> wrote:
>> >> What kind of models do you plan on using?
>> >>
>> >> If you plan on using time series models, then I suggest generating a
>> >> list where each entry is one firm. This will make it easy to fit
>> >> models with lapply.
>> >>
>> >> If you plan on using panel models, then I suggest using PLM. It is
>> >> easy enough to manually code within and between estimators, but if you
>> >> use clustered standard errors or dynamic panel models, then PLM will
>> >> make your life a lot easier.
>> >>
>> >> Richard Herron
>> >>
>> >>
>> >> On Fri, May 4, 2012 at 6:30 PM, Alexander Chernyakov
>> >> <alexander.chernyakov at gmail.com> wrote:
>> >>>
>> >>> Hi,
>> >>> This question is of a general nature: How do people handle panel data
>> >>> in R?  For example,  I have returns of firms and each firm has daily
>> >>> observations.  One way is to use the plm package; another is to use
>> >>> plyr and just do the operations on (date, firmid) units using
>> >>> something like zoo as a container for each firm so that lagging and
>> >>> differencing can be done.  For regression it seems that plm might be
>> >>> the better option?  Just curious if somebody has a well worked out
>> >>> system for this.
>> >>>
>> >>> Thanks
>> >>> Alex
>> >>>
>> >>> _______________________________________________
>> >>> R-SIG-Finance at r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> >>> -- Subscriber-posting only. If you want to post, subscribe first.
>> >>> -- Also note that this is not the r-help list where general R
>> >>> questions should go.
>
>