[R] plm "within" models: is the correct F-statistic reported?
Liviu Andronic
landronimirc at gmail.com
Thu Mar 18 01:06:32 CET 2010
On 3/17/10, Achim Zeileis <Achim.Zeileis at uibk.ac.at> wrote:
> Hmm, that sounds strange. Maybe something about the data pre-processing
> went wrong?
>
I traced plm() in step-by-step mode, and the process stalls on
plm.fit(), apparently after all the pre-processing.
> Depending on how unbalanced the data is, there might not be
> enough observations.
>
It is very unbalanced.
> length(unique(kldall.sync$cusip6[])) ## number of individuals
[1] 3079
> summary(x1); sum(x1==1) ##distribution of T
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    2.00    4.00    4.32    5.00   16.00
[1] 560
But this doesn't seem to be an issue for a heavily unbalanced Grunfeld:
> data("Grunfeld", package = "AER")
> gr <- subset(Grunfeld[-c(1:19), ], firm %in% c("General Electric", "General Motors", "IBM"))
> dim(gr); head(gr)
[1] 41 5
   invest  value capital             firm year
20 1486.7 5593.6  2226.3   General Motors 1954
41   33.1 1170.6    97.8 General Electric 1935
42   45.0 2015.8   104.4 General Electric 1936
43   77.2 2803.3   118.0 General Electric 1937
44   44.6 2039.7   156.2 General Electric 1938
45   48.1 2256.2   172.6 General Electric 1939
> pgr <- plm.data(gr, index = c("firm", "year"))
> gr_fe <- plm(invest ~ value + capital, data = pgr, model = "within",
+ effect = "individual")
> gr_fe <- plm(invest ~ value + capital, data = pgr, model = "within",
+ effect = "time")
> gr_fe <- plm(invest ~ value + capital, data = pgr, model = "within",
+ effect = "twoways")
> summary(gr_fe)
Twoways effects Within Model
Call:
plm(formula = invest ~ value + capital, data = pgr, effect = "twoways",
model = "within")
Unbalanced Panel: n=3, T=1-20, N=41
Residuals :
     Min.   1st Qu.    Median   3rd Qu.      Max.
-3.01e+01 -7.64e+00 -3.60e-15  7.64e+00  3.01e+01
Coefficients :
        Estimate Std. Error t-value Pr(>|t|)
value     0.0167     0.0165    1.01     0.32
capital   0.0468     0.0367    1.27     0.22
Total Sum of Squares: 6850
Residual Sum of Squares: 6150
F-statistic: 0.959136 on 2 and 17 DF, p-value: 0.403
> Does the lm() version of the "twoways" model work ok?
>
It works when controlling only for time, but doesn't work when
controlling only for individuals or for both (I actually kill the
process after 10-15 min). I guess that here the following applies:
"in cases where there are many individuals in the sample and we are
not interested in the value of their fixed effects, the lm() results
are awkward to deal with and the estimation of a large number of u_i
coefficients could render the problem numerically intractable." [1]
[1] http://cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdf
The alternative to dummy controls is time-demeaning of the data, and
according to the "plm" vignette this is the implementation for
"individual" and "time" cases. I am wondering, though, if "twoways"
uses (or can use?) the same implementation.
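To make the dummies-versus-demeaning point concrete, here is a minimal
base-R sketch for the "individual" case only. The data are simulated and
all names (n_obs, id, x, y, alpha) are made up for illustration; this is
not how plm() is implemented internally, just the textbook equivalence.

```r
# Simulated unbalanced panel (hypothetical data, base R only)
set.seed(1)
n_obs <- c(1, 2, 3, 5, 8, 4)                      # T_i for each individual
id    <- factor(rep(seq_along(n_obs), times = n_obs))
x     <- rnorm(length(id))
alpha <- rnorm(nlevels(id))[as.integer(id)]       # individual fixed effects
y     <- 2 * x + alpha + rnorm(length(id))

# Dummy-variable approach: one dummy per individual
b_lsdv <- coef(lm(y ~ x + id))["x"]

# Within approach: subtract individual means, then OLS without intercept
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)
b_within <- coef(lm(y_dm ~ x_dm - 1))["x_dm"]

# Both approaches give the same slope (up to numerical tolerance)
all.equal(unname(b_lsdv), unname(b_within))
```

With thousands of individuals the demeaned regression has a single
column, whereas the dummy version has one column per individual. As far
as I understand, for "twoways" on an unbalanced panel a single pass of
subtracting individual and time means is not equivalent in general, so
the sketch should not be read as covering that case.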
One way to work around the failing lm() "twoways" call is to use
plm(..., effect="individual") and include the time effect manually, as
a year regressor (year is a factor in pgr, so it enters as dummies):
> gr_fe2 <- plm(invest ~ value + capital + year, data = pgr,
+ model = "within", effect="individual")
> summary(gr_fe2)
Oneway (individual) effect Within Model
Call:
plm(formula = invest ~ value + capital + year, data = pgr, effect = "individual",
model = "within")
Unbalanced Panel: n=3, T=1-20, N=41
Residuals :
     Min.   1st Qu.    Median   3rd Qu.      Max.
-3.01e+01 -7.64e+00 -4.06e-15  7.64e+00  3.01e+01
Coefficients :
         Estimate Std. Error t-value Pr(>|t|)
value      0.0167     0.0165    1.01    0.325
capital    0.0468     0.0367    1.27    0.220
year1936   1.2031    20.3416    0.06    0.954
[..]
year1954  92.5546    37.0794    2.50    0.023 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 68100
Residual Sum of Squares: 6150
F-statistic: 8.14666 on 21 and 17 DF, p-value: 0.0000286
> If so, I guess you will have to try to find out whether it's your
> preparation of the data or the fault of plm() that it does not work.
>
This mixed specification reproduces the coefficients of plm(...,
effect="twoways"), but reports a different F-statistic than I expected.
Since the mixed specification, i.e. plm(..., effect="individual") plus a
year regressor, runs fine on my data, I suspect that my data is OK and
that the plm(..., effect="twoways") implementation falters somewhere.
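One possible reading of the two F-statistics, sketched from the numbers
printed by summary() above: both appear to be the usual
((TSS - RSS) / df1) / (RSS / df2), but with different TSS and df1. In
the "twoways" fit only value and capital are tested (df1 = 2), while in
the "individual" fit with year as a regressor all 21 coefficients are
tested jointly (2 + 19 year dummies). The helper f_stat below is my own
illustration, not a plm function.

```r
# Sums of squares and degrees of freedom as printed by summary() above;
# they are rounded there, so the results match only approximately.
f_stat <- function(tss, rss, df1, df2) ((tss - rss) / df1) / (rss / df2)

f_twoways    <- f_stat(tss = 6850,  rss = 6150, df1 = 2,  df2 = 17)
f_individual <- f_stat(tss = 68100, rss = 6150, df1 = 21, df2 = 17)

round(c(twoways = f_twoways, individual_plus_year = f_individual), 2)
```

This gives roughly 0.97 and 8.15, close to the reported 0.959 and 8.147
given the rounding of the printed sums of squares. If this reading is
right, neither F-statistic is wrong; they test different hypotheses.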
> And you want fixed effects for all >2000 individuals?
>
No, I don't think so. I am not sure how orthodox this is, but we are
only looking at the coefficients of the "main" regressors, while
controlling for time and individual variation.
> waldtest() from "lmtest" does work in this context. Furthermore, "plm"
> provides various specialized tests for certain test problems.
>
Finally, it works!
> gr_fe1 <- plm(invest ~ value + capital, data = pgr,
+ model = "within", effect="twoways")
> summary(gr_fe1)$fstatistic
F test
data: invest ~ value + capital
F = 0.9591, df1 = 2, df2 = 17, p-value = 0.403
> gr_fe2 <- plm(invest ~ value + capital + year, data = pgr,
+ model = "within", effect="individual")
> summary(gr_fe2)$fstatistic ##"incorrect"
F test
data: invest ~ value + capital + year
F = 8.1467, df1 = 21, df2 = 17, p-value = 0.00002857
> gr_fe2_null <- plm(invest ~ year, data = pgr, model = "within")
> waldtest(gr_fe2_null, gr_fe2, test="F") ##works!
Wald test
Model 1: invest ~ year
Model 2: invest ~ value + capital + year
  Res.Df Df    F Pr(>F)
1     19
2     17  2 0.96    0.4
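For the record, the same kind of nested-model comparison can be sketched
in base R with lm() and anova(), which may help to check the logic on
small data. The data frame d below is simulated and its columns only
borrow the Grunfeld names; I am assuming that waldtest(..., test="F")
on two nested models performs the analogous comparison.

```r
# Small simulated unbalanced panel (hypothetical data, base R only)
set.seed(42)
d <- expand.grid(firm = factor(1:4), year = factor(1:6))
d <- d[-c(2, 7, 11, 20), ]                 # drop rows to unbalance the panel
d$value   <- rnorm(nrow(d))
d$capital <- rnorm(nrow(d))
d$invest  <- 0.5 * d$value + rnorm(nrow(d))

# Null model: only the firm and year controls
m_null <- lm(invest ~ firm + year, data = d)
# Full model: controls plus the two regressors of interest
m_full <- lm(invest ~ value + capital + firm + year, data = d)

tab <- anova(m_null, m_full)               # F test of the 2 added regressors

# The same F computed by hand from the residual sums of squares
rss0 <- deviance(m_null)
rss1 <- deviance(m_full)
f_manual <- ((rss0 - rss1) / 2) / (rss1 / df.residual(m_full))

c(anova = tab$F[2], manual = f_manual)
```

The two numbers agree, and the test has 2 numerator degrees of freedom
because only value and capital differ between the models.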
Thanks again
Liviu