Millo Giovanni
Thu Feb 4 15:10:30 CET 2010

Dear Liviu,

It's difficult to tell without seeing the data. My guess would be that you have some completely empty groups, which Tapply complains about when doing the time-demeaning, but it is only a guess.
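A minimal base-R sketch of how one might look for such groups before calling plm(). It uses a small hypothetical data frame (the columns id, time and y are assumptions, not the actual variables) and shows two things: ordinary row subsetting of a data frame keeps factor levels that no longer have any rows, and a variable can be entirely missing within a group. Whether either of these is what makes Tapply.matrix() fail here is only an assumption.

```r
# Hypothetical toy panel, for illustration only: firm "c" has only
# missing y, and firm "b" will be dropped by the row subset below.
panel <- data.frame(
  id   = factor(c("a", "a", "b", "b", "c", "c", "d")),
  time = c(1, 2, 1, 2, 1, 2, 1),
  y    = c(1, 2, 3, 4, NA, NA, 7)
)

# Row subsetting keeps every level of the 'id' factor, including
# levels that no longer have any rows
sub <- panel[panel$id != "b", ]

# 1. Completely empty groups: factor levels with zero rows
n_rows    <- table(sub$id)
empty_ids <- names(n_rows)[n_rows == 0]

# 2. Groups in which y is entirely missing (tapply() returns NA for
#    the empty groups, so those are excluded here)
n_obs      <- tapply(!is.na(sub$y), sub$id, sum)
all_na_ids <- names(n_obs)[!is.na(n_obs) & n_obs == 0]

print(empty_ids)   # "b"
print(all_na_ids)  # "c"

# Re-creating the factor drops the unused levels before refitting
sub$id <- factor(sub$id)
print(levels(sub$id))  # "a" "c" "d"
```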

I realize you can't share the data in their present form, but may I suggest the following? Subset your data in some random way until you find a "problematic" subset (one that still gives the error), then change the labels and everything else so that the data become unrecognizable, and send us that example. You can also transform the values randomly, as long as missing values stay missing, since this is likely to be a missing-values issue.
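The subsetting and relabelling suggested above could be sketched roughly as follows. The helper anonymize_panel() and its arguments are hypothetical (they are not part of plm): it keeps a random sample of individuals with all of their rows, replaces their labels with anonymous codes and applies a random affine transform to the numeric columns, leaving NAs in place. The time index is passed in skip_cols so that it is not altered. Note that set.seed() inside the function changes the global random-number state.

```r
# Hypothetical helper (not part of plm): keep a random sample of
# individuals with all of their rows, replace their labels with
# anonymous codes and randomly rescale the numeric columns.
anonymize_panel <- function(df, id_col, n_ids, skip_cols = character(0),
                            seed = 42) {
  set.seed(seed)
  ids  <- unique(as.character(df[[id_col]]))
  keep <- sample(ids, min(n_ids, length(ids)))
  out  <- df[as.character(df[[id_col]]) %in% keep, , drop = FALSE]

  # Original labels become "i1", "i2", ...
  out[[id_col]] <- factor(paste0("i", as.integer(factor(out[[id_col]]))))

  # Random affine transform of each numeric column; NAs stay NA, so
  # the pattern of missing values is preserved
  is_num   <- vapply(out, is.numeric, logical(1))
  num_cols <- setdiff(names(out)[is_num], c(id_col, skip_cols))
  for (col in num_cols) {
    out[[col]] <- out[[col]] * runif(1, 0.5, 2) + rnorm(1)
  }
  rownames(out) <- NULL
  out
}

# Usage on a toy panel; "year" is the time index and is left untouched
toy <- data.frame(
  firm = c("A", "A", "B", "C", "C"),
  year = c(2001, 2002, 2001, 2001, 2002),
  y    = c(1.5, NA, 3.2, 4.1, 5.0)
)
anon <- anonymize_panel(toy, "firm", n_ids = 2, skip_cols = "year")
print(anon)
```

Sampling whole individuals, rather than single rows, keeps the unbalanced structure of each kept time series intact, which matters if the error depends on how the groups are formed.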


Dear all
I am working on unbalanced panel data and I can readily fit a "pooling" model using plm(), but not a "within" or "random" model.
When reproducing the examples in vignette("plm") and in the AER package, I encountered no such issues.

##unfortunately I cannot disclose the data, and it is too big anyway
> dim(ibes.kld.exp.p[x.subs , ])
[1] 13189    34
> summary(ibes.kld.exp.p[x.subs , ]$ibes1y.meanest)
total sum of squares : 28058
      id     time
0.752284 0.018656
> summary(ibes.kld.exp.p[x.subs , ]$employee_kld)
total sum of squares : 9146.5
      id     time
0.637098 0.073421

##fitting a pooling model works OK
> x <- plm(ibes1y.meanest ~ employee_kld, ibes.kld.exp.p[x.subs , ], 
+ model="pooling")
> summary(x)
Oneway (individual) effect Pooling Model

plm(formula = ibes1y.meanest ~ employee_kld, data = ibes.kld.exp.p[x.subs,
    ], model = "pooling")

Unbalanced Panel: n=3041, T=1-16, N=13189

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max.
 -6.530  -0.871  -0.189   0.629  13.200

Coefficients :
             Estimate Std. Error t-value Pr(>|t|)
(Intercept)    1.5607     0.0127  122.73  < 2e-16 ***
employee_kld   0.1118     0.0152    7.35  2.2e-13 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    28100
Residual Sum of Squares: 27900
F-statistic: 53.954 on 1 and 13187 DF, p-value: 2.17e-13
> plmtest(x, "individual")

	Lagrange Multiplier Test - (Honda)

data:  ibes1y.meanest ~ employee_kld
normal = 1675.7, p-value < 2.2e-16
alternative hypothesis: significant effects

##fitting a within or random model fails
> x <- plm(ibes1y.meanest ~ employee_kld, ibes.kld.exp.p[x.subs , ], 
+ model="within")
Error in Tapply.matrix(x, effect, mean, ...) : subscript out of bounds
> x <- plm(ibes1y.meanest ~ employee_kld, ibes.kld.exp.p[x.subs , ], 
+ model="random")
Error in Tapply.matrix(x, effect, mean, ...) : subscript out of bounds

Could this be an issue with my data (which are a bit unusual, since employee_kld is categorical)? Or is there perhaps a problem in plm() with unbalanced data?
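One way to tell whether the problem lies in the data or in plm() might be to compute the within estimate by hand on the same subset. The sketch below uses base R only; the function within_by_hand() and the toy data are hypothetical. It drops incomplete rows and unused factor levels, demeans y and x by individual, and runs OLS without an intercept. By the Frisch-Waugh-Lovell theorem its slope should equal the coefficient on x from a regression with individual dummies. It handles a single regressor only, and the standard errors lm() reports on demeaned data do not include the degrees-of-freedom correction for the individual effects.

```r
# Within (fixed-effects) slope for a single regressor, computed by hand:
# demean y and x by individual, then run OLS without an intercept.
within_by_hand <- function(y, x, id) {
  ok <- complete.cases(y, x, id)        # drop rows with any NA
  y  <- y[ok]
  x  <- x[ok]
  id <- factor(id[ok])                  # factor() drops empty levels
  y_dm <- y - ave(y, id)                # subtract individual means
  x_dm <- x - ave(x, id)
  unname(coef(lm(y_dm ~ x_dm - 1)))
}

# Toy unbalanced panel (hypothetical data); y has one missing value
id <- rep(c("a", "b", "c"), times = c(3, 2, 4))
x  <- c(1, 2, 4, 2, 3, 5, 1, 0, 2)
y  <- c(2, 3, 7, NA, 2, 9, 4, 2, 5)

b_within <- within_by_hand(y, x, id)

# Same slope from a dummy-variable (LSDV) regression, by the
# Frisch-Waugh-Lovell theorem
b_lsdv <- unname(coef(lm(y ~ x + factor(id)))["x"])
print(c(within = b_within, lsdv = b_lsdv))
```

If this runs cleanly on the problematic subset while plm(..., model = "within") still fails, that would point towards plm's handling of the panel index rather than towards the values themselves.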

Please let me know what you think.

> sessionInfo()
R version 2.10.1 (2009-12-14)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C

attached base packages:
 [1] tcltk     grid      splines   stats     graphics  grDevices utils
 [8] datasets  methods   base

other attached packages:
 [1] RcmdrPlugin.sos_0.2-0    tcltk2_1.1-1             RcmdrPlugin.Export_0.3-0
 [4] Hmisc_3.7-0              xtable_1.5-6             Rcmdr_1.5-5
 [7] car_1.2-16               ggplot2_0.8.5            digest_0.4.2
[10] reshape_0.8.3            plyr_0.1.9               proto_0.3-8
[13] plm_1.2-3                sandwich_2.2-5           zoo_1.6-2
[16] MASS_7.3-5               Formula_0.2-0            kinship_1.1.0-23
[19] lattice_0.18-3           nlme_3.1-96              survival_2.35-8
[22] fortunes_1.3-7           sos_1.2-4                brew_1.0-3
[25] hints_1.0.1-1

loaded via a namespace (and not attached):
[1] cluster_1.12.1 tools_2.10.1

