[R] Tests for the need of cluster analysis

Ben Bolker bbolker at gmail.com
Mon May 2 21:51:20 CEST 2011


MARY A. WEISS <mweiss <at> temple.edu> writes:

> 
> Hi,
> 
> I am currently using STATA in my analysis.  STATA has a cluster option but
> does not have any tests for whether cluster analysis is necessary or not for
> a dataset.  So I am trying to figure out whether R could be used to test
> whether I need to be doing cluster analysis or not.  If R does tests to
> determine whether cluster analysis is valid for my data, I will learn R and
> use it on my data.
> 
> My data are panel data consisting of 49 states and 25 years.  Currently, I
> am estimating models with fixed state and time effects.
> 
> Thanks for any help you can give me.
> 
> Cheers,
> 
> Mary

  You might want to forward this question to the r-sig-mixed-models
list.   I think you are fairly far off base in comparing 'prabclus'
(spatial clustering) to what Stata means by "clustered standard errors"
(e.g. <http://www.stata.com/support/faqs/stat/cluster.html>).
Cluster _analysis_ has to do with finding clusters in data; prabclus
uses spatial information to do cluster analysis; robust cluster
variances or standard errors have to do with adjusting variance/SE
to account for predetermined grouping variables ("clusters" in the
data, e.g. states).
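
  To make the last distinction concrete, here is a minimal sketch in plain
Python (standard library only, so it assumes nothing about what any R
package provides). It fits a no-intercept slope and computes two sandwich
standard errors: the "robust" one, which treats every observation as
independent, and the "robust cluster" one, which sums the score
contributions within each predetermined group before squaring. The function
name, the toy numbers, and the state labels are made up for illustration,
and no small-sample correction is applied, so this will not reproduce
Stata's output exactly.

```python
from collections import defaultdict
from math import sqrt


def slope_and_ses(x, y, groups):
    """No-intercept OLS slope with two sandwich standard errors.

    Returns (beta, robust_se, cluster_se). Hypothetical helper for
    illustration only; no finite-sample adjustments are applied.
    """
    sxx = sum(xi * xi for xi in x)  # the "bread" is 1 / sxx
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx

    # Score contribution of each observation: x_i * residual_i
    scores = [xi * (yi - beta * xi) for xi, yi in zip(x, y)]

    # Robust (HC0) "meat": square each observation's score separately
    meat_robust = sum(s * s for s in scores)

    # Cluster-robust "meat": add up scores within each group, then square
    by_group = defaultdict(float)
    for s, g in zip(scores, groups):
        by_group[g] += s
    meat_cluster = sum(s * s for s in by_group.values())

    return beta, sqrt(meat_robust) / sxx, sqrt(meat_cluster) / sxx


if __name__ == "__main__":
    # Toy data: four observations in two made-up "states"
    x = [1, 2, 3, 4]
    y = [1.5, 2.5, 2.5, 4.5]
    states = ["PA", "PA", "NJ", "NJ"]
    beta, robust_se, cluster_se = slope_and_ses(x, y, states)
    print(beta, robust_se, cluster_se)
```

  Note that the grouping variable is an input chosen by the analyst; nothing
in the calculation searches for clusters, which is what separates this from
cluster analysis. If every observation is its own group, the two standard
errors coincide.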

  I don't know offhand whether there are packages in R that implement
the "robust cluster variance" estimator; the geepack package (with its
geeglm function) and especially the "sandwich" package are definitely
worth looking at (they implement the equivalent of robust, but not
robust cluster [as far as I can tell], variance estimators), as well as
the Econometrics Task View and the book "R for Stata Users" by
Muenchen and Hilbe.

  A final philosophical note: I don't think you should be
testing _based on your data_ whether robust or robust cluster
variance estimators are more appropriate; there's a fairly
dangerous data snooping issue here.  Rather, you should try to
decide _a priori_, based on how your data are structured (e.g.
repeated observations within states), what's most appropriate.

  Ben Bolker


> 
> On Mon, May 2, 2011 at 1:02 PM, Tal Galili <tal.galili <at> gmail.com> wrote:
> 
> > Hi Mary,
> > Are you using R for your other analysis?
> > If so, What commands are you using for your analysis?
> >
> > p.s: please keep the rest of the R-help mailing list in the loop.
> >
> > Cheers,
> > Tal
> >
> >
> >
[snip]

> >
> >
> >
> >
> [snip] MARY A. WEISS <mweiss <at> temple.edu> wrote:
> >
> >> Hi Tal,
> >>
> >> Thanks for your answer.  I am running models with two-way fixed effects
> >> and two-way fixed effects with a cluster option.  The results are very
> >> different.  I wanted to know if it is appropriate to cluster my data or
> >> not.  In looking through the R manual, I thought that prabclus might
> >> help me answer the question.  Does prabclus include any tests that will
> >> tell me if cluster analysis is appropriate to use with my data?  That is,
> >> is cluster analysis valid for my data?
> >>
> >> Thanks in advance for any help you can give me.  I really appreciate it.
> >>
> >> Mary
> >>
[snip]
> >>
> >>> Hi Mary,
> >>> I'm not sure I understood your question.
> >>>
> >>> Are you using this package:
> >>> http://cran.r-project.org/web/packages/prabclus/index.html
> >>> and asking how to decide whether to use it or not?
> >>>
> >>> On Sun, May 1, 2011 at 7:54 PM, mary weiss <mweiss <at> temple.edu> wrote:
> >>>
> >>>> Does R have the capability to perform tests for the need of clustering
> >>>> analysis (e.g., in prabclus)?  I am using panel data with two-way fixed
> >>>> effects but am unsure about whether I should be using a cluster option
> >>>> as well to estimate my model.

> >>>>

[snip]


