[R] contingency tables in R

Mon Apr 16 10:57:19 CEST 2001

>>>>> Patrick Ball writes:

> Dear List:

> Most of the analysis I do involves contingency tables.  I am migrating
> to R from Stata and I have a number of questions about using
> contingency tables in R.  I suspect that most of the things I want to
> do are very short R scripts that people on this list probably have.  I
> wonder if you would be willing to share them.

> First, the presentation of tables by table() is not analysis-ready.
> Is there a way to output the table with the marginals, by cell, row or
> column proportions, with the test statistics (especially the chi^2 and
> the log-likelihood chi^2), residuals, cross product, and odds ratio?

Not in one monolithic function, I think, and I am not sure I would like
to have such a thing (see below).  But the pieces are all there:

* Use margin.table() and prop.table() to obtain margins and proportions,
  respectively.

* Use chisq.test() [in package ctest] for the chisq analysis (test
  statistic, p-value, chisq residuals)

* Use loglin() for the LR chisq and residuals.

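* Not sure which odds ratios you want.  Function mantelhaen.test() in
  package ctest estimates the common odds ratio for 2 by 2 by K tables
  (exact conditional with exact = TRUE); for a single 2 by 2 table,
  fisher.test() gives the conditional estimate.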
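
As a rough sketch of how these pieces fit together (not a definitive
recipe), the following uses the built-in UCBAdmissions table, which is my
choice for illustration and not part of the original question.  In current
versions of R the test functions live in package stats, which is loaded by
default; addmargins() is a later convenience function.

```r
# Collapse the built-in 3-way table to a 2 x 2 table (Admit x Gender).
tab <- margin.table(UCBAdmissions, c(1, 2))

# Margins and proportions.
margin.table(tab, 1)   # row totals
margin.table(tab, 2)   # column totals
addmargins(tab)        # table with totals appended
prop.table(tab)        # cell proportions
prop.table(tab, 1)     # row proportions
prop.table(tab, 2)     # column proportions

# Pearson chi-squared test without continuity correction.
pearson <- chisq.test(tab, correct = FALSE)
pearson$statistic
pearson$p.value
pearson$residuals      # (observed - expected) / sqrt(expected)

# Likelihood-ratio chi-squared from the independence log-linear model.
fit <- loglin(tab, margin = list(1, 2), fit = TRUE, print = FALSE)
fit$lrt                                       # LR chi-squared statistic
fit$df                                        # degrees of freedom
pchisq(fit$lrt, fit$df, lower.tail = FALSE)   # p-value
tab - fit$fit                                 # raw residuals

# Odds ratios: sample cross-product ratio, conditional estimate for the
# 2 x 2 table, and the common odds ratio across the Dept strata.
or_sample <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
or_sample
fisher.test(tab)$estimate
mh <- mantelhaen.test(UCBAdmissions)
mh$estimate
```

Wrapping whichever of these you want into a small function of your own is
then a matter of taste.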

It really depends on how your data is set up.  If you have the raw
values in a data frame, I would actually recommend using xtabs() rather
than table().  Try e.g.

     data(esoph)
     x <- xtabs(cbind(ncases, ncontrols) ~ ., data = esoph)
     x
     summary(x)

The last call prints a short summary: the number of cases, the number of
factors, and a chi-squared test for independence of all factors.

To obtain pretty-printed output from multi-way tables, use ftable().
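
For instance, a minimal sketch with the built-in 4-way Titanic table (the
dataset and the choice of row and column variables are only illustrative):

```r
# Flatten a 4-way table: two factors down the rows, two across the columns.
ftable(Titanic, row.vars = c("Class", "Sex"), col.vars = c("Age", "Survived"))

# The same idea starting from a data frame of counts via xtabs().
x <- xtabs(Freq ~ Class + Sex + Survived, data = as.data.frame(Titanic))
ft <- ftable(x, row.vars = 1:2)   # Class and Sex in rows, Survived in columns
ft
```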

> I also like to make tables that have summary statistics of a given
> variable in the columns (mean, s.d., etc.)  with each row being the
> value for a sub group of the data.  How do you do this in R?

Use aggregate().
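
A small sketch of what that might look like, using the built-in warpbreaks
data (my choice for illustration) and the formula interface of aggregate()
found in current versions of R:

```r
# One row per subgroup (wool x tension), summary statistics in the columns.
stats_by_group <- aggregate(
  breaks ~ wool + tension,
  data = warpbreaks,
  FUN = function(v) c(n = length(v), mean = mean(v), sd = sd(v))
)

# aggregate() stores the three statistics in a single matrix column;
# this spreads them into ordinary columns breaks.n, breaks.mean, breaks.sd.
stats_by_group <- do.call(data.frame, stats_by_group)
stats_by_group
```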

> The most complicated piece of this is contingency tables done with
> sample data.  The sampling involves several strata with different
> sampling weights.  Calculating the cell (or row or column)
> probabilities is relatively easy, but the other statistics can be
> complicated (the design effect, the finite population correction, the
> various chi^2s, and the standard errors and confidence
> intervals). Also, I sometimes make these tables with summary
> statistics in place of counts or population proportions.

> Is there any way to do this stuff in R without hacking it all myself?

The pieces are all there, I think, and it should be fairly simple to
combine them to reflect your personal preferences for displaying
categorical information etc.
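
As a starting point only, the easy part (weighted cell, row, and column
proportions) can be sketched with xtabs() by putting the sampling weight on
the left-hand side of the formula.  The data frame svy and its columns are
hypothetical toy values made up for this example, and the sketch does not
attempt design effects, finite population corrections, or design-based
standard errors.

```r
# Hypothetical toy sample: two strata with different sampling weights.
svy <- data.frame(
  stratum = c("s1", "s1", "s1", "s2", "s2", "s2"),
  region  = c("north", "south", "north", "south", "north", "south"),
  outcome = c("yes", "no", "no", "yes", "yes", "no"),
  weight  = c(10, 10, 10, 25, 25, 25)
)

# Weighted counts: each cell is the sum of the weights of its observations.
wtab <- xtabs(weight ~ region + outcome, data = svy)
wtab

prop.table(wtab)      # estimated cell proportions
prop.table(wtab, 1)   # estimated row proportions
prop.table(wtab, 2)   # estimated column proportions
```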

-k