[R] Contingency tables as data frames

presnell@stat.ufl.edu presnell at stat.ufl.edu
Wed Mar 1 10:25:00 CET 2000

       {again a message that was sent to  owner-r-help (which is me, currently)
        why on earth ???!??!?
	reply to R-help or the original sender Brett Presnell; }

I'm teaching a categorical data analysis course this term, and a minor
"problem" has resurfaced that I have often thought about before.  This
applies equally to Splus I suppose, but my undergrads aren't using it.

It seems natural to read/represent a contingency table as a data
frame, with one column representing the cell counts, as in the example
appended below (data taken from Agresti, "An Introduction to
Categorical Data Analysis").  However, functions like ftable,
mantelhaen.test, chisq.test, fisher.test, etc. don't work naturally
with this representation, and instead require the user to first
manipulate the data, say by using tapply to convert the data into an
array.  This is not difficult of course, but it's one of those things
that I'd rather not have to explain to students, who usually need to
be focusing on other things.
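
A minimal sketch of the preprocessing step described above, using only
the Beijing and Shanghai rows of the appended data.  The object names
`lung` and `tab` are illustrative assumptions, not anything from the
original message.

```r
# Data frame with one row per cell and a column of counts
# (Beijing and Shanghai rows only, taken from the appended table).
lung <- data.frame(
  City   = rep(c("Beijing", "Shanghai"), each = 4),
  Smoker = rep(c("Yes", "Yes", "No", "No"), times = 2),
  Cancer = rep(c("Yes", "No"), times = 4),
  Count  = c(126, 100, 35, 61, 908, 688, 497, 807)
)

# Sum the counts within each combination of factor levels.  Every
# combination occurs exactly once here, so this simply rearranges
# Count into a Smoker x Cancer x City array.
tab <- tapply(lung$Count, lung[c("Smoker", "Cancer", "City")], sum)

# The array can now be handed to the usual functions.
ftable(tab)
mantelhaen.test(tab)
```

The order of the columns passed to tapply fixes the order of the array
dimensions; mantelhaen.test treats the third dimension (City here) as
the stratifying variable.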

So, am I missing something obvious (not unlikely), or would it be a
good idea to extend the methods/arguments of these functions to
analyze/manipulate data represented in this way without any
preprocessing by the user?  It seems that a "count" (or "weight" or
"freq" or whatever) argument would do it in most cases.
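
One way such an argument might behave can be sketched with a small
wrapper.  Both `table_from_counts` and `chisq_test_counts` are
hypothetical helpers written for illustration; neither is part of R,
and the default column name "Count" is an assumption taken from the
appended data.

```r
# Hypothetical helper, not part of R: cross-classify every column of
# `data` other than the one named by `count`, summing that column
# within each cell.
table_from_counts <- function(data, count = "Count") {
  factors <- data[setdiff(names(data), count)]
  tapply(data[[count]], factors, sum)
}

# Hypothetical wrapper showing what a `count` argument to chisq.test
# might do: build the table first, then call the existing function.
chisq_test_counts <- function(data, count = "Count", ...) {
  chisq.test(table_from_counts(data, count), ...)
}

# Smoker by Cancer for Beijing only (counts from the appended table).
beijing <- data.frame(
  Smoker = c("Yes", "Yes", "No", "No"),
  Cancer = c("Yes", "No", "Yes", "No"),
  Count  = c(126, 100, 35, 61)
)
result <- chisq_test_counts(beijing)
print(result)
```

The same pattern would apply to fisher.test or mantelhaen.test; the
sketch leaves open how missing cells or non-factor columns should be
handled.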

Funny, I can't help but wonder if the answer from those who have thought
about this more deeply than I have might be "it's a can of worms".

Brett Presnell
Department of Statistics
University of Florida
(presnell at stat.ufl.edu)

     City Smoker Cancer Count
  Beijing    Yes    Yes   126
  Beijing    Yes     No   100
  Beijing     No    Yes    35
  Beijing     No     No    61
 Shanghai    Yes    Yes   908
 Shanghai    Yes     No   688
 Shanghai     No    Yes   497
 Shanghai     No     No   807
 Shenyang    Yes    Yes   913
 Shenyang    Yes     No   747
 Shenyang     No    Yes   336
 Shenyang     No     No   598
  Nanjing    Yes    Yes   235
  Nanjing    Yes     No   172
  Nanjing     No    Yes    58
  Nanjing     No     No   121
   Harbin    Yes    Yes   402
   Harbin    Yes     No   308
   Harbin     No    Yes   121
   Harbin     No     No   215
Zhengzhou    Yes    Yes   182
Zhengzhou    Yes     No   156
Zhengzhou     No    Yes    72
Zhengzhou     No     No    98
  Taiyuan    Yes    Yes    60
  Taiyuan    Yes     No    99
  Taiyuan     No    Yes    11
  Taiyuan     No     No    43
 Nanchang    Yes    Yes   104
 Nanchang    Yes     No    89
 Nanchang     No    Yes    21
 Nanchang     No     No    36