[R] by() and CrossTable()

Marc Schwartz (via MN) mschwartz at mn.rr.com
Tue Apr 25 18:22:40 CEST 2006

On Tue, 2006-04-25 at 11:07 -0400, Chuck Cleland wrote:
>    I am attempting to produce crosstabulations between two variables for 
> subgroups defined by a third factor variable.  I'm using by() and 
> CrossTable() in package gmodels.  I get the printing of the tables first 
> and then a printing of each level of the INDICES.  For example:
> library(gmodels)
> by(warpbreaks, warpbreaks$tension, function(x){CrossTable(x$wool, 
> x$breaks > 30, format="SPSS", fisher=TRUE)})
>    Is there a way to change this so that the CrossTable() output is 
> labeled by the levels of the INDICES variable?  I think this has to do 
> with how CrossTable returns output, because the following does what I want:
> by(warpbreaks, warpbreaks$tension, function(x){summary(lm(breaks ~ wool, 
> data = x))})
> thanks,
> Chuck


Thanks for your e-mail.

Without digging deeper, I suspect that the problem here is that
CrossTable() writes its formatted output from within the body of the
function using cat(), as opposed to a two step process of creating a
results object, which then has a print method associated with it. The
two step process is what happens in the lm() example that you have, as
well as in many other functions in R. by() prints each INDICES label
followed by the value returned for that level, so anything written by
cat() has already appeared before any label is printed.
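
To make the difference concrete, here is a minimal sketch using only base
R. The helper names tab_cat() and tab_obj() are made up for illustration
and are not part of gmodels; tab_cat() only imitates the cat() style, it
is not how CrossTable() is actually written.

```r
# Style 1: the function writes its own output with cat() and returns NULL.
tab_cat <- function(x) {
  cat("breaks > 30:", sum(x$breaks > 30), "of", nrow(x), "\n")
  invisible(NULL)
}

# Style 2: the function returns an object and leaves printing to print().
tab_obj <- function(x) table(wool = x$wool, over30 = x$breaks > 30)

# Capture what reaches the console in each case.
out_cat <- capture.output(print(by(warpbreaks, warpbreaks$tension, tab_cat)))
out_obj <- capture.output(print(by(warpbreaks, warpbreaks$tension, tab_obj)))

# Style 1: all three cat() lines come first, then the labels (each with NULL).
writeLines(out_cat)
# Style 2: each table appears directly under its own label.
writeLines(out_obj)
```

In the first case the output is produced while by() is still looping over
the groups, so the labels can only follow it. In the second case by()
holds the three tables and prints each one after its label.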

I had not anticipated this particular use of CrossTable(), since it was
really focused on creating nicely formatted 2d tables using fixed width
fonts.

That being said, I have had recent requests to enhance CrossTable()'s
functionality to:

1. Allow the results of the internal processing to be assigned to an
object without producing any other output. For example:

  Results <- CrossTable(...)

yielding no further output in the console.

2. Facilitate LaTeX markup of the CrossTable() formatted output for
inclusion in LaTeX documents.

Both of the above would require me to fundamentally alter CrossTable()
to create a "CrossTable" class object, as opposed to the current
embedded output. I would then create a print.CrossTable() method
yielding the current output, as well as one to create LaTeX markup for
that application. The LaTeX output would likely need to support the
regular 'table' style as well as 'ctable' and 'longtable' styles, the
latter given the potential for long multi-page output.

These changes should then support the type of use that you are
attempting here.
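
As a rough sketch of that design (not the eventual CrossTable() code), the
class and print method could look something like the following. The names
xtab_sketch() and print.xtab_sketch() are hypothetical, and the output
format is deliberately much simpler than what CrossTable() produces.

```r
# Hypothetical constructor: does the computation and returns a classed
# object, printing nothing itself.
xtab_sketch <- function(x, y) {
  counts <- table(x, y)
  structure(list(counts = counts,
                 row_prop = prop.table(counts, 1)),
            class = "xtab_sketch")
}

# Print method: all formatted output lives here.
print.xtab_sketch <- function(x, ...) {
  cat("Cell counts, then row proportions\n")
  print(x$counts)
  print(round(x$row_prop, 3))
  invisible(x)
}

# Assignment yields no further output in the console.
Results <- xtab_sketch(warpbreaks$wool, warpbreaks$breaks > 30)

# by() calls the print method once per level, after printing the label.
by(warpbreaks, warpbreaks$tension,
   function(d) xtab_sketch(d$wool, d$breaks > 30))
```

The object deliberately holds two components. tapply(), which by() uses
internally, unlists results that all have length one, which would strip
the class before the print method could be used.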

These are on my TODO list for CrossTable() (along with the inclusion of
the measures of association recently discussed) and now that the dust
has settled from some recent abstract submission deadlines I can get
back to some of these things. I don't have a timeline yet, but will
forge ahead with these enhancements.

One possible suggestion for you in the interim, at least in terms of
some nicely formatted n-way tables, is the ctab() function in the
'catspec' package by John Hendrickx.

A possible example call would be:

ctab(warpbreaks$tension, warpbreaks$wool, warpbreaks$breaks > 30, 
     type = c("n", "row", "column", "total"), addmargins = TRUE)

Unlike CrossTable(), which is strictly 2d (though that may change in the
future), ctab() directly supports the creation of n-way tables, with
counts and percentages/proportions interleaved in the output. ctab()
does not apply any statistical tests, so these would need to be done
separately using by().
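
For the tests, something along these lines should work with base R alone.
This is only a sketch: fisher.test() is used here because the original
call set fisher=TRUE, and the explicit factor levels are my own addition
so that every subgroup yields a full 2 x 2 table.

```r
# Fisher's exact test of wool by (breaks > 30) within each tension level.
tests <- by(warpbreaks, warpbreaks$tension, function(d) {
  over30 <- factor(d$breaks > 30, levels = c(FALSE, TRUE))
  fisher.test(table(wool = d$wool, over30 = over30))
})

# Each "htest" result is printed under its tension label.
tests

# Pull out just the p-values, one per tension level.
pvals <- vapply(tests, function(ft) ft$p.value, numeric(1))
pvals
```

Since fisher.test() returns an object with its own print method, by()
labels each result by tension level, just as in the lm() example.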

Chuck, feel free to contact me offlist as other related issues may arise
or as you have other comments on this.

Again, thanks for the e-mail.

Best regards,

Marc Schwartz
