# [R] Percentages in contingency tables *warning trivial question*

BXC (Bendix Carstensen) bxc at steno.dk
Mon Dec 13 12:36:48 CET 2004

```
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Rachel Pearce
> Sent: Monday, December 13, 2004 10:37 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Percentages in contingency tables *warning
> trivial question*
>
>
> I hesitate to post this question in the light of recent
> threads, indeed I have hesitated for several weeks, however I
> have come to a full stop and really need some help if I am
> going to progress. I am a new user of R for medical
> statistics. I have attempted to read all the relevant
> documents, but would welcome any suggestions as to what I have missed.
>
> I am trying to construct "table 1" type contingency (mostly)
> tables. I would like to include percentages, thus:
>
> 		Cases		Controls	Total
> 		N	%	N	%	N	%
> Total		50	100	50	100	100	100
>
>
> Sex: M	23 	46	27	54	50	50
>
> etc...
>
> I hesitate even more to mention it here, but I am thinking of
> something along the lines of PROC TABULATE in SAS.

This is one of the holes in the tabulation features in R.
The simplest feature needed is the one in addmargins, but
tabulation is still rudimentary in R.

I'm afraid that what you want would require:

1. Make the table of counts
2. Make the table of percentages by sweeping out a margin
(i.e. take the margin and divide the entire table by it;
sweeping is just the generalization of this: use any
desired function instead of "/")
3. Define a new table with an extra dimension (c("N","pct")) and
fill in the two original tables there.

The last step is necessary in the absence of a generalized cbind/rbind
for tables/arrays.

Please correct me if such a thing exists. If it does, it should be

The weird example in addmargins only covers the case where a table of
percentages is wanted with a margin of total counts, not the general
problem.
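
# A minimal sketch of steps 1-3 above, under stated assumptions; it is
# not a general tabulation feature. The data are simulated and the names
# sex, group, n, pct and both are hypothetical. Base R only.
set.seed(1)
sex   <- factor(sample(c("F", "M"), 100, replace = TRUE))
group <- factor(rep(c("Cases", "Controls"), each = 50))

# 1. Table of counts; addmargins() appends a "Sum" row and column.
n <- addmargins(table(sex, group))

# 2. Percentages: sweep() divides every column by that column's total,
#    which is taken here from the "Sum" row of the counts.
pct <- round(100 * sweep(n, 2, n["Sum", ], "/"), 1)

# 3. New table with an extra dimension c("N", "pct") holding both tables.
both <- as.table(array(c(n, pct), dim = c(dim(n), 2),
                       dimnames = c(dimnames(n), list(stat = c("N", "pct")))))

# ftable() flattens this so that N and pct sit side by side per group.
ftable(both, row.vars = "sex", col.vars = c("group", "stat"))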

Somebody should sit down and write a reasonable tabulation feature for R,
but the problem in itself is complicated, so the syntax is likely to be
arcane. For example, take a look at the syntax for proc tabulate in SAS,
which is very strange, but given the features it covers (which are all
desirable) it is difficult to come up with something simpler.

Bendix Carstensen
----------------------
Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2
DK-2820 Gentofte
Denmark
tel: +45 44 43 87 38
mob: +45 30 75 87 38
fax: +45 44 43 07 06
bxc at steno.dk
www.biostat.ku.dk/~bxc
----------------------

> The closest I have found in the documentation I have read so
> far is an example given in the help for "addmargins":
>
> 	Bee <- sample( c("Hum","Buzz"), 177, replace=TRUE )
> 	Sea <- sample( c("White","Black","Red","Dead"), 177, replace=TRUE )
> 	...
> 	# Weird function needed to return the N when computing percentages
> 	sqsm <- function( x ) sum( x )^2/100
> 	B <- table(Sea, Bee)
> 	round(sweep(addmargins(B, 1, list(list(All=sum, N=sqsm))), 2,
> 	apply( B, 2, sum )/100, "/" ), 1)
> 	round(sweep(addmargins(B, 2, list(list(All=sum, N=sqsm))), 1,
> 	apply(B, 1, sum )/100, "/"), 1)
>
> .. Which introduced me to "sweep" and maybe could be extended
> to do what I want. But I don't like using mysterious "weird"
> functions.
>
> I recently found Paul Johnson's Rtips where:
> http://www.ku.edu/~pauljohn/R/Rtips.html#6.1 mentioned the
> function prop.table, which is also close to what I want. But
> how to show Ns and percentages in the same table?
>
> I wondered if there were a function which does this already.
> Or perhaps I should just write one for myself? Or should I
> not be trying to do this in R in the first place and go back
> perhaps I am looking for the wrong thing in the manuals?
>
> I have followed recent advice to look at Frank E Harrell's
> detailed tabulation code, but this seems to produce many
> errors on my system and with my version of R (see below). I
> typography). I can provide details of the errors if it turns
> out that the answer to my question is RTFM by Prof Harrell.
>
> I would like to add my two pennorth to the debate about
> "trivial" questions, of which I assume this is one. I believe
> that a very large amount of what is hard about learning R on
> one's own with documentation but without a real person, is a
> matter of vocabulary. I only found sweep and prop.table by
> chance since neither of them is indexed by words like
> "proportion" or "percentage" which is what I had been looking
> for. Similarly I still do not know exactly what "sweep" does,
> since I have never heard this verb used in a mathematical /
> statistical context, and the help on sweep states that what
> it does is sweep. I have experienced many similar examples in
> the last few weeks. This is not to say that there is anything
> wrong with the help on these functions nor with the help in
> general, but what R does not have is an extensive indexing
> system by synonyms and uses. It is largely for reasons like
> this, I believe, that trivial questions continue to be asked.
> If one does not know the name of the function to do "verb"
> and one has tried "verb" and the synonyms which spring to
> mind and drawn a blank, where to next?
>
> Another reason for difficulty is that while a function may
> exist to do something, it is sometimes hard to find the
> package where it is contained, e.g. Frank Harrell's functions
> seem to be in a package called Hmisc which is not listed in
> the drop-down box for "load package".
>
> System and version information:
>
> platform i386-pc-mingw32
> arch     i386
> os       mingw32
> system   i386, mingw32
> status
> major    2
> minor    0.1
> year     2004
> month    11
> day      15
> language R
>
> Rachel Pearce
>
> British Society of Blood and Marrow Transplantation
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help