[R] Percentages in contingency tables *warning trivial question*
rachelpearce at msn.com
Mon Dec 13 10:37:19 CET 2004
I hesitate to post this question in the light of recent threads; indeed,
I have hesitated for several weeks. However, I have come to a full stop
and really need some help if I am going to progress. I am a new user of
R for medical statistics. I have attempted to read all the relevant
documents, but would welcome any suggestions as to what I have missed.
I am trying to construct "table 1" type tables (mostly contingency
tables). I would like to include percentages, thus:
           Cases       Controls      Total
           N     %     N     %       N     %
Total     50   100    50   100     100   100
Sex: M    23    46    27    54      50    50
I hesitate even more to mention it here, but I am thinking of something
along the lines of PROC TABULATE in SAS.
The closest I have found in the documentation I have read so far is an
example given in the help for "addmargins":
Bee <- sample( c("Hum","Buzz"), 177, replace=TRUE )
Sea <- sample( c("White","Black","Red","Dead"), 177, replace=TRUE )
# Weird function needed to return the N when computing percentages
sqsm <- function( x ) sum( x )^2/100
B <- table(Sea, Bee)
round(sweep(addmargins(B, 1, list(list(All=sum, N=sqsm))), 2,
            apply( B, 2, sum )/100, "/" ), 1)
round(sweep(addmargins(B, 2, list(list(All=sum, N=sqsm))), 1,
            apply(B, 1, sum )/100, "/"), 1)
... which introduced me to "sweep" and maybe could be extended to do
what I want. But I don't like using mysterious "weird" functions.
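
For reference, a minimal sketch of what sweep and the "weird" function appear to be doing in that example. The object names counts, col_totals, col_pct and n_back are made up for illustration, and the F row is not in my table above; it is only what is left after subtracting M from the totals.

```r
# Counts from the example table: rows are Sex, columns are Group.
# The F row is inferred from the totals (50 - 23 and 50 - 27).
counts <- matrix(c(23, 27, 27, 23), nrow = 2,
                 dimnames = list(Sex = c("M", "F"),
                                 Group = c("Cases", "Controls")))

col_totals <- apply(counts, 2, sum)   # one total per column

# sweep(x, MARGIN, STATS, FUN) combines x with STATS along MARGIN using FUN.
# Here every column (MARGIN = 2) is divided by its own total / 100,
# which turns the counts into column percentages.
col_pct <- sweep(counts, 2, col_totals / 100, "/")
col_pct

# The "weird" function returns sum^2 / 100, so that after the same
# division by total / 100 the result is the plain column total again.
sqsm <- function(x) sum(x)^2 / 100
n_back <- sqsm(counts[, "Cases"]) / (col_totals["Cases"] / 100)
n_back
```

So the squaring seems to be there only to cancel the division that sweep applies to every row, including the added margin rows.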
I recently found Paul Johnson's Rtips where:
http://www.ku.edu/~pauljohn/R/Rtips.html#6.1 mentioned the function
prop.table, which is also close to what I want. But how to show Ns and
percentages in the same table?
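
A minimal sketch of one possible approach, using only base R and not claiming to be the standard way: build the counts with margins, compute column percentages with sweep, then interleave the two with cbind. The vectors sex and group are invented data chosen to reproduce the example table (the F row is inferred from the totals), and the names n, pct and out are hypothetical.

```r
# Made-up data reproducing the example: 50 cases, 50 controls
group <- factor(rep(c("Cases", "Controls"), each = 50))
sex   <- factor(rep(c("M", "F", "M", "F"), times = c(23, 27, 27, 23)),
                levels = c("M", "F"))

# Counts, with a Total column and a Total row added by hand
n <- unclass(table(Sex = sex, Group = group))
n <- cbind(n, Total = rowSums(n))
n <- rbind(Total = colSums(n), n)

# Column percentages: divide each column by its entry in the Total row
pct <- round(100 * sweep(n, 2, n["Total", ], "/"), 1)

# Interleave N and % columns: Cases N, Cases %, Controls N, ...
out <- cbind(n, pct)[, c(1, 4, 2, 5, 3, 6)]
colnames(out) <- paste(rep(colnames(n), each = 2), c("N", "%"))
out
```

The same percentages could presumably be obtained with prop.table(n[-1, ], 2) for the non-total rows; sweep is used here only so that the Total row comes out as 100 in the same step.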
I wondered if there were a function which does this already. Or perhaps
I should just write one for myself? Or should I not be trying to do this
in R in the first place and go back to Excel (I no longer have access to
SAS)? Please, NO! Or perhaps I am looking for the wrong thing in the
I have followed recent advice to look at Frank E Harrell's detailed
tabulation code, but this seems to produce many errors on my system and
with my version of R (see below). I do not have access to LaTeX
(apologies for incorrect typography). I can provide details of the
errors if it turns out that the answer to my question is RTFM by Prof
I would like to add my two pennorth to the debate about "trivial"
questions, of which I assume this is one. I believe that a very large
amount of what is hard about learning R on one's own with documentation
but without a real person, is a matter of vocabulary. I only found sweep
and prop.table by chance, since neither of them is indexed under words
like "proportion" or "percentage", which are what I had been looking for.
Similarly I still do not know exactly what "sweep" does, since I have
never heard this verb used in a mathematical / statistical context, and
the help on sweep states that what it does is sweep. I have experienced
many similar examples in the last few weeks. This is not to say that
there is anything wrong with the help on these functions nor with the
help in general, but what R does not have is an extensive indexing
system by synonyms and uses. It is largely for reasons like this, I
believe, that trivial questions continue to be asked. If one does not
know the name of the function to do "verb" and one has tried "verb" and
the synonyms which spring to mind and drawn a blank, where to next?
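
For what it is worth, a small sketch of the two built-in search tools that seem closest to a synonym index: help.search looks through the help pages of installed packages, and apropos matches object names on the search path. Whether a given search word finds the function one is after still depends on how its help page happens to be worded; the names hits and found are just for illustration.

```r
# Search the installed help pages (titles, names, aliases, keywords).
# Type print(hits) at the console to browse the matches.
hits <- help.search("proportion")

# List objects on the search path whose names contain "table";
# this at least turns up prop.table alongside table itself.
found <- apropos("table")
found
```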
Another reason for difficulty is that while a function may exist to do
something, it is sometimes hard to find the package where it is
contained, e.g. Frank Harrell's functions seem to be in a package called
Hmisc which is not listed in the drop-down box for "load package".
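
A likely explanation, offered as an assumption rather than a diagnosis: the "load package" list shows only packages already installed, and Hmisc has to be installed from CRAN first. A minimal sketch of the two steps (have_hmisc is a made-up name):

```r
# One-off download from CRAN (needs an internet connection);
# uncomment to run.
# install.packages("Hmisc")

# Afterwards the package can be attached in any session.
# require() returns FALSE, with a warning, if it is not installed.
have_hmisc <- require(Hmisc)
have_hmisc
```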
System and version information:
system i386, mingw32
British Society of Blood and Marrow Transplantation