[R] Apply or Tapply to Build Set of Tables
Dennis Murphy
djmuser at gmail.com
Tue May 24 05:03:09 CEST 2011
Hi:
Here's one way to do the pairwise tables. I'm restricting attention to
the variables with only a few levels, but the idea should be clear
enough.
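For the one-way tables in the first part of the question, lapply() over the columns of the data frame is probably enough. A minimal sketch, assuming the same low-cardinality columns as below (the name one_way is only illustrative, not something from the original post):

```r
# One-way frequency tables for the variables with only a few levels
data(infert, package = "datasets")
vars <- names(infert)[c(1, 3:6)]

# lapply() over the selected columns returns a named list,
# one table per variable
one_way <- lapply(infert[vars], table)

names(one_way)       # the variable names
one_way$education    # same counts as table(infert$education)
```

This also avoids attach(), since each column is picked up from the data frame directly. The pairwise chi-square tests take a little more setup: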
# Put the variable names into a vector
vars <- names(infert)[c(1, 3:6)]
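# Use expand.grid() to cross the first four names (x) with the last
# four (y). Note that this grid also contains self-pairs such as
# parity/parity and both orderings of some pairs.
# It's important to keep these as character strings
tvars <- expand.grid(x = vars[-length(vars)],
                     y = vars[-1], stringsAsFactors = FALSE)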
library(plyr)
# Function to return the value of the chi-square statistic
# and its p-value. Inputs x and y are character strings of variable names
# get(x) and get(y) pull in the data associated with those variables
x2fun <- function(x, y) {
    res <- with(infert, chisq.test(get(x), get(y)))
    data.frame(stat = res$statistic, pval = res$p.value)
}
# Apply the function to each row of tvars, outputting
# a data frame
mdply(tvars, x2fun)
> mdply(tvars, x2fun)
           x           y         stat          pval
1  education      parity 1.058180e+02  3.708990e-18
2     parity      parity 1.240000e+03 5.270109e-246
3    induced      parity 5.969764e+01  4.134443e-09
4       case      parity 6.036266e-02  9.999534e-01
5  education     induced 1.653059e+01  2.383898e-03
6     parity     induced 5.969764e+01  4.134443e-09
7    induced     induced 4.960000e+02 4.910976e-106
8       case     induced 7.322983e-02  9.640473e-01
9  education        case 2.289618e-03  9.988558e-01
10    parity        case 6.036266e-02  9.999534e-01
11   induced        case 7.322983e-02  9.640473e-01
12      case        case 2.435293e+02  6.686097e-55
13 education spontaneous 3.626057e+00  4.589717e-01
14    parity spontaneous 5.083091e+01  1.876475e-07
15   induced spontaneous 1.819802e+01  1.128831e-03
16      case spontaneous 3.286172e+01  7.314205e-08
Warning messages:
1: In chisq.test(get(x), get(y)) :
  Chi-squared approximation may be incorrect
2: In chisq.test(get(x), get(y)) :
  Chi-squared approximation may be incorrect
3: In chisq.test(get(x), get(y)) :
  Chi-squared approximation may be incorrect
4: In chisq.test(get(x), get(y)) :
  Chi-squared approximation may be incorrect
The warnings come from expected cell counts below 5 in some of the
bivariate tables, where the chi-squared approximation is unreliable.
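If the sparse tables matter, one option is a Monte Carlo p-value through the simulate.p.value argument of chisq.test(). A rough sketch for a single pair; the seed, the number of replicates B and the names res_mc and E are arbitrary choices for illustration:

```r
data(infert, package = "datasets")
set.seed(1)  # arbitrary seed, only so the simulated p-value is reproducible

# Same kind of test as above for one sparse pair, but the p-value is
# estimated by simulation instead of the chi-squared approximation
res_mc <- with(infert,
               chisq.test(education, parity,
                          simulate.p.value = TRUE, B = 10000))
res_mc$p.value

# Expected counts under independence; entries below 5 are the cells
# behind the warning in the asymptotic version of the test
E <- suppressWarnings(chisq.test(infert$education, infert$parity))$expected
round(E, 1)
```

The simulated p-value cannot go below 1/(B + 1), so it will not reproduce the very small values in the table above.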
Also watch out for Simpson's paradox :)
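Rows 2, 7 and 12 above test a variable against itself, and several pairs show up in both orders. If only the distinct pairs are wanted, combn() can generate them directly. A base R sketch, without plyr; the helper x2pair and the object out are made-up names for illustration, and infert[[...]] is swapped in for get():

```r
data(infert, package = "datasets")
vars <- names(infert)[c(1, 3:6)]

# Chi-square statistic and p-value for one pair of variable names.
# p is a character vector of length 2; suppressWarnings() hides the
# "approximation may be incorrect" warning for sparse tables.
x2pair <- function(p) {
    res <- suppressWarnings(chisq.test(infert[[p[1]]], infert[[p[2]]]))
    data.frame(x = p[1], y = p[2],
               stat = unname(res$statistic), pval = res$p.value,
               stringsAsFactors = FALSE)
}

# combn() visits each unordered pair once; simplify = FALSE keeps the
# per-pair data frames in a list, which rbind() then stacks
out <- do.call(rbind, combn(vars, 2, FUN = x2pair, simplify = FALSE))
out
```

With five variables this gives choose(5, 2) = 10 rows instead of 16.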
HTH,
Dennis
On Mon, May 23, 2011 at 5:31 PM, Sparks, John James <jspark4 at uic.edu> wrote:
> Dear R Helpers,
>
> First, I apologize for asking for help on the first of my topics. I have
> been looking at the posts and pages for apply, tapply etc, and I know that
> the solution to this must be ridiculously easy, but I just can't seem to
> get my brain around it. If I want to produce a set of tables for all the
> variables in my data, how can I do that without having to type them into
> the table command one by one? So, I would like to use (t? s? r?)apply to
> use one command instead of the following set of table commands:
>
> data(infert, package = "datasets")
> attach(infert)
>
> table.education<-table(education)
> table.age<-table(age)
> table.parity<-table(parity)
> etc.
>
>
> To make matters worse, what I subsequently need is the chi-square for each
> and all of the pairs of variables. Such as:
>
> chi.education.age<-chisq.test(table(education,age))
> chi.education.parity<-chisq.test(table(education,parity))
> chi.age.parity<-chisq.test(table(age,parity))
> etc.
>
> Your guidance would be much appreciated.
>
> --John J. Sparks, Ph.D.