[R] Determining Total Number of Multiple Comparisons

Fri Mar 14 19:15:22 CET 2014

Greetings,

I'm running a series of Chi-square tests to examine differences across 
categorical variables.  The situation is this:

I have three variables: sex (M/F), habitat (5 levels), season 
(W,Sp,Su,F).  A Cochran-Mantel-Haenszel test detects non-independence 
across my sex strata.  I then subsetted my data into males (mat.M) and 
females (mat.F).  Within each sex, I investigated independence between 
habitat and season (e.g., chisq.test(mat.M)).  This is essentially a 
multiple comparison test, so I'm correcting my p-value using 
p.adjust().  My question pertains to 'n' in this function, and how 'n' 
is calculated as subsets of data are used to tease out the differences 
in habitat use across seasons.

Q1.  Am I correct to specify 'n=2' when performing the test of 
independence for both male and female data?
     example: p.adjust(chisq.test(mat.M)$p.value,n=2,method='bonferroni')
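
To make Q1 concrete, here is a minimal sketch of the two ways I could 
apply the correction.  The counts in mat.M and mat.F below are made up 
purely for illustration (they are not my data), and the object names 
p.raw, p.vec and p.one are hypothetical.  As far as I understand 
p.adjust(), 'n' defaults to the length of the p-value vector, so 
adjusting both p-values together should match adjusting one at a time 
with n=2:

```r
# Hypothetical counts, for illustration only (not the real data):
# rows = habitat, columns = season.
habitats <- c("Forest", "Field", "Crops", "River", "Other")
seasons  <- c("W", "Sp", "Su", "F")
mat.M <- matrix(c(30, 22, 15, 28,
                  18, 25, 30, 20,
                  12, 20, 35, 15,
                  10, 14, 12, 11,
                   8,  9, 10,  9),
                nrow = 5, byrow = TRUE,
                dimnames = list(habitats, seasons))
mat.F <- matrix(c(25, 20, 18, 30,
                  20, 28, 26, 18,
                  15, 18, 30, 12,
                  12, 10, 14, 13,
                  10, 11,  9, 10),
                nrow = 5, byrow = TRUE,
                dimnames = list(habitats, seasons))

# Raw p-values from the two tests of independence (one per sex).
p.raw <- c(M = chisq.test(mat.M)$p.value,
           F = chisq.test(mat.F)$p.value)

# Option A: adjust the whole vector; n defaults to length(p.raw), i.e. 2.
p.vec <- p.adjust(p.raw, method = "bonferroni")

# Option B: adjust a single p-value, stating n = 2 explicitly.
p.one <- p.adjust(p.raw["M"], n = 2, method = "bonferroni")
```

If that understanding is right, p.vec["M"] and p.one are the same 
number, so the only real question is whether 2 is the right family size.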

Non-independence was detected for both male and female subsets. Now, I'm 
interested in seasonal changes in habitat use, which would require 
additional multiple comparison tests.  Thus, I have another question 
regarding the specification of 'n'.

Q2.  If I examined the seasonal changes within males using prop.test(), 
do I add up all multiple comparisons that will be performed (female 
included), or just the number of tests that will be performed using the 
male data?  The difference is n=5 for male only vs n=10 for both sexes.
Here's an example.  Habitat types are Forest, Field, Crops, River, 
Other, and these are the rownames of my matrix (males only):
     pval <- prop.test(mat.M['Forest',], colSums(mat.M))$p.value
     p.adjust(pval,n=5,method='bonferroni')
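
Spelled out for all five habitats, the two choices in Q2 would look 
roughly like the sketch below.  The counts are again made up for 
illustration only, and p.hab, p.adj5 and p.adj10 are hypothetical names.  
With Bonferroni the only thing that changes between the two options is 
the multiplier applied to each raw p-value:

```r
# Hypothetical male counts, for illustration only (not the real data):
# rows = habitat, columns = season.
mat.M <- matrix(c(30, 22, 15, 28,
                  18, 25, 30, 20,
                  12, 20, 35, 15,
                  10, 14, 12, 11,
                   8,  9, 10,  9),
                nrow = 5, byrow = TRUE,
                dimnames = list(c("Forest", "Field", "Crops", "River", "Other"),
                                c("W", "Sp", "Su", "F")))

# One prop.test() per habitat: is the proportion of observations in that
# habitat the same in all four seasons?
p.hab <- sapply(rownames(mat.M), function(h) {
  prop.test(mat.M[h, ], colSums(mat.M))$p.value
})

# Family = the 5 male tests only (n defaults to length(p.hab) = 5).
p.adj5 <- p.adjust(p.hab, method = "bonferroni")

# Family = male + female tests; the 5 female tests are not in p.hab,
# so the larger family size has to be passed explicitly.
p.adj10 <- p.adjust(p.hab, method = "bonferroni", n = 10)
```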

Lastly, I have detected differences in habitat use across seasons. I now 
want to determine which seasons are different within a specific habitat 
type.  Like before, I can pull out the count data and run a series of 
prop.test() for all 6 comparisons (W vs Sp, W vs Su, W vs F, Sp vs Su, 
Sp vs F, Su vs F).  This leads to my final questions.

Q3.  Does 'n' in this case refer to only the 6 comparisons within a 
habitat type within a sex, or will I need to account for ALL tests that 
will be performed (n=2 sex * 5 habitats * 6 pairwise seasonal 
comparisons = 60 max)?  I will not run pairwise seasonal comparisons for 
any habitat type that gives a non-significant p-value according to Q2 
above.
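
For reference, this is roughly how I would run the 6 pairwise seasonal 
comparisons for one habitat.  It is only a sketch: the counts are made 
up for illustration, and x, n, pairs, p.pair, p.adj6 and p.adj60 are 
hypothetical names.  The two p.adjust() calls at the end correspond to 
the two family sizes I am asking about:

```r
# Hypothetical male counts, for illustration only (not the real data):
# rows = habitat, columns = season.
mat.M <- matrix(c(30, 22, 15, 28,
                  18, 25, 30, 20,
                  12, 20, 35, 15,
                  10, 14, 12, 11,
                   8,  9, 10,  9),
                nrow = 5, byrow = TRUE,
                dimnames = list(c("Forest", "Field", "Crops", "River", "Other"),
                                c("W", "Sp", "Su", "F")))

x <- mat.M["Forest", ]   # Forest counts per season
n <- colSums(mat.M)      # total counts per season

# All 6 season pairs, one pair per column.
pairs <- combn(names(x), 2)

# Two-sample prop.test() for each pair of seasons.
p.pair <- apply(pairs, 2, function(s) prop.test(x[s], n[s])$p.value)
names(p.pair) <- apply(pairs, 2, paste, collapse = " vs ")

# Family = the 6 comparisons within this habitat and sex.
p.adj6 <- p.adjust(p.pair, method = "bonferroni")

# Family = every pairwise test that could be run (2 * 5 * 6 = 60).
p.adj60 <- p.adjust(p.pair, method = "bonferroni", n = 60)
```

I believe pairwise.prop.test() in the stats package would do the n=6 
version in one call, but as far as I can tell it only adjusts across the 
comparisons it runs itself, so the n=60 version would still need a 
separate p.adjust() step on unadjusted p-values.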

Thanks for the help...