[R] analysis and figure for sign test in setting of high inter-experiment variance

Mon Jun 6 17:13:53 CEST 2005

Hello all,

Sorry if this is an FAQ. I have searched the archives without success.
I have a dataset (ChIP microarray) where the experiment-to-experiment 
variability is very high, but where, within an experiment, the data 
nearly always go in the "right" (hypothesis-confirming) direction.
I am trying to figure out the right way to use R to do the statistical 
analysis and generate an appropriate figure.

To be specific, we have a virus and a mutant derivative, and the 
hypothesis is that the wild type virus is specifically suppressing 
transcription activity in a manner that is abrogated in the mutant.

The experiment is to measure the amount of viral chromatin associated 
with overall histone (should be the same between wild type and mutant), 
vs. transcriptionally active chromatin (mutant should be greater than 
wild type) vs. inactive chromatin (wild type should be greater than 
mutant).

For each datapoint there are four variables: a histone type (general, 
active, inactive), a specific gene assayed (four different genes), a 
virus used for infection (wild type or mutant), and an experiment 
number (each combination repeated 3-5 times). These are hard experiments 
to do (involving dissecting out small numbers of cells from a mouse), so 
the numbers are small, but in each case there are 3-5 pairs of wild 
type vs. mutant virus for each condition.

If I look at simply whether the hypothesis is confirmed for each 
condition (whether the wild type/mutant difference goes the way you 
would expect), then the sign is right 34/35 times, which is far beyond 
any conventional significance threshold. However, since the inter-experiment variance 
is so high, if I try to do a simple rank-sum test for a particular 
chromatin-gene-virus combination (3-5 pairs), the result is usually 
non-significant (never significant if Bonferroni corrections for 
multiple tests are applied).
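
To put numbers on this, here is a minimal sketch using binom.test() from 
base R. It assumes the 35 comparisons can be treated as independent, which 
may not hold if one experiment contributes several gene/chromatin 
combinations. The second part shows the smallest two-sided p-value an 
exact sign test can reach with only 3-5 pairs, which is part of why the 
per-combination tests cannot come out significant.

```r
# Exact two-sided sign test: 34 of 35 differences in the predicted direction.
# Assumption: the 35 comparisons are independent of one another.
overall <- binom.test(34, 35, p = 0.5)
overall$p.value

# Smallest two-sided p-value a sign test can give with n pairs,
# i.e. when all n pairs agree in direction.
min_p <- sapply(3:5, function(n) binom.test(n, n, p = 0.5)$p.value)
names(min_p) <- paste0("n=", 3:5)
min_p
```

With five pairs the best attainable value is 2 * (1/2)^5 = 0.0625, so no 
single combination can reach 0.05 on its own, with or without a 
Bonferroni correction.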

My questions are:

1) Is there a good way within R to do and report a simple sign test on 
this sort of data (paired samples, non-normal, high inter-experiment 
variability)?

2) What would be a good way to plot this sort of noisy data (and how to 
do it in R)? I was thinking of having each (wildtype-mutant) experiment 
pair as the ends of line segments with different colors or line-types 
for each gene-chromatin type combination. I know this is a standard 
kind of plot but I can't figure out how to do it in R.

3) What is the best way to input this data? A 4d array with virus type 
(wild type or mutant) on one axis, chromatin type (non-specific, 
active, inactive) on the second, gene (one of four different genes) on 
the third, and experiment number (1-5) on the last? Is there a good way 
to do this with data frames?
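
To make the three questions concrete, here is a rough sketch of one 
possible layout, using only base R. It is an illustration under stated 
assumptions, not a recommended analysis: the column names (virus, 
chromatin, gene, experiment, signal) are my own invention, and the values 
are simulated placeholders, not my data. It stores one row per measurement 
in a data frame, reshapes to one row per wild type/mutant pair, counts how 
often the sign goes the predicted way, and draws each pair as a line 
segment.

```r
# Long format: one row per measurement.
# All values are simulated placeholders, NOT real measurements.
set.seed(1)
d <- expand.grid(virus = c("wt", "mut"),
                 chromatin = c("general", "active", "inactive"),
                 gene = paste0("gene", 1:4),
                 experiment = 1:3,
                 stringsAsFactors = FALSE)
d$signal <- rlnorm(nrow(d))

# Wide format: one row per wild type/mutant pair,
# with columns signal.wt and signal.mut.
w <- reshape(d, idvar = c("chromatin", "gene", "experiment"),
             timevar = "virus", direction = "wide")

# Sign test over the conditions that have a predicted direction:
# active chromatin: mutant > wild type; inactive: wild type > mutant.
dirn <- w[w$chromatin != "general", ]
dirn$as_predicted <- ifelse(dirn$chromatin == "active",
                            dirn$signal.mut > dirn$signal.wt,
                            dirn$signal.wt > dirn$signal.mut)
sign_test <- binom.test(sum(dirn$as_predicted), nrow(dirn), p = 0.5)

# Paired plot: one line segment per pair, colour by chromatin type,
# line type by gene, log scale because the signals are skewed.
chrom <- factor(w$chromatin)
gene <- factor(w$gene)
matplot(1:2, rbind(w$signal.wt, w$signal.mut), type = "b", pch = 16,
        col = as.integer(chrom), lty = as.integer(gene), log = "y",
        xlim = c(0.5, 2.5), xaxt = "n", xlab = "virus", ylab = "signal")
axis(1, at = 1:2, labels = c("wild type", "mutant"))
legend("topright", legend = levels(chrom), col = seq_along(levels(chrom)),
       lty = 1, bty = "n")
```

With random placeholder values the sign test should of course show 
nothing; the point is only the shape of the data frame and the calls 
involved.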

Thanks for any help or pointers to appropriate how-to's. I am really 
trying to figure this out myself, but as a virologist/bioinformaticist 
new to R, I still have a lot to learn statistics-wise.

robin colgrove
dept. of microbiology
harvard medical school