[R] stats and generating a figure for a simple sign test with high inter-experiment variance
Robin Colgrove
robin at hms.harvard.edu
Mon Jun 6 04:26:17 CEST 2005
Hello all,
Sorry if this is an FAQ; I have searched the archives without
success.
I have a dataset (ChIP microarray) in which the experiment-to-experiment
variability is very high, but within an experiment the data nearly
always go in the "right" (hypothesis-confirming) direction.
I am trying to figure out the right way to use R to do the statistics
and generate an appropriate figure.
To be specific, we have a virus and a mutant derivative, and the
hypothesis is that the wild-type virus specifically suppresses active
transcription, and that this suppression is abrogated in the mutant.
The experiment measures the amount of viral chromatin associated with
overall histone (which should be the same for wild type and mutant),
with transcriptionally active chromatin (which should be greater for
the mutant than for wild type), and with inactive chromatin (which
should be greater for wild type than for the mutant).
For each experiment there is a chromatin (histone) type (general,
active, inactive), a specific gene assayed (one of four), and a virus
used for infection (wild type or mutant). These are hard experiments to
do (they involve dissecting out small numbers of cells from a mouse), so
the numbers are small, but for each condition there are 3-5 wild type
vs. mutant pairs.
If I look simply at whether the hypothesis is confirmed for each pair
(whether the wild type/mutant difference goes in the expected
direction), the sign is right 34 of 35 times, which is far beyond any
reasonable significance threshold. However, because the inter-experiment
variance is so high, a simple rank-sum test on a particular
chromatin-gene combination (3-5 pairs) is usually non-significant, and
never significant once a Bonferroni correction for multiple tests is
applied.
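The 34-of-35 count above can be turned into an exact sign test: under the null hypothesis each wild type/mutant difference is equally likely to go either way, so the number of "right" signs follows Binomial(35, 0.5). A minimal sketch in Python with only the standard library, assuming the 35 pairs are independent (pairs that share an experiment may not be). In R, binom.test(34, 35) should give the same two-sided p-value, and binom.test(34, 35, alternative = "greater") the one-sided one. The function name sign_test_p is hypothetical.

```python
from math import comb


def sign_test_p(successes: int, n: int, two_sided: bool = True) -> float:
    """Exact sign test p-value for `successes` signs in the predicted
    direction out of `n` independent pairs, each a fair coin under H0."""

    def upper_tail(k: int) -> float:
        # P(X >= k) for X ~ Binomial(n, 0.5)
        return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

    if not two_sided:
        return upper_tail(successes)
    # Binomial(n, 0.5) is symmetric: double the more extreme tail,
    # capped at 1 when the count is near n/2.
    k = max(successes, n - successes)
    return min(1.0, 2 * upper_tail(k))


# 34 of 35 pairs in the hypothesised direction (counts from the text).
p_one = sign_test_p(34, 35, two_sided=False)
p_two = sign_test_p(34, 35)
print(f"one-sided p = {p_one:.3g}, two-sided p = {p_two:.3g}")
```

The one-sided value fits a direction stated before the data were collected; the two-sided value is the more conservative one to report.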
My questions are:
1) What is the right way to use R to do and report a simple sign
test on this sort of data (paired samples, non-normal, high
inter-experiment variability)?
2) What is the best way to plot this, and how do I do it? I was
thinking of drawing each wild type/mutant experiment pair as the two
ends of a line segment, with a different color or line type for each
gene-chromatin type combination. I know this is a standard kind of
plot, but I can't figure out how to do it in R.
3) What is the best way to input this data? A 4-D array with virus
type (wild type or mutant) on the first axis, chromatin type
(non-specific, active, inactive) on the second, gene (one of four) on
the third, and experiment number (1-5) on the fourth? Or is there a
good way to do this with data frames?
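One common alternative to a 4-D array is a "long" table with one row per measurement: one column for each factor (chromatin type, gene, virus, experiment) plus the measured value. In R this would be a data.frame with factor columns, which subset(), split() and tapply() work on directly. The sketch below shows the same layout in Python with only the standard library. It pairs wild type with mutant by (chromatin, gene, experiment) and checks whether each pair goes the expected way. The gene names, values and the helper pair_signs are hypothetical placeholders, not real data. "General" histone rows are skipped because the text predicts no direction for them.

```python
from collections import defaultdict

# Long format: one record per measurement.
# Gene names and values are hypothetical placeholders, not real data.
rows = [
    # (chromatin, gene, virus, experiment, value)
    ("active", "geneA", "wt", 1, 0.8), ("active", "geneA", "mutant", 1, 2.1),
    ("active", "geneA", "wt", 2, 5.0), ("active", "geneA", "mutant", 2, 9.4),
    ("inactive", "geneA", "wt", 1, 3.3), ("inactive", "geneA", "mutant", 1, 1.2),
    ("inactive", "geneA", "wt", 2, 0.4), ("inactive", "geneA", "mutant", 2, 0.6),
    ("general", "geneA", "wt", 1, 1.0), ("general", "geneA", "mutant", 1, 1.1),
]

# Expected sign of (mutant - wild type) per chromatin type; "general"
# has no directional prediction, so it is left out of the sign count.
EXPECTED_SIGN = {"active": 1, "inactive": -1}


def pair_signs(records):
    """Group records into wild type/mutant pairs keyed by
    (chromatin, gene, experiment); True if the pair goes the expected way."""
    pairs = defaultdict(dict)
    for chromatin, gene, virus, experiment, value in records:
        pairs[(chromatin, gene, experiment)][virus] = value
    results = {}
    for key, by_virus in pairs.items():
        chromatin = key[0]
        if chromatin not in EXPECTED_SIGN:
            continue  # no directional prediction
        if "wt" not in by_virus or "mutant" not in by_virus:
            continue  # incomplete pair
        diff = by_virus["mutant"] - by_virus["wt"]
        results[key] = diff * EXPECTED_SIGN[chromatin] > 0
    return results


signs = pair_signs(rows)
right = sum(signs.values())
print(f"{right} of {len(signs)} pairs in the expected direction")
```

Keeping the factors as columns rather than array axes means unequal numbers of experiments per condition (3-5 here) need no padding with missing values.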
Thanks for any help. I am not trying to be a sponge and really am
trying to figure this out myself, but as a virologist/bioinformaticist
I still have a lot to learn about statistics.
robin colgrove
dept. of microbiology
harvard medical school