[BioC] gene classification problem
Kimpel, Mark W
mkimpel at iupui.edu
Thu Dec 9 15:19:16 CET 2004
My apologies to those with far more statistical expertise than I, but I have what may (or may not) be a straightforward question.
After performing a SAM analysis of an experiment comparing two strains of rats, I have a list of about 200 significant Affymetrix rat probesets (genes) that I have mapped to their chromosomal locations. Some of the genes appear to cluster in discrete physical regions of the chromosomes, which I suspect reflects underlying genetic differences between the two inbred strains. Based on their chromosomal locations, I have grouped these significant genes into discrete bins. One thing to keep in mind is that the Affymetrix rat probesets as a whole are not distributed uniformly along the chromosomes. Hence my concern that some of the apparent clustering of the significant genes may be due not only to chance but also to the clustering of the underlying distribution of all probesets.
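The point about a non-uniform background can be made concrete: under the null hypothesis, the number of significant genes expected in a bin is proportional to that bin's share of all probesets on the array, not to its physical length. The sketch below is written in Python purely for illustration (the same arithmetic is a one-liner in R); the function name `expected_counts` and the toy bin labels are hypothetical.

```python
from collections import Counter


def expected_counts(background_bins, n_sig):
    """Expected number of significant genes per bin if n_sig genes were drawn
    at random from the background (all probesets on the array).

    background_bins: one bin label per probeset on the array (hypothetical input).
    """
    totals = Counter(background_bins)
    n_all = sum(totals.values())
    # Each bin's expectation is proportional to its share of ALL probesets,
    # so a probeset-dense bin is expected to hold more significant genes
    # even when nothing biologically interesting is happening there.
    return {b: n_sig * count / n_all for b, count in totals.items()}


if __name__ == "__main__":
    # Hypothetical toy data: 1000 probesets spread unevenly over three bins.
    background = ["A"] * 600 + ["B"] * 300 + ["C"] * 100
    print("background-based:", expected_counts(background, 20))
    # A naive uniform null would instead expect 20 / 3 genes in every bin.
    print("naive uniform:", {b: 20 / 3 for b in ("A", "B", "C")})
```

With these toy numbers the background-based expectation is 12, 6 and 2 genes, which differs sharply from the naive 6.67 per bin; a test against the uniform expectation would flag bin A as "enriched" purely because it is probeset-dense.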
At this point I would like to test:
1. whether the distribution of significant genes among the bins differs significantly from that of the population of all Affymetrix probesets from which they were drawn.
2. if that distribution is, as I suspect, different, which of the bins are responsible for the difference. Ideally I would like to assign a p value to each bin.
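Question 1 is a goodness-of-fit problem: compare the observed bin counts of the significant genes with the counts expected from the bin proportions of all probesets. A minimal sketch, assuming each gene is represented only by its bin label, is a Pearson chi-square statistic whose null distribution is obtained by repeatedly drawing the same number of probesets at random from the whole array. This Monte Carlo approach is swapped in here because some bins may have small expected counts, where the usual chi-square approximation is unreliable; R's `chisq.test(x, p = ...)` performs the classical version and also accepts `simulate.p.value = TRUE`. The code is Python for illustration only, and the names `chi_square_stat` and `monte_carlo_gof` are hypothetical.

```python
import random
from collections import Counter


def chi_square_stat(observed, expected):
    """Pearson statistic: sum over bins of (O - E)^2 / E."""
    return sum((observed.get(b, 0) - e) ** 2 / e
               for b, e in expected.items() if e > 0)


def monte_carlo_gof(background_bins, sig_bins, n_sim=2000, seed=1):
    """Goodness-of-fit test of the significant genes' bin counts against the
    bin proportions of all probesets. Returns (statistic, Monte Carlo p-value)."""
    rng = random.Random(seed)
    n_sig = len(sig_bins)
    totals = Counter(background_bins)
    # Expected counts come from the (non-uniform) background, not from bin length.
    expected = {b: n_sig * c / len(background_bins) for b, c in totals.items()}
    observed_stat = chi_square_stat(Counter(sig_bins), expected)
    # Null distribution: draw n_sig probesets at random, without replacement,
    # from the whole array and recompute the statistic each time.
    exceed = sum(
        chi_square_stat(Counter(rng.sample(background_bins, n_sig)), expected)
        >= observed_stat
        for _ in range(n_sim)
    )
    # Adding 1 to numerator and denominator counts the observed data as one
    # of the draws, so the estimated p-value is never exactly zero.
    return observed_stat, (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical toy data: 1000 probesets in three bins, 20 significant genes.
    background = ["A"] * 600 + ["B"] * 300 + ["C"] * 100
    significant = ["A"] * 5 + ["C"] * 15
    stat, p = monte_carlo_gof(background, significant)
    print(f"chi-square = {stat:.2f}, Monte Carlo p = {p:.4f}")
```

With roughly 200 significant genes out of several thousand probesets, sampling with or without replacement makes little practical difference, so the classical test and this simulation should broadly agree whenever all expected counts are reasonably large.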
I believe this is similar to the problem of analyzing the distribution of genes across GO categories, but I am not familiar with the proper solution.
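The GO analogy suggests one standard answer to question 2: treat each bin like a GO category and ask whether it contains more significant genes than expected by chance, using the hypergeometric upper tail (equivalently, a one-sided Fisher exact test), then correct for testing many bins at once. The sketch below uses Benjamini-Hochberg for that correction; the choice of correction is an assumption, and Bonferroni would be the more conservative alternative. It is Python for illustration only (R users would reach for `phyper` or `fisher.test`), and all function names are hypothetical.

```python
from collections import Counter
from math import comb


def hypergeom_upper_tail(k, n_sig, n_bin, n_all):
    """P(X >= k), X ~ Hypergeometric: n_sig genes drawn without replacement
    from n_all probesets, of which n_bin lie in the bin.
    Equivalent to a one-sided Fisher exact test for over-representation."""
    numerator = sum(comb(n_bin, i) * comb(n_all - n_bin, n_sig - i)
                    for i in range(k, min(n_bin, n_sig) + 1))
    # Python integers are exact, so only the final division is rounded.
    return numerator / comb(n_all, n_sig)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def per_bin_enrichment(background_bins, sig_bins):
    """Return {bin: (significant genes in bin, raw p, BH-adjusted p)}."""
    totals, hits = Counter(background_bins), Counter(sig_bins)
    n_all, n_sig = len(background_bins), len(sig_bins)
    bins = sorted(totals)
    raw = [hypergeom_upper_tail(hits.get(b, 0), n_sig, totals[b], n_all)
           for b in bins]
    adjusted = benjamini_hochberg(raw)
    return {b: (hits.get(b, 0), raw[j], adjusted[j]) for j, b in enumerate(bins)}


if __name__ == "__main__":
    # Hypothetical toy data: 1000 probesets in three bins, 20 significant genes.
    background = ["A"] * 600 + ["B"] * 300 + ["C"] * 100
    significant = ["A"] * 5 + ["C"] * 15
    for b, (k, p, q) in per_bin_enrichment(background, significant).items():
        print(f"bin {b}: {k} significant, p = {p:.3g}, BH-adjusted = {q:.3g}")
```

Because each bin's p-value is computed against that bin's own share of the array, a bin only stands out if it holds more significant genes than its probeset density alone would predict.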
Any sample code would be greatly appreciated. As an example, assume that I have two matrices, each with two columns and one row per gene. The first column is the probeset ID and the second is the bin it falls into. One matrix contains all rat Affymetrix probesets; the second contains only the significant genes.
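With data in that two-matrix form, a useful first step is to reduce it to one 2x2 contingency table per bin (in bin or elsewhere, significant or not), which is the input a per-bin Fisher exact test such as R's `fisher.test(..., alternative = "greater")` expects. A minimal Python sketch, assuming each matrix is given as a list of `(probeset_id, bin)` rows; the function name `bin_tables` and the probeset IDs below are hypothetical. The bin of each significant gene is looked up in the full list, so the two matrices are assumed to agree on bin assignments.

```python
from collections import Counter


def bin_tables(all_genes, sig_genes):
    """all_genes, sig_genes: iterables of (probeset_id, bin) rows, i.e. the two
    two-column matrices described above.
    Returns {bin: (sig_in_bin, nonsig_in_bin, sig_elsewhere, nonsig_elsewhere)},
    one 2x2 contingency table per bin."""
    bin_of = dict(all_genes)  # probeset ID -> bin, taken from the full list
    sig_ids = {pid for pid, _ in sig_genes}
    missing = sig_ids - set(bin_of)
    if missing:
        raise ValueError(f"significant probesets not in the full list: {sorted(missing)}")
    n_all, n_sig = len(bin_of), len(sig_ids)
    bin_size = Counter(bin_of.values())
    sig_per_bin = Counter(bin_of[pid] for pid in sig_ids)
    tables = {}
    for b, size in bin_size.items():
        a = sig_per_bin.get(b, 0)
        # Rows: in bin / elsewhere; columns: significant / not significant.
        tables[b] = (a, size - a, n_sig - a, (n_all - n_sig) - (size - a))
    return tables


if __name__ == "__main__":
    # Hypothetical probeset IDs and bin labels, for illustration only.
    all_genes = [("p1", "A"), ("p2", "A"), ("p3", "B"), ("p4", "B"), ("p5", "B")]
    sig_genes = [("p1", "A"), ("p3", "B")]
    print(bin_tables(all_genes, sig_genes))
    # prints {'A': (1, 1, 1, 2), 'B': (1, 2, 1, 1)}
```

Checking that every significant probeset appears in the full list guards against ID mismatches (for example, probesets dropped during annotation), which would otherwise silently distort the counts.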
Thanks,
Mark W. Kimpel MD
Department of Psychiatry
Indiana University School of Medicine
Biotechnology, Research, & Training Center
1345 W. 16th Street
Indianapolis, IN 46202