[BioC] GOstats gene set size selection
alex lam (RI)
alex.lam at roslin.ed.ac.uk
Thu May 1 15:59:24 CEST 2008
Hi Sean and other BioC users,
Thanks for the replies a couple of weeks ago. I am now trying to use
Category as suggested, and I think its underlying principles are better
than those of GOstats for what I want to do, especially since I don't
have to use an arbitrary threshold on my test statistics to select a
subset of genes.
I followed the code in the vignette of Category until the matrix Z gets
divided by sqrt(rowSums).
Because what I am doing is an eQTL genome scan, at any one position I
have the likelihood ratio test statistics for all probesets rather than
two-sample t-statistics. I read in the vignette that X should be
approximately normal. So, I figure that maybe I should standardize the
likelihood ratio statistics to z-scores before multiplying with the
adjacency matrix. Is this the correct thing to do?
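for (cM in seq_len(lengthOfGenome)) {
    # LRT statistics for the expressed probesets at position cM
    lrt <- LRT[expressedAffyIds, cM]
    # ... filter out duplicate Entrez Gene IDs and create the adjacency matrix
    ...
    # standardize the LRT statistics to z-scores
    z.score <- (lrt - mean(lrt)) / sd(lrt)
    # per-category sum of z-scores, divided by sqrt(rs2) as in the vignette
    tA <- AmER2 %*% z.score
    tA <- tA / sqrt(rs2)
    names(tA) <- row.names(AmER2)
    qqnorm(tA)
}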
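
For anyone who wants to try this without my data, here is a self-contained toy version of the loop body for a single position. It is only a sketch: the simulated chi-squared statistics, the random adjacency matrix and the names n.genes, n.cats, Am and rs are made up for illustration and stand in for LRT, AmER2 and rs2 above. Standardizing only shifts and rescales the statistics, so it cannot by itself make a skewed distribution normal; the two Q-Q plots are there to check that visually.

```r
set.seed(1)

n.genes <- 200   # hypothetical number of unique Entrez genes
n.cats  <- 5     # hypothetical number of categories

# Simulated LRT statistics at one position (chi-squared, 1 df, as a stand-in)
lrt <- rchisq(n.genes, df = 1)
names(lrt) <- paste0("gene", seq_len(n.genes))

# Simulated 0/1 adjacency matrix: rows are categories, columns are genes
Am <- matrix(rbinom(n.cats * n.genes, 1, 0.2), nrow = n.cats,
             dimnames = list(paste0("cat", seq_len(n.cats)), names(lrt)))
Am <- Am[rowSums(Am) > 0, , drop = FALSE]   # drop empty categories
rs <- rowSums(Am)                           # category sizes

# Standardize, sum within each category, divide by sqrt(category size)
z.score <- (lrt - mean(lrt)) / sd(lrt)
tA <- drop(Am %*% z.score) / sqrt(rs)       # drop() gives a named vector

# Compare the shape of the per-gene and per-category statistics
qqnorm(z.score); qqline(z.score)
qqnorm(tA); qqline(tA)
```

Using drop() on the matrix product keeps the category names, so the separate names(tA) assignment is not needed in this version.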
Cheers,
Alex
-----Original Message-----
From: Sean MacEachern [mailto:sean.maceach at gmail.com]
Sent: 17 April 2008 17:07
To: alex lam (RI); bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] GOstats gene set size selection
Hi Alex,
I'm not too sure if this helps with your question, but I'll put my two
cents in... I am working with chickens and trying to create a large list
of genes for an eQTL study from an initial simple microarray design that
compares resistant vs susceptible birds. Because I have found only a
small number of differentially expressed genes, I have attempted to
increase the size of my list by examining significant GO terms. Most of
the GO terms I have pulled out using hyperGTest are not very helpful due
to their breadth.
I have found the Category package a little more helpful. KEGG pathways
are a little more specific, and you can create an adjacency matrix and
use the rowSums() command to filter your dataset. I think you can also
treat GO terms as categories if you need to. It might be a little off
topic, but it could be worth looking at.
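
A rough sketch of the rowSums() filtering idea, using a tiny made-up adjacency matrix rather than real KEGG or GO annotation. The category names, gene names and the min.size and max.size cut-offs are hypothetical and only there to illustrate the approach:

```r
# Toy 0/1 adjacency matrix: rows are categories, columns are genes
Am <- rbind(
  tiny  = c(1, 1, 0, 0, 0, 0, 0, 0),
  mid   = c(1, 1, 1, 1, 0, 0, 0, 0),
  broad = c(1, 1, 1, 1, 1, 1, 1, 1)
)
colnames(Am) <- paste0("gene", 1:8)

min.size <- 3   # hypothetical lower bound on category size
max.size <- 6   # hypothetical upper bound on category size

rs   <- rowSums(Am)                       # number of genes per category
keep <- rs >= min.size & rs <= max.size   # categories within the size limits
Am.f <- Am[keep, , drop = FALSE]          # filtered adjacency matrix

rownames(Am.f)
```

Here only the "mid" category survives: "tiny" has 2 genes and "broad" has 8, so both fall outside the limits. The same idea should carry over to a real adjacency matrix, with cut-offs chosen to suit the chip.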
Cheers,
Sean
On 4/17/08 7:28 AM, "alex lam (RI)" <alex.lam at roslin.ed.ac.uk> wrote:
> Dear colleagues,
>
> I have been following the GOstats vignette to test GO term
> association. I would like to know whether it is possible to set limits
> on the number of selected genes in a GO term and on the size of that
> term on my affy chip?
>
> For example, can I tell hyperGTest to skip testing a GO term if the
> number of significant genes in that term is under, say, 3, or if there
> are more than 400 genes of that GO term on the chip?
>
> Currently I find many of my significant GO terms not very specific.
> As I am trying to incorporate GOstats into an expression QTL (eQTL)
> genome scan, I get a lot of output. Therefore, ideally I would like to
> filter out these terms before testing rather than screening the results
> afterwards. Is there such an option with hyperGTest?
>
> Many thanks for your advice,
> Alex
>
>> sessionInfo()
> R version 2.6.2 Patched (2008-03-24 r44882) x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines tools stats graphics grDevices utils datasets
> [8] methods base
>
> other attached packages:
> [1] GOstats_2.4.0 Category_2.4.0 genefilter_1.16.0
> [4] survival_2.34 RBGL_1.14.0 annotate_1.16.1
> [7] xtable_1.5-2 GO.db_2.0.2 AnnotationDbi_1.0.6
> [10] RSQLite_0.6-8 DBI_0.2-4 Biobase_1.16.3
> [13] graph_1.16.1
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.10
>>
>
> --------------------------------------------
> Alex C. Lam
> Roslin Institute (Edinburgh)
> Midlothian
> EH25 9PS
> United Kingdom
> Tel: +44 131 5274471
>
> Former email address: alex.lam at bbsrc.ac.uk
> New email address: alex.lam at roslin.ed.ac.uk
> Both addresses are functional