[BioC] GOstats question
jzhang at jimmy.harvard.edu
Wed Mar 30 17:31:14 CEST 2005
>Even if the design (or the aim of the Bioconductor team) is limited to a
>"general approach" which precludes working at the level of protein
>product (or transcript) -- which is the basis of the GO annotation and
>usually the goal of any test of GO category enrichment for a microarray
>result -- then for a given LL # we should have all available GO terms
>attributed, right? The example I gave showed that for at least two probe
>sets (sharing the same LL #) this is not the case -- we have only 2 GO
>terms to work with versus 12 (again using GOA as the reference) for a
>well characterized gene.
The data packages were built a few months ago and will certainly not have 100%
coverage now. You can always build your own data packages if you want to have
>"While there are other methods for annotating probesets (see the
>articles you cite above), they all require aligning target or probe
>sequences (also available from Affy) to known entities (like refseq,
>etc.), which is NOT what the BioConductor team attempts to do (and is a
>HUGE task to do well, having done this process for some long oligo
>arrays). You could do this yourself, if necessary.
>Also, you could look at Ensembl, which does its own annotation of
>Affymetrix arrays.
>The downside of doing these things yourself (or not using the
>annotation packages provided by bioconductor) is that you then need to
>either modify the nice functions from the bioconductor project to use
>your own data or you need to make your data conform to the structures
>needed for the functions to work (which as you point out, in this case,
>will not suffice)."
>It looks like that is what it takes to get to the core of the problem -- One
>of my aims (I am sure like many using Affy data) is to summarize/study
>lists of probe sets derived from some test at the level of GO terms.
>Therefore it is almost intuitive that the key to that aim is to resolve both
>the multiplicity issues (many probe sets to one protein product,
>somewhat addressed in the GOstats package -- at the level of LocusLink)
>as well as the splice variant issues -- otherwise, it seems that
>analyses will always stay at a "general" level.
>Thanks for the suggestions and the comments
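
To make the two points above concrete (many probe sets mapping to one LocusLink ID, and GO terms missing from an older data package), here is a minimal Python sketch. It is not part of GOstats or any Bioconductor package. The function names, probe set IDs, LocusLink IDs and GO labels are all made up for illustration, and the "reference" dictionary simply stands in for whatever newer annotation source (for example a GOA download) you choose to compare against.

```python
from collections import defaultdict


def collapse_probes_to_loci(probe_to_ll):
    """Group probe set IDs by LocusLink ID so each locus is counted once."""
    loci = defaultdict(list)
    for probe, ll in sorted(probe_to_ll.items()):
        if ll is not None:  # skip probe sets with no LocusLink mapping
            loci[ll].append(probe)
    return dict(loci)


def missing_go_terms(package_go, reference_go):
    """Per LocusLink ID, list GO terms in the reference but not in the package."""
    missing = {}
    for ll, ref_terms in reference_go.items():
        gap = set(ref_terms) - set(package_go.get(ll, ()))
        if gap:
            missing[ll] = sorted(gap)
    return missing


# Hypothetical example data: two probe sets share one locus, one is unmapped.
probe_to_ll = {"probe_1_at": "LL_X", "probe_2_at": "LL_X", "probe_3_at": None}
package_go = {"LL_X": ["GO:a", "GO:b"]}                     # older data package
reference_go = {"LL_X": ["GO:a", "GO:b", "GO:c", "GO:d"]}   # newer reference

loci = collapse_probes_to_loci(probe_to_ll)
gaps = missing_go_terms(package_go, reference_go)
print(loci)
print(gaps)
```

Collapsing to one entry per locus before any enrichment test avoids counting the same gene several times. It does nothing for the splice variant issue, which needs transcript-level or protein-level identifiers that a LocusLink-keyed mapping does not carry.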
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084