[BioC] GoStats
Floor Stam
fjstam at bio.vu.nl
Sun Oct 10 17:21:30 CEST 2004
hi michael
In case you're impatient: i wrote a function (it's at the bottom of
this message) that sort of does what you want automatically. I need to
warn you beforehand: i am a biologist and a beginner at R so it's not
perfect at all! The table you get as an output will need some cleaning
up as you will see. I do that using an excel macro, but it should be
possible in R, i just don't know how.
In this function, MATR is my matrix with gene expression changes of
8000 genes over over time, so replace that with your own. The first
column is the spot identifier (so probename in the case of an affy
array). Im using my own annotation package called 'floor', so replace
that with your affy package. Use
GOtable("0006818")
If you want to know which genes belong to the 'hydrogen transport'
class, for instance.
The output is a textfile in the form of table with all genes that
belong to the GO you specified with the feature identifier and the
genename and its regulation that is in the matrix. The last row is the
name of the GO and the amount of representing genes on the array. The
name of the textfile is the GO identifier.
if you change it for the better, please let me know. I suppose it
should be possible to use this function in a sapply command to execute
it on a list of overrepresented GOs. I also made a function that makes
tables of overrepresented GO identifiers and corresponding p values
based on a matrix of gene x condition that exists of 0 (for not sign.
regulated) and 1 (for sign regulated genes). Let me know if you want
it. Have Fun.
Floor
GOtable<-function(GOID){
GObase<-paste("GO:", GOID, sep = "")
filename<-paste("GO_", GOID, ".xls", sep = "")
tmpv<-mget(GObase, env=floorGO2ALLPROBES, ifnotfound="NA")
tmp<-unlist(tmpv)
tmpindex<-as.integer(tmp)
index <- MATR[,1] %in% tmpindex
subset <- MATR[index,]
tmp<-unique(sort(as.integer(tmp)))
tmp<-as.character(tmp)
tmp<-mget(tmp, env=floorGENENAME, ifnotfound="NA")
tmp<-as.matrix(tmp)
colnames(tmp)=list("GeneName")
data<-cbind(tmp, subset)
bla<-get(GObase, env=GOTERM)
rows<-(dim(data))[1]
rows1<-rows+1
data[rows1,1]<-bla
write.table(data, file=filename, sep="\t")
}
_______________________________________________________
Floor Stam
Vrije Universiteit Amsterdam
Faculty of Earth and Life Sciences
Department of Molecular and Cellular Neurobiology
De Boelelaan 1085
1081HV Amsterdam
The Netherlands
Ph: +31-20-4447114
+31-20-5665512
Fax: +31-20-4447112
e-mail: fjstam at bio.vu.nl
_______________________________________________________
On 9 Oct 2004 , at 5:26, Robert Gentleman wrote:
> On Thu, Oct 07, 2004 at 06:25:45PM +0200, Auer Michael wrote:
>> When applying GOHyperG to a series of Locus IDs, the function only
>> reports
>> the GO ID and the corresponding frequencies. But what most people
>> want to
>> have, when interpreting the GO results is the corresponding Affy IDS
>> which
>> map on the different Locus IDs and GO IDS.
>>
>
> I will add this feature to the next release.
> You will then get the GO ID and the set of LocusLink IDs that are
> mapped to that GO ID. You would then use some other tools (like
> those you suggest below) to find the related Affymetrix (or other
> manufacturer IDs).
>
>
>> Code I use
>>
>> Locusids<-mget(AffyIDS,env=hgu133aLOCUSID)
>> /*retrieves the LOCUS LINK IDS from a series of AffyIDS*/
>>
>> MF<- GOHyperG(Locusids,lib="hgu133a",what="MF")
>> /*reports the occurring Gene Ontologies, the frequencies and the
>> hypergeometric p values*/
>>
>> How can I get a result of the form
>>
>> 1. GO ID, freq (output of GoHyperG)
>>
>> 2. GO ID, LOCUS ID (multiple occurring GO IDs)
>>
>> 3. LOCUS ID, AFFY ID (multiple occurring LOCUS IDs)
>>
>
> What you would need to do now, is to take each of the set of
> LocusLink IDs you supplied to the call to GOHyperG, and the output
> set of GO IDs and find which LL's are linked to which GO IDs. This
> is done in GOALLLOCUSID, in GO 1.6.4.
>
> Finally you would mapp the LLs to whatever manufacturer IDs you
> have. That would be done from the manufacturer specific data
> package.
>
>>
>> I know that there is the function hgu95av2GO2ALLPROBES$"GO:0000166"
>> which
>> retrieves all the probes mapping on a GO, but still it is not the
>> result.
>> Ok, one would have to make a for loop and look for the occurring
>> probes in
>> the sample. But isn't there a better way????
>
> I am not sure what you mean by a better way? Nor how it would
> necessarily answer your question.
>
>
>>
>>
>> Thanks a lot
>>
>> Michael Auer
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
> --
> +----------------------------------------------------------------------
> -----+
> | Robert Gentleman phone : (617) 632-5250
> |
> | Associate Professor fax: (617) 632-2444
> |
> | Department of Biostatistics office: M1B20
> |
> | Harvard School of Public Health email: rgentlem at jimmy.harvard.edu
> |
> +----------------------------------------------------------------------
> -----+
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
More information about the Bioconductor
mailing list