[BioC] GoStats and microRNA pipeline using Biomart
David martin
vilanew at gmail.com
Wed Mar 30 15:43:01 CEST 2011
Hi,
I am opening a new thread so as not to mix this up with the previous one.
The objective here is to look for overrepresented GO terms among microRNA
targets. One microRNA can have several targets (genes), and a single
gene can be targeted by several microRNAs. The aim is to check, for a
specific microRNA, which GO terms are overrepresented.
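To make "overrepresented" concrete: GOstats tests this kind of question with a hypergeometric test (hyperGTest). Below is a minimal base-R sketch of the same idea for a single GO term. All counts and variable names are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical counts, for illustration only.
universe_size  <- 20000  # annotated transcripts in the background
term_in_univ   <- 200    # background transcripts annotated with the GO term
n_targets      <- 300    # transcripts targeted by the microRNA
term_in_target <- 10     # targeted transcripts annotated with the GO term

# Probability of seeing at least 'term_in_target' annotated transcripts
# among the targets if targets were drawn at random from the universe.
p_value <- phyper(term_in_target - 1,
                  term_in_univ,
                  universe_size - term_in_univ,
                  n_targets,
                  lower.tail = FALSE)
print(p_value)
```

A small p-value suggests the term is more frequent among the targets than expected by chance. Whether repeated target sites should count more than once in such a test is a modelling decision.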
OK, so let's say my microRNA of interest is miR-A.
Step 1: based on my favorite prediction algorithm, I have obtained a
list of genes targeted by miR-A. The targets are Ensembl transcripts,
and as I said before, miR-A can target the same transcript several times
(at different locations), so I need to account for this.
miR-A targets ->
(ENST001,ENST001,ENST001,ENST0025,ENST089,ENST099,ENST0099......) up to
300 different transcripts.
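As a small illustration of keeping track of repeated targets, the sketch below counts how many predicted sites each transcript has. It uses a few of the placeholder IDs from the list above. The variable names are made up for this example.

```r
# Toy target list for miR-A, with one entry per predicted target site.
targets <- c("ENST001", "ENST001", "ENST001", "ENST0025", "ENST089", "ENST099")

# Number of predicted sites per transcript.
site_counts <- table(targets)

# Distinct transcripts, e.g. for use as query values.
unique_targets <- unique(targets)

print(site_counts)
```

Keeping site_counts alongside the unique IDs means the multiplicity is not lost even if a later lookup only returns one record per transcript.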
I use biomaRt to get the corresponding GO IDs for these transcripts
....
# Load biomaRt and select the mart database
library(biomaRt)
mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
# Get GO terms for a specific transcript.
# First problem: biomaRt will not return GO terms twice for duplicated
# transcripts. The example below shows that for transcripts
# c("ENST00000347770","ENST00000347770") I get the same GO terms as for
# transcript c("ENST00000347770").
# As I said before, a microRNA can target the same transcript several
# times, so the GO terms associated with this particular transcript
# should be counted twice here.
# Can we force biomaRt to return redundant GO terms?
gomir <- getBM(attributes=c(
    'go_biological_process_id',
    'go_biological_process_linkage_type',
    'go_cellular_component_linkage_type',
    'go_cellular_component_id',
    'go_molecular_function_id',
    'go_molecular_function_linkage_type'),
  filters="ensembl_transcript_id",
  values=c("ENST00000347770","ENST00000347770"),  # ...... and so on
  mart=mart)
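One possible workaround, sketched here in base R only, is to let the query return each annotation once and restore the multiplicity afterwards by merging with the original target list. This assumes 'ensembl_transcript_id' is added to the requested attributes so that each row can be mapped back to its transcript. The data frame go_annot is a hand-made stand-in for the query result. "ENST_OTHER" and the GO IDs are placeholders, not real identifiers or annotations.

```r
# Predicted targets of miR-A, duplicates kept (one row per predicted site).
targets <- data.frame(
  ensembl_transcript_id = c("ENST00000347770", "ENST00000347770", "ENST_OTHER"),
  stringsAsFactors = FALSE
)

# Stand-in for the annotation table: one row per distinct transcript/GO pair.
# The GO IDs below are placeholders, not real annotations.
go_annot <- data.frame(
  ensembl_transcript_id = c("ENST00000347770", "ENST00000347770", "ENST_OTHER"),
  go_id = c("GO:placeholder1", "GO:placeholder2", "GO:placeholder1"),
  stringsAsFactors = FALSE
)

# merge() repeats the annotation rows once per occurrence in 'targets',
# so a transcript hit twice contributes its GO terms twice.
go_expanded <- merge(targets, go_annot, by = "ensembl_transcript_id")

print(table(go_expanded$go_id))
```

With this layout the redundancy lives in the target table rather than in the query, so the same annotation table can be reused for other microRNAs.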
.... I will complete the rest of the pipeline with GOstats once this
first step is sorted out.