[BioC] GoStats and microRNA pipeline using Biomart

James F. Reid james.reid at ifom-ieo-campus.it
Wed Mar 30 19:00:54 CEST 2011


Hi David,

I understand your reasoning for counting the number of miRNA binding 
sites within the 3' UTR of a predicted target: you are trying to capture 
the 'combinatorial' effect of miRNA targeting.
However, I would also take the length of each UTR into account (some kind 
of normalization, if you wish), since the longer the UTR, the more 
chances there are for a miRNA to bind.
Does this make sense?
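
As a rough sketch of what I mean, one simple normalization is binding sites per kilobase of 3' UTR. The transcript IDs, site counts and UTR lengths below are made up for illustration; how you obtain the real UTR lengths is up to you.

```r
# Hypothetical inputs: binding-site counts per transcript and 3' UTR
# lengths in nucleotides (all values invented for illustration).
sites   <- c(ENST_A = 3, ENST_B = 1, ENST_C = 2)
utr_len <- c(ENST_A = 3000, ENST_B = 500, ENST_C = 2000)

# Sites per kilobase of 3' UTR, matched by transcript name
density <- sites / (utr_len[names(sites)] / 1000)
density
```

With this measure, a transcript with a single site in a short UTR can rank above one with several sites spread over a long UTR.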

Best,
J.

On 03/30/2011 05:23 PM, David martin wrote:
> On 03/30/2011 04:56 PM, Steve Lianoglou wrote:
>> Hi,
>>
>> On Wed, Mar 30, 2011 at 9:43 AM, David martin<vilanew at gmail.com> wrote:
>>> Hi,
>>> I am opening this new discussion so as not to confuse it with the
>>> previous one.
>>>
>>> The objective here is to look for overrepresented GO terms among microRNA
>>> targets. One microRNA can have several targets (genes), and a single gene
>>> can be targeted by several microRNAs. The aim is to check, for a
>>> specific microRNA, which GO terms are overrepresented.
>>>
>>>
>>> OK, so let's say my microRNA of interest is miR-A.
>>>
>>> Step 1: based on my favorite prediction algorithm, I have managed to get a
>>> list of genes targeted by miR-A. The genes are Ensembl transcripts and, as
>>> I said before, miR-A can target the same transcript several times (at
>>> different locations), so I need to account for this.
>>>
>>> miR-A targets ->
>>> (ENST001,ENST001,ENST001,ENST0025,ENST089,ENST099,ENST0099......) up
>>> to 300 different transcripts.
>>
>> I don't get why you'd want to have the same transcript multiple times
>> as a target for the miRNA -- if the miRNA targets the same transcript
>> in two different locations, you then want to double count the GO terms
>> associated with that transcript?
>
> That's correct. The idea is that a transcript targeted at different
> locations is effectively "targeted twice", and therefore the GO terms
> associated with this transcript have to be replicated. This sounds
> good to me, but I do not expect you to agree with it.
>
>
> I have managed to get all GO IDs with a small function. Basically, you
> pass in one transcript ID at a time in a loop:
>
> library(biomaRt)  # provides getBM(); 'mart' comes from useMart()
>
> # Return the GO annotations for one Ensembl transcript ID
> getgoids <- function(id) {
>   getBM(attributes = c(
>           'go_biological_process_id',
>           'go_biological_process_linkage_type',
>           'go_cellular_component_id',
>           'go_cellular_component_linkage_type',
>           'go_molecular_function_id',
>           'go_molecular_function_linkage_type'),
>         filters = "ensembl_transcript_id", values = id, mart = mart)
> }
>
> # genes: character vector of all Ensembl transcript IDs
> goid <- vector("list", length(genes))
> for (i in seq_along(genes)) {
>   goid[[i]] <- getgoids(genes[i])  # one data.frame per transcript
> }
>
> I agree with you that I might need to add the transcript ID to the
> attributes, so that GOstats can map between transcripts and GO IDs.
>
>
> Now I want to use that as the universe set for GOstats and run hyperG to
> compare it with the GO terms for a specific microRNA.
>
> I guess:
>
> goframeData = data.frame(frame$go_id, frame$Evidence, frame$gene_id)
> #list of all GOids from all transcripts targeted by all microRNA
>
> goFrame = GOFrame(goframeData, organism = "Homo sapiens")
> goAllFrame = GOAllFrame(goFrame) #Geneid to ALL go id mapping
>
>
> Can you correct my use of the GSEAGOHyperGParams function below?
> geneSetCollection = all GO IDs of all transcripts targeted by
> all microRNAs
> single_mir_transcript_ids = list of Ensembl transcript IDs targeted by
> a specific microRNA
> universeGeneIds = list of transcript-to-GO mappings
> Is this correct?
>
>
> gsc <- GeneSetCollection(goAllFrame, setType = GOCollection())
> params <- GSEAGOHyperGParams(
>   name = "My Custom GSEA based annot Params",
>   geneSetCollection = gsc,
>   geneIds = single_mir_transcript_ids,
>   universeGeneIds = universe,
>   ontology = "BP", pvalueCutoff = 0.05,
>   conditional = FALSE, testDirection = "over")
>
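To make the test itself concrete: an over-representation test of this kind is a one-sided hypergeometric test for each GO term. The base R sketch below uses invented counts and is not the GOstats code path, only the underlying calculation.

```r
# Invented counts for a single GO term (illustration only)
universe_size <- 1000  # transcripts in the universe
annotated     <- 50    # universe transcripts annotated with the term
selected      <- 100   # transcripts targeted by the miRNA of interest
overlap       <- 12    # targeted transcripts annotated with the term

# P(X >= overlap) when 'selected' transcripts are drawn without
# replacement from the universe
p_over <- phyper(overlap - 1, annotated, universe_size - annotated,
                 selected, lower.tail = FALSE)
p_over
```

Note that the hypergeometric model draws each transcript at most once, so entering the same transcript ID several times does not fit it directly.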
>
>>
>> Somehow that seems wrong to me -- if the "hit count" of the miRNA to
>> the transcript is important to you, one thing you can do is store your
>> miR-A vector as its "table()" so the names will be the transcripts,
>> and the values will be the number of hits.
>>
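A minimal sketch of the table() suggestion, using the placeholder IDs from the miR-A example above (they are not real Ensembl IDs):

```r
# Target vector with repeated transcripts (placeholder IDs)
targets <- c("ENST001", "ENST001", "ENST001",
             "ENST0025", "ENST089", "ENST099")

# Names are the transcripts, values are the number of hits
hits <- table(targets)

# Unique transcript IDs, e.g. for querying annotations once each
unique_targets <- names(hits)

hits[["ENST001"]]  # hit count for one transcript
```

This keeps the hit counts available without duplicating IDs in the query.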
>>> I use biomart to get the corresponding GoIds for these transcripts
>>>
>>> ....
>>> #Select mart database
>>> mart<- useMart("ensembl", dataset="hsapiens_gene_ensembl")
>>>
>>> #Get go for a specific transcript
>>> # First problem: biomaRt will not return GO terms twice for duplicated
>>> # transcripts. The example below shows that for transcripts
>>> # c("ENST00000347770","ENST00000347770") I get the same GO terms as for
>>> # transcript c("ENST00000347770").
>>> # As I said before, a microRNA can target the same transcript several
>>> # times, which should give twice the number of GO terms associated with
>>> # this particular microRNA. Can we force biomaRt to return redundant
>>> # GO terms?
>>
>> I'm actually still not sure what you want to do, but if you follow my
>> advice above, you can manipulate the data.frame you get from getBM to
>> replicate rows (or whatever you're trying to do).
>>
>> You will also want to add "ensembl_transcript_id" to your vector of
>> attributes so you can reassociate the rows in the table that is
>> returned to you with your original ensembl transcripts you are
>> querying for, eg:
>>
>> R> gomir <- getBM(attributes=c('ensembl_transcript_id', 'go..', ...),
>>      filters='ensembl_transcript_id', values=c("ENST..."), mart=mart)
>>
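Once the transcript ID is in the result, replicating rows by hit count is a matter of indexing. The data frame below is a hypothetical stand-in for a getBM() result (no query is made), and the go_id column name and hit counts are assumptions for illustration.

```r
# Hypothetical stand-in for an annotation table keyed by transcript ID
go <- data.frame(
  ensembl_transcript_id = c("ENST001", "ENST001", "ENST0025"),
  go_id = c("GO:0000001", "GO:0000002", "GO:0000001"),
  stringsAsFactors = FALSE
)

# Hit counts per transcript (placeholder values)
hits <- c(ENST001 = 3, ENST0025 = 1)

# Repeat each annotation row once per hit of its transcript
times <- hits[go$ensembl_transcript_id]
weighted <- go[rep(seq_len(nrow(go)), times = times), ]
nrow(weighted)
```

Whether the replicated table is a sensible input for an enrichment test is a separate question; this only shows the row manipulation.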
>> Hope that helps,
>> -steve
>>
