[BioC] GoStats and microRNA pipeline using Biomart

Wed Mar 30 20:31:50 CEST 2011

Yes, absolutely. A few Ensembl releases ago UTRs tended to be shorter, but
this is getting better now. How would you normalize for UTR length?
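
One simple option, as a rough sketch only and not something settled in this thread: count the predicted sites per transcript with table() and divide by the 3' UTR length to get sites per kilobase. The transcript ids and UTR lengths below are made-up values for illustration; real lengths would have to come from the annotation.

```r
# Hypothetical predicted sites for miR-A: one entry per binding site, so a
# transcript hit at several locations appears several times.
targets <- c("ENST001", "ENST001", "ENST001", "ENST0025", "ENST089", "ENST099")

# Hypothetical 3' UTR lengths in nucleotides (illustrative values only).
utr_length <- c(ENST001 = 3000, ENST0025 = 500, ENST089 = 1000, ENST099 = 2000)

# Number of predicted sites per transcript (names are the transcript ids).
hits <- table(targets)

# UTR length in kilobases, in the same order as the hit counts.
utr_kb <- utr_length[names(hits)] / 1000

# Length-normalized score: predicted sites per kilobase of 3' UTR.
sites_per_kb <- setNames(as.vector(hits) / utr_kb, names(hits))
print(sites_per_kb)
```

Whether sites per kilobase is the right scale, or whether something like a background site rate would be better, is an open question here.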

On 03/30/2011 07:00 PM, James F. Reid wrote:
> Hi David,
>
> I understand your reasoning for counting the number of miRNA binding
> sites within the 3' UTR of a predicted target: you are trying to include
> the 'combinatorial' effect of miRNA targeting.
> However, I would also take the length of each UTR into account (some kind
> of normalization, if you wish), since the longer the UTR, the more chances
> there are for a miRNA to bind.
> Does this make sense?
>
> Best,
> J.
>
> On 03/30/2011 05:23 PM, David martin wrote:
>> On 03/30/2011 04:56 PM, Steve Lianoglou wrote:
>>> Hi,
>>>
>>> On Wed, Mar 30, 2011 at 9:43 AM, David
>>> martin<vilanew at gmail.com> wrote:
>>>> Hi,
>>>> I am opening this new discussion so as not to confuse it with the
>>>> previous one.
>>>>
>>>> The objective here is to look for overrepresented GO terms among microRNA
>>>> targets. One microRNA can have several targets (genes), and a single gene
>>>> can be targeted by several microRNAs. The aim is to check, for a
>>>> specific microRNA, which GO terms are overrepresented.
>>>>
>>>>
>>>> OK, so let's say my microRNA of interest is miR-A.
>>>>
>>>> Step 1: based on my favorite prediction algorithm, I have obtained a
>>>> list of genes targeted by miR-A. The genes are Ensembl transcripts and,
>>>> as I said before, miR-A can target the same transcript several times (at
>>>> different locations), so I need to account for this.
>>>>
>>>> miR-A targets ->
>>>> (ENST001,ENST001,ENST001,ENST0025,ENST089,ENST099,ENST0099......) up
>>>> to 300 different transcripts.
>>>
>>> I don't get why you'd want to have the same transcript multiple times
>>> as a target for the miRNA -- if the miRNA targets the same transcript
>>> in two different locations, you then want to double count the GO terms
>>> associated with that transcript?
>>
>> That's correct. The idea is that a transcript targeted at several
>> locations is more "likely to be twice targeted", and therefore the
>> GO terms associated with this transcript have to be replicated. This sounds
>> good to me, but I do not expect you to agree with it.
>>
>>
>> I have managed to get all GO ids with a small function. Basically you
>> pass it one transcript id at a time in a loop:
>>
>> # Fetch the GO annotations for one Ensembl transcript id.
>> # Assumes library(biomaRt) is loaded and 'mart' was created with useMart().
>> getgoids <- function(id) {
>>   getBM(attributes = c(
>>           'go_biological_process_id',
>>           'go_biological_process_linkage_type',
>>           'go_cellular_component_id',
>>           'go_cellular_component_linkage_type',
>>           'go_molecular_function_id',
>>           'go_molecular_function_linkage_type'),
>>         filters = "ensembl_transcript_id", values = id, mart = mart)
>> }
>>
>> # genes: character vector of all Ensembl transcript ids
>> goid <- vector("list", length(genes))
>> for (i in seq_along(genes)) {
>>   goid[[i]] <- getgoids(genes[i])  # one data.frame per transcript
>> }
>>
>> I agree with you that I might need to add the transcript id so that it
>> can be used for the GOstats mapping between transcripts and GO ids.
>>
>>
>> Now I want to use that as the universe set for GOstats and do hyperG to
>> compare with the GO terms for a specific microRNA.
>>
>> I guess :
>>
>> goframeData = data.frame(frame$go_id, frame$Evidence, frame$gene_id)
>> #list of all GOids from all transcripts targeted by all microRNA
>>
>> goFrame = GOFrame(goframeData, organism = "Homo sapiens")
>> goAllFrame = GOAllFrame(goFrame) #Geneid to ALL go id mapping
>>
>>
>> Can you correct me on the GSEAGOHyperGParams call below?
>> geneSetCollection = list of all GO ids of all transcripts targeted by
>> all microRNAs
>> single_mir_transcripts_ids = list of Ensembl transcript ids targeted by
>> a specific microRNA
>> universeGeneIds = list of transcript to GO mapping
>> Is this correct?
>>
>>
>> gsc <- GeneSetCollection(goAllFrame, setType = GOCollection())
>> params <- GSEAGOHyperGParams(name = "My Custom GSEA based annot Params",
>>     geneSetCollection = gsc, geneIds = single_mir_transcripts_ids,
>>     universeGeneIds = universe, ontology = "BP", pvalueCutoff = 0.05,
>>     conditional = FALSE, testDirection = "over")
>>
>>
>>>
>>> Somehow that seems wrong to me -- if the "hit count" of the miRNA to
>>> the transcript is important to you, one thing you can do is store your
>>> miR-A vector as its "table()", so the names will be the transcripts
>>> and the values will be the number of hits.
>>>
>>>> I use biomaRt to get the corresponding GO ids for these transcripts
>>>>
>>>> ....
>>>> #Select mart database
>>>> mart<- useMart("ensembl", dataset="hsapiens_gene_ensembl")
>>>>
>>>> #Get GO terms for a specific transcript
>>>> # First problem: biomaRt will not return GO terms twice for duplicated
>>>> transcripts. The example below shows that for transcripts
>>>> c("ENST00000347770","ENST00000347770") I get the same GO terms as for
>>>> transcript c("ENST00000347770").
>>>> # As I said before, a microRNA can target the same transcript several
>>>> times, which would mean twice the number of GO terms associated with
>>>> this particular microRNA. Can we force biomaRt to return redundant
>>>> GO terms?
>>>
>>> I'm actually still not sure what you want to do, but if you follow my
>>> advice above, you can manipulate the data.frame you get from getBM to
>>> replicate rows (or whatever you're trying to do).
>>>
>>> You will also want to add "ensembl_transcript_id" to your vector of
>>> attributes so you can reassociate the rows in the table that is
>>> returned to you with your original ensembl transcripts you are
>>> querying for, eg:
>>>
>>> R> gomir <- getBM(attributes=c('ensembl_transcript_id', 'go..', ...),
>>> filters='ensembl_transcript_id', values=c("ENST..."), mart=mart)
>>>
>>> Hope that helps,
>>> -steve
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>