[BioC] GoStats and microRNA pipeline using Biomart

Thu Mar 31 10:10:28 CEST 2011

Hi David,

On 03/30/2011 08:31 PM, David martin wrote:
 > Yes, absolutely. A few Ensembl releases ago UTRs tended to be shorter, but
 > this is getting better now. How would you normalize that based on
 > length?

I'm afraid I don't have a simple answer to this; it would need thinking
through, especially with respect to your GO enrichment analysis.
Any ideas from the members of the list?
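
One naive starting point, offered only as a sketch and not as a validated method, would be to express the hits as binding sites per kilobase of 3' UTR. The transcript IDs, hit counts and UTR lengths below are made up for illustration, and how such a score should feed into the GO enrichment test remains an open question:

```r
# Toy inputs: hit counts per transcript and 3' UTR lengths in nucleotides.
# All IDs and numbers are hypothetical.
hits    <- c(ENST001 = 3,    ENST0025 = 1,   ENST089 = 1)
utr_len <- c(ENST001 = 3000, ENST0025 = 500, ENST089 = 1000)

# Binding sites per kilobase of 3' UTR, matched by transcript ID.
sites_per_kb <- hits / (utr_len[names(hits)] / 1000)
sites_per_kb
```

With this scaling a single site in a short UTR can outweigh several sites in a long one, which may or may not be what you want.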

Best,
J.

> On 03/30/2011 07:00 PM, James F. Reid wrote:
>> Hi David,
>>
>> I understand your reasoning for counting the number of miRNA binding
>> sites within the 3' UTR of a predicted target: you are trying to include
>> the 'combinatorial' effect of miRNA targeting.
>> I would, however, try to include the length of each UTR (some kind of
>> normalization, if you wish), since the longer the UTR, the greater the
>> chance that a miRNA will bind.
>> Does this make sense?
>>
>> Best,
>> J.
>>
>> On 03/30/2011 05:23 PM, David martin wrote:
>>> On 03/30/2011 04:56 PM, Steve Lianoglou wrote:
>>>> Hi,
>>>>
>>>> On Wed, Mar 30, 2011 at 9:43 AM, David
>>>> martin<vilanew at gmail.com> wrote:
>>>>> Hi,
>>>>> I am opening this new discussion so as not to confuse it with the
>>>>> previous one.
>>>>>
>>>>> The objective here is to look for overrepresented GO terms among
>>>>> microRNA targets. One microRNA can have several targets (genes) and
>>>>> one single gene can be targeted by several microRNAs. The aim is to
>>>>> check, for a specific microRNA, which GO terms are overrepresented.
>>>>>
>>>>>
>>>>> OK, so let's say my microRNA of interest is miR-A.
>>>>>
>>>>> Step 1: based on my favorite prediction algorithm I have managed to
>>>>> get a list of genes targeted by miR-A. The genes are Ensembl
>>>>> transcripts and, as I said before, miR-A can target the same
>>>>> transcript several times (at different locations), so I need to
>>>>> account for this.
>>>>>
>>>>> miR-A targets ->
>>>>> (ENST001,ENST001,ENST001,ENST0025,ENST089,ENST099,ENST0099......) up
>>>>> to 300 different transcripts.
>>>>
>>>> I don't get why you'd want to have the same transcript multiple times
>>>> as a target for the miRNA -- if the miRNA targets the same transcript
>>>> in two different locations, you then want to double count the GO terms
>>>> associated with that transcript?
>>>
>>> That's correct. The idea is that a transcript targeted at different
>>> locations is more "likely to be twice targeted", and therefore the GO
>>> terms associated with this transcript have to be replicated. This sounds
>>> good to me, but I do not expect you to agree on that.
>>>
>>>
>>> I have managed to get all GO IDs with a small function. Basically you
>>> input one transcript ID at a time in a loop:
>>>
>>> # Return the GO annotations for one Ensembl transcript ID.
>>> # 'mart' is the Mart object created with useMart().
>>> getgoids <- function(id) {
>>>   getBM(attributes = c(
>>>           'go_biological_process_id',
>>>           'go_biological_process_linkage_type',
>>>           'go_cellular_component_id',
>>>           'go_cellular_component_linkage_type',
>>>           'go_molecular_function_id',
>>>           'go_molecular_function_linkage_type'),
>>>         filters = "ensembl_transcript_id", values = id, mart = mart)
>>> }
>>>
>>> # genes: vector of all Ensembl transcript IDs.
>>> # getBM() returns a data.frame, so collect the results in a list.
>>> goid <- vector("list", length(genes))
>>> for (i in seq_along(genes)) {
>>>   goid[[i]] <- getgoids(genes[i])
>>> }
>>>
>>> I agree with you that I might need to add the transcript ID so that
>>> GOstats can map between transcripts and GO IDs.
>>>
>>>
>>> Now I want to use that as the universe set for GOstats and run hyperG to
>>> compare it with the GO terms for a specific microRNA.
>>>
>>> I guess :
>>>
>>> goframeData = data.frame(frame$go_id, frame$Evidence, frame$gene_id)
>>> #list of all GOids from all transcripts targeted by all microRNA
>>>
>>> goFrame = GOFrame(goframeData, organism = "Homo sapiens")
>>> goAllFrame = GOAllFrame(goFrame) #Geneid to ALL go id mapping
>>>
>>>
>>> Can you correct me on the GSEAGOHyperGParams call below?
>>> geneSetCollection = list of all GO IDs of all transcripts targeted by
>>> all microRNAs
>>> single_mir_transcripts_ids = list of Ensembl transcript IDs targeted by
>>> a specific microRNA
>>> universeGeneIds = list of transcript-to-GO mappings
>>> Is this correct?
>>>
>>>
>>> gsc <- GeneSetCollection(goAllFrame, setType = GOCollection())
>>> params <- GSEAGOHyperGParams(
>>>   name = "My Custom GSEA based annot Params",
>>>   geneSetCollection = gsc,
>>>   geneIds = single_mir_transcripts_ids,
>>>   universeGeneIds = universe,
>>>   ontology = "BP", pvalueCutoff = 0.05,
>>>   conditional = FALSE, testDirection = "over")
>>>
>>>
>>>>
>>>> Somehow that seems wrong to me -- if the "hit count" of the miRNA to
>>>> the transcript is important to you, one thing you can do is store your
>>>> miR-A vector as its "table()" so the names will be the transcripts,
>>>> and the values will be the number of hits.
>>>>
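
A minimal sketch of the table() idea quoted above, using a made-up target vector (the IDs and the variable names `mirA_targets`, `mirA_hits` and `unique_ids` are illustrative only):

```r
# Hypothetical target vector, one entry per predicted binding site.
mirA_targets <- c("ENST001", "ENST001", "ENST001", "ENST0025", "ENST089")

# table() collapses duplicates: names are transcripts, values are hit counts.
mirA_hits <- table(mirA_targets)
mirA_hits

# The unique transcript IDs can then be used as query values,
# while the counts stay available for later weighting.
unique_ids <- names(mirA_hits)
```

Keeping the counts separate from the IDs means the annotation query only has to be run once per transcript.
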
>>>>> I use biomart to get the corresponding GO IDs for these transcripts
>>>>>
>>>>> ....
>>>>> #Select mart database
>>>>> mart<- useMart("ensembl", dataset="hsapiens_gene_ensembl")
>>>>>
>>>>> #Get GO terms for a specific transcript
>>>>> # First problem: biomart will not return GO terms twice for
>>>>> # duplicated transcripts. The example below shows that for transcript
>>>>> # c("ENST00000347770","ENST00000347770") I get the same GO terms as for
>>>>> # transcript c("ENST00000347770").
>>>>> # As I said before, a microRNA can target the same transcript several
>>>>> # times, which should give twice the number of GO terms for that
>>>>> # transcript. Can we force biomart to return redundant GO terms?
>>>>
>>>> I'm actually still not sure what you want to do, but if you follow my
>>>> advice above, you can manipulate the data.frame you get from getBM to
>>>> replicate rows (or whatever you're trying to do).
>>>>
>>>> You will also want to add "ensembl_transcript_id" to your vector of
>>>> attributes so you can reassociate the rows in the table that is
>>>> returned to you with your original ensembl transcripts you are
>>>> querying for, eg:
>>>>
>>>> R> gomir<- getBM(attributes=c('ensembl_transcript_id', 'go..', ...),
>>>> filters='ensembl_transcript_id', values=c("ENST..."), mart=mart)
>>>>
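
The row replication suggested above could look roughly like this. It is only a sketch: `gomir` here is a hand-built stand-in for a getBM() result that includes 'ensembl_transcript_id', the GO IDs are placeholders, and the `hits` vector is assumed to come from something like table() on the target vector:

```r
# Stand-in for a getBM() result; the GO IDs are placeholders, not real terms.
gomir <- data.frame(
  ensembl_transcript_id = c("ENST001", "ENST001", "ENST0025"),
  go_id                 = c("GO:A", "GO:B", "GO:A"),
  stringsAsFactors      = FALSE
)

# Hypothetical hit counts per transcript.
hits <- c(ENST001 = 3, ENST0025 = 1)

# Repeat each annotation row as many times as its transcript was hit.
times     <- hits[gomir$ensembl_transcript_id]
gomir_rep <- gomir[rep(seq_len(nrow(gomir)), times = times), ]
nrow(gomir_rep)
```

Whether such replicated rows are statistically meaningful in a hypergeometric test is a separate question that this sketch does not address.
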
>>>> Hope that helps,
>>>> -steve
>>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>