[BioC] Re: Re: Re: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

michael watson (IAH-C) michael.watson at bbsrc.ac.uk
Sun Jun 28 09:16:49 CEST 2009


To get those you will need to download the mature.fa.gz and maturestar.fa.gz files from the miRBase ftp site: ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/.  Then you can unzip them and do a grep to find the hsa miRs.

They'll be in FASTA format. I have no idea whether Bioconductor can read them in; I use BioPerl for all my sequence handling.
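For readers who would rather stay in R than use grep, the sketch below reads a miRBase-style FASTA file with base R only and keeps the human ("hsa-") records. It assumes each record is a ">" header line followed by one or more sequence lines, as in the examples quoted later in this thread. `read_fasta()` and `keep_species()` are hypothetical helper names, not part of any package. Bioconductor's Biostrings package also provides FASTA readers such as `readRNAStringSet()`.

```r
# Minimal base-R FASTA reader (hypothetical helper, not a package function).
# Returns a named character vector: names are the header lines without ">",
# values are the concatenated sequence lines of each record.
read_fasta <- function(path) {
  con <- gzfile(path, open = "rt")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con)
  lines <- lines[nzchar(lines)]     # drop empty lines
  is_header <- startsWith(lines, ">")
  # Number the records: each sequence line belongs to the latest header
  record <- cumsum(is_header)
  headers <- sub("^>", "", lines[is_header])
  seqs <- vapply(seq_along(headers), function(i) {
    paste(lines[record == i & !is_header], collapse = "")
  }, character(1))
  names(seqs) <- headers
  seqs
}

# Keep only records whose ID starts with a species prefix ("hsa-" = human)
keep_species <- function(seqs, prefix = "hsa-") {
  seqs[startsWith(names(seqs), prefix)]
}
```

Usage would be along the lines of `hsa_mature <- keep_species(read_fasta("mature.fa.gz"))`; because `gzfile()` reads plain files too, unzipping first is optional.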

Mick


-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch on behalf of mauede at alice.it
Sent: Sun 28/06/2009 2:05 AM
To: Sean Davis
Cc: bioconductor List
Subject: [BioC] Re: Re: Re: how to find the VALIDATED pair (miRNA,gene-3'UTR-sequence)
 
It is true that I received a number of answers providing examples of data extraction from Ensembl.
However, none of them extracts any of the identifiers contained in the file "maturestar"
(e.g. >hsa-let-7d* MIMAT0004484 Homo sapiens let-7d*
CUAUACGACCUGCUGCCUUUCU),
in the file "mature"
(e.g. >hsa-miR-30a MIMAT0000087 Homo sapiens miR-30a
UGUAAACAUCCUCGACUGGAAG)
or in the file "/hsa.gff".

All three of the files mentioned above contain the miRNA identifier plus another identifier whose meaning I do not know.
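The second field in these headers (e.g. MIMAT0004484, MIMAT0000087) is the miRBase accession of the mature sequence; hairpin precursors use accessions beginning with "MI". A minimal base-R sketch that splits such header lines into ID, accession and description follows. `parse_mirbase_header()` is a hypothetical name, and the code assumes the fields are separated by single spaces, as in the examples above.

```r
# Split miRBase FASTA headers such as
#   ">hsa-miR-30a MIMAT0000087 Homo sapiens miR-30a"
# into miRNA ID, accession and free-text description (hypothetical helper).
parse_mirbase_header <- function(headers) {
  headers <- sub("^>", "", headers)
  parts <- strsplit(headers, " ", fixed = TRUE)
  data.frame(
    id          = vapply(parts, `[`, character(1), 1),
    accession   = vapply(parts, `[`, character(1), 2),
    # Everything after the first two fields is the description
    description = vapply(parts, function(p) paste(p[-(1:2)], collapse = " "),
                         character(1)),
    stringsAsFactors = FALSE
  )
}
```

The resulting `accession` column is a stable key for matching miRNAs across the "mature", "maturestar" and GFF files.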
You may ask why I haven't tried to retrieve all possible attribute values from Ensembl to check whether some relationship can be found.
Here is my answer in advance:

> library(biomaRt)
> hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
Error in value[[3L]](cond) : 
  Request to BioMart web service failed. Verify if you are still connected to the internet.  Alternatively the BioMart web service is temporarily down.

In fact, I tried to ping the server "www.biomart.org" and it did not respond...
I deduce that the server really is down at the moment.
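When the BioMart service is unreachable, `useMart()` stops with an error like the one above. The sketch below is a small retry wrapper built only on base R's `tryCatch()`; `with_retries()` is a hypothetical helper. It cannot help if the server is really down, but it keeps a longer script from aborting on a transient failure.

```r
# Hypothetical helper: call `connect` up to `attempts` times, waiting `wait`
# seconds between tries. Returns the first non-NULL result, or NULL if every
# attempt fails (a connect function that legitimately returns NULL is
# therefore treated as a failure).
with_retries <- function(connect, attempts = 3, wait = 5) {
  for (i in seq_len(attempts)) {
    result <- tryCatch(connect(), error = function(e) {
      message("Attempt ", i, " failed: ", conditionMessage(e))
      NULL
    })
    if (!is.null(result)) return(result)
    if (i < attempts) Sys.sleep(wait)
  }
  NULL
}
```

Usage might look like `hmart <- with_retries(function() biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl"))`, followed by a check of `is.null(hmart)` before querying.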

Anyway, I do not know whether any of the files mentioned above contains validated miRNAs.

Best regards,
Maura



-----Original Message-----
From: Sean Davis [mailto:seandavi at gmail.com]
Sent: Sat 27/06/2009 14:23
To: mauede at alice.it
Cc: Steve Lianoglou; bioconductor List
Subject: Re: [BioC] Re: Re: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)
 
On Sat, Jun 27, 2009 at 1:42 AM, <mauede at alice.it> wrote:

> Which attribute corresponds to the miRNA name (e.g. "hsa-miR-130a")?


Hi, Maura.

This information is not directly available via biomaRt.  If you are ever in
doubt, you can use the listAttributes() function to see which attributes are
available.
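`listAttributes()` returns a data frame with (at least) `name` and `description` columns, and it can be long. A minimal sketch for searching it by keyword follows; `find_attributes()` is a hypothetical helper, and whether any miRNA-related attribute exists depends on the Ensembl release being queried.

```r
# Return the rows of a biomaRt attribute table (as returned by
# listAttributes(mart)) whose name or description matches `pattern`,
# ignoring case. Hypothetical helper, not part of biomaRt.
find_attributes <- function(attrs, pattern) {
  hit <- grepl(pattern, attrs$name, ignore.case = TRUE) |
         grepl(pattern, attrs$description, ignore.case = TRUE)
  attrs[hit, , drop = FALSE]
}
```

For example, `find_attributes(listAttributes(hmart), "refseq")` lists the RefSeq-related attributes of the `hmart` object used elsewhere in this thread.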


>
>
> I have to link the gene information (actually, right now I am only interested
> in the 3'UTR sequence) to the miRNA for which the gene in question is a
> target.


This question has been answered several times for you. You'll want to try
those suggestions.  At the bottom of emails to this list, you will find a
link to search the archives in case you didn't save the emails sent to you
earlier.
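For the 3'UTR part of the question, biomaRt's `getSequence()` can retrieve UTR sequences by Ensembl transcript ID when called with `seqType = "3utr"`. The sketch below wraps that call; `get_3utr()` is a hypothetical name, and it needs a working connection to the BioMart server.

```r
# Fetch 3'UTR sequences for Ensembl transcript IDs (hypothetical wrapper).
# `mart` is a Mart object, e.g. useMart("ensembl", dataset = "hsapiens_gene_ensembl").
get_3utr <- function(transcript_ids, mart) {
  if (!requireNamespace("biomaRt", quietly = TRUE))
    stop("the biomaRt package is required")
  biomaRt::getSequence(id = unique(transcript_ids),
                       type = "ensembl_transcript_id",
                       seqType = "3utr",
                       mart = mart)
}
```

A call such as `utr <- get_3utr("ENST00000295228", hmart)` would request the 3'UTR of the INHBB transcript from the miRanda record quoted below. Transcripts without an annotated 3'UTR may come back without a usable sequence, so check the result before further use.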

Sean


>
> -----Original Message-----
> From: Steve Lianoglou [mailto:mailinglist.honeypot at gmail.com]
> Sent: Thu 25/06/2009 16:02
> To: mauede at alice.it
> Cc: bioconductor List
> Subject: Re: [BioC] Re: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)
>
> One more thing to add:
>
> >> Similarity   hsa-miR-130a    miRanda miRNA_target    2       120825363       120825385       +    .       16.5359 1.687830e-02    ENST00000295228 INHBB
>
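A base-R sketch for turning records like the miRanda line above into a data frame. The 13 column names are assumptions chosen for illustration: the miRNA ID, Ensembl transcript ID and gene symbol are recognizable in the example, while the meaning of the remaining fields is guessed from their position. `parse_targets()` is a hypothetical helper.

```r
# Parse whitespace-separated miRanda target records (one per line).
# ASSUMPTION: column names are illustrative guesses based on field position.
parse_targets <- function(lines) {
  cols <- c("group", "mirna", "method", "feature", "chr", "start", "end",
            "strand", "phase", "score", "pvalue", "transcript", "gene")
  fields <- strsplit(trimws(lines), "[[:space:]]+")
  stopifnot(all(lengths(fields) == length(cols)))  # expect 13 fields per record
  df <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
  names(df) <- cols
  # Convert the columns that hold numbers
  for (col in c("start", "end", "score", "pvalue")) {
    df[[col]] <- as.numeric(df[[col]])
  }
  df
}
```

If the input file contains comment lines starting with "#", drop them first, e.g. with `grep("^#", lines, invert = TRUE, value = TRUE)`.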
> > R> library(biomaRt)
> > R> hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
> > R> refseqs <- c("NM_000757", "NM_000757", "NM_005461", "NM_005924",
> >                  "NM_005924", "NM_005924", "NM_019102")
> > R> gene.map <- getBM(attributes=c('hgnc_symbol', 'ensembl_gene_id',
> >                      'ensembl_transcript_id', 'refseq_dna'),
> >                      filters='refseq_dna', values=refseqs, mart=hmart)
> >
> > R> gene.map
> >  hgnc_symbol ensembl_gene_id ensembl_transcript_id refseq_dna
> > 1        CSF1 ENSG00000184371       ENST00000369802  NM_000757
> > 2        MAFB ENSG00000204103       ENST00000396967  NM_005461
> > 3       MEOX2 ENSG00000106511       ENST00000262041  NM_005924
> > 4       HOXA5 ENSG00000106004       ENST00000222726  NM_019102
>
>
> Your original Ensembl transcript wasn't included in our result, so
> instead of telling the `getBM` function to use a list of refseq IDs to
> get info for, we can flip this around and find out what refseq ID your
> "ENST00000295228" transcript points to. Using the same `hmart` object,
> you can do it like so:
>
> R> getBM(attributes=c('hgnc_symbol', 'ensembl_gene_id',
> 'ensembl_transcript_id', 'refseq_dna'),
> filters='ensembl_transcript_id', values='ENST00000295228', mart=hmart)
>
>   hgnc_symbol ensembl_gene_id ensembl_transcript_id refseq_dna
> 1       INHBB ENSG00000163083       ENST00000295228  NM_002193
>
> Note that we only had to change the filter passed to the `filters`
> parameter (and the ID passed to `values`).
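Once the target predictions and the `getBM()` output share the Ensembl transcript ID, linking a miRNA to its target gene is a join. A base-R sketch using `merge()` follows; the two small data frames are hand-built from values that appear in this thread and stand in for a parsed target file and a real `getBM()` result.

```r
# Stand-in for parsed miRanda predictions (values from the record above)
targets <- data.frame(
  mirna      = "hsa-miR-130a",
  transcript = "ENST00000295228",
  stringsAsFactors = FALSE
)
# Stand-in for a getBM() result (values copied from the output above)
gene.map <- data.frame(
  hgnc_symbol           = "INHBB",
  ensembl_gene_id       = "ENSG00000163083",
  ensembl_transcript_id = "ENST00000295228",
  refseq_dna            = "NM_002193",
  stringsAsFactors = FALSE
)
# Join on the shared Ensembl transcript ID (inner join by default)
linked <- merge(targets, gene.map,
                by.x = "transcript", by.y = "ensembl_transcript_id")
```

`merge()` keeps only matching rows by default; adding `all.x = TRUE` keeps predictions whose transcript has no match in `gene.map`, which makes missing annotations easy to spot.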
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Physiology, Biophysics and Systems Biology
> Weill Medical College of Cornell University
>
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
