Actually, I extracted the same information the old way, that is, using a loop that queried one refseq_dna at a time.
I know this is not expected with a high-level language like R. However, I could see that some ENSTs correspond to two different
HGNC symbols. Moreover, the 3' UTR sequence is not available for all the ENSTs I have.
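
For reference, a minimal sketch of how the transcripts carrying two symbols could be listed once the results sit in a data frame. The toy gene.map below is only a stand-in for the getBM output: the column names come from the query, but the ENST identifiers and symbols are made-up placeholders, not real annotations.

```r
# Toy stand-in for the data frame returned by getBM(); all values are placeholders.
gene.map <- data.frame(
  hgnc_symbol           = c("SYM1", "SYM2", "SYM3", "SYM4"),
  ensembl_transcript_id = c("ENST_A", "ENST_A", "ENST_B", "ENST_C"),
  stringsAsFactors      = FALSE
)

# Keep distinct (transcript, symbol) pairs, then count symbols per transcript.
pairs <- unique(gene.map[, c("ensembl_transcript_id", "hgnc_symbol")])
symbols.per.enst <- table(pairs$ensembl_transcript_id)

# Transcripts annotated with more than one HGNC symbol.
multi.enst <- names(symbols.per.enst)[symbols.per.enst > 1]
print(gene.map[gene.map$ensembl_transcript_id %in% multi.enst, ])
```

The same pattern, with the two columns swapped, would list symbols that are attached to several transcripts.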

Thank you for your answer.
Regards,
Maura


-----Original Message-----
From: Sean Davis [mailto:seandavi@gmail.com]
Sent: Wed 29/07/2009 7.46
To: mauede@alice.it
Cc: Bioconductor List
Subject: Re: [BioC] Why am I finding a mismatch between refseq_dna and ensembl_transcript_id ?
 
On Wed, Jul 29, 2009 at 12:01 AM, <mauede@alice.it> wrote:

> I downloaded the following file from miRDB
> http://mirdb.org/miRDB/download/MirTarget2_v3.0_prediction_result.txt.gz
>
> I have checked that the miRDB Gene_Bank_Accession_Number (for Human it is
> something like NM_xxxxx) corresponds to the BioMart "refseq_dna".
>
> I have a vector containing 253 Gene_Bank_Accession_Numbers:
> > length(tmp_miRNA_GB)
> [1] 253
> > tmp_miRNA_GB[1:5]
> [1] "NM_203390"    "NM_024639"    "NM_001017989" "NM_203331"    "NM_001879"
>
> I use this vector as the input filter to getBM to obtain the respective
> ensembl_transcript_id.
> Surprisingly, only 246 ensembl_transcript_ids are found:
>
> > gene.map <- getBM(attributes = c("hgnc_symbol", "ensembl_gene_id",
>                                    "refseq_dna", "ensembl_transcript_id"),
>                     filters = "refseq_dna", values = tmp_miRNA_GB,
>                     mart = hmart)
>
> > dim(gene.map)
> [1] 246   4
>
> I thought there would be a 1-1 correspondence between the two attributes:
> "refseq_dna" and "ensembl_transcript_id"
> Am I mistaken?
>

Hi, Maura.

Yes, unfortunately, there is not a 1-1 correspondence.  Ensembl and NCBI
(the curator of RefSeq) are independent organizations, each with different
build policies and annotation processes for transcripts.  So, in general in
this field (genomics/bioinformatics), there is RARELY a 1-1 correspondence
between any two entities.  I would suggest that 246/253 is actually quite a
good result--I might have expected a bit less a priori.
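
To see which accessions dropped out, and whether any came back more than once, the input vector can be compared against the returned table. Below is a minimal sketch, assuming tmp_miRNA_GB and gene.map as in the question. It uses toy values so that it runs without a BioMart connection: the accessions are the five shown above, while the transcript IDs and the mappings themselves are placeholders, not real results.

```r
# Toy stand-ins so the sketch runs offline. The accessions are the five shown
# in the question; the transcript IDs and mappings are placeholders.
tmp_miRNA_GB <- c("NM_203390", "NM_024639", "NM_001017989",
                  "NM_203331", "NM_001879")
gene.map <- data.frame(
  refseq_dna            = c("NM_203390", "NM_203390", "NM_024639", "NM_001879"),
  ensembl_transcript_id = c("ENST_A", "ENST_B", "ENST_C", "ENST_D"),
  stringsAsFactors      = FALSE
)

# Accessions for which no row came back at all.
unmapped <- setdiff(tmp_miRNA_GB, gene.map$refseq_dna)

# Accessions that map to more than one Ensembl transcript.
n.enst <- tapply(gene.map$ensembl_transcript_id, gene.map$refseq_dna,
                 function(x) length(unique(x)))
one.to.many <- names(n.enst)[n.enst > 1]

print(unmapped)
print(one.to.many)
```

Note that the row count of gene.map alone can hide both effects: a table with fewer rows than input IDs may still contain accessions that appear on several rows.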

Sean





