[BioC] I: Help with symbol names mapping between miRecords and BioMart
mauede at alice.it
Fri Jul 3 12:57:00 CEST 2009
Please find attached the crude script that worked.
Regards,
Maura
-----Original Message-----
From: mauede at alice.it
Sent: Thu 2 Jul 2009 8:11
To: Michael Watson; Sean Davis; Steve Lianoglou
Subject: Help with symbol names mapping between miRecords and BioMart
I extracted some VALIDATED miRNAs and, *hopefully*, paired each of them with the 3'UTR sequences of its VALIDATED target genes.
I am NOT sure about my mapping between BioMart and miRecords object names.
Clearly, the output of my algorithm depends on whether that name mapping is correct (is it?).
> [miR-130a]
[1] "TAAACTACCTAACATTATTTATTCAGCTTCATTTGTGTCAATGGGCAATGACAGGTAAATTAAGACATGCACTATGAGGAATAATTATTTATTTAATAACAATTGTTTGGGGTTGAAAATTCAAAAAGTGTTTATTTTTCATATTGTGCCAATATGTATTGTAAACATGTGTTTTAATTCCAATATGATGACTCCCTTAAAATAGAAATAAGTGGTTATTTCTCAACAAAGCACAGTGTTAAATGAAATTGTAAAACCTGTCAATGATACAGTCCCTAAAGAAAAAAAATCATTGCTTTGAAGCAGTTGTGTCAGCTACTGCGGAAAAGGAAGGAAACTCCTGACAGTCTTGTGCTTTTCCTATTTGTTTTCATGGTGAAAATGTACTGAGATTTTGGTATTACACTGTATTTGTATCTCTGAAGCATGTTTCATGTTTTGTGACTATATAGAGATGTTTTTAAAAGTTTCAATGTGATTCTAATGTCTTCATTTCATTGTATGATGTGTTGTGATAGCTAACATTTT"
> hsa-let-7c
[1] "ATTGTCATTGGAGGAGTCCAGGATAGCTCTTCATGTTATTTTCACCTTGAGGAATTGTCCATTACATCTATGAGCCTTATGTGTGGCTTTCTCCGATATAGAAACCTATCAGGTGTCTTTTAGATCATTTCAAAACACTGGCTTATTCTTTCTTATGTTTCCAACTGAAGTCTGCATCCCAAGATGTAGTTTCACTGCTACCCCATATGGCACCCTTGTACGAATTTGAAAAAAGTACTCACTCTAGGCACATGCAGAGCCATGCCTGCGGGGACAGCTTAGAGAGTAGAGGGTGGGCTGAACTCCAGTTACTCTCGTACAGGGATCCACCTTTTTGCAGAAATCACAGTGTGGCTATGGTGTGGTTTGATTTCATAAAACAGATGCTT"
[2] "TTGCATTTCCTAGGTTTCTGTGTTTGGGGTGTGTGTGCGTGTCTCTCTCTCTCTCTCTCTCTTTCTCTTTCTCTCTCTTTTTGAATTTCAAAGAAGAAACAGTCTCAGGGAAATTTCTTTTTTCTTTTTTTTTTTTAAAGAGAACAAGAAAAGTACAACATTGCTTAAGTCCTACCTCATCTTTATTTTTTTACAGATGAATGTACTTATCTTTTCTGCAGGGATTGAGCCTGTGAAGTGATAATTTCTATCTACCTCATAAATCTTTACATTTCCTTCTGCAACAGGCCCTCTTCCCCTCCTCAGTGGAGTTTGCATTTCCCTCTTCCCCTGCGTGGGGCATGATATGCACAAGCCTGGCATCTGTATGGCTGGGAGGGCACTGGATGTGTGTGGTGGGGTGTATTCTGTAGATTGAGCCAAGGAAACACAAAAAAAAACTACTAAGT"
[3] "Sequence unavailable"
[4] "GCCACCCACCTTGGCCTCTCAAAGTGCTGGGAATACAGGCGTGAGCCATCGTGCCTGGTCTAAAAAATGTCTATTAGTGTTCCATCACTAGATCTCTTCTGAGGTATTCATGCCATATGCCCCATCCTGATGTCATATCCACAGGACAATCTACTACCAAGAACCAGCTCCAAGAAGAAAACATCTCTGGGAAACAGTACCAAAAGGAGTCACTGAATTGTCATTGGAGGAGTCCAGGATAGCTCTTCATGTTATTTTCACCTTGAGGAATTGTCCATTACATCTATGAGCCTTATGTGTGGCTTTCTCCGATATAGAAACCTATCAGGTGTCTTTTAGATCATTTCAAAACACTGGCTTATTCTTTCTTATGTTTCCAACTGAAGTCTGCATCCCAAGATGTAGTTTCACTGCTACCCCATATGGCACCCTTGTACGAATTTGAAAAAAGTACTCACTCTAGGCACATGCAGAGCCATGCCTGCGGGGACAGCTTAGAGAGTAGAGGGTGGGCTGAACTCCAGTTACTCTCG"
[5] "GGGGCGCCAACGTTCGATTTCTACCTCAGCAGCAGTTGGATCTTTTGAAGGGAGAAGACACTGCAGTGACCACTTATTCTGTATTGCCATGGTCTTTCCACTTTCATCTGGGGTGGGGTGGGGTGGGGTGGGGGAGGGGGGGGTGGGGTGGGGAGAAATCACATAACCTTAAAAAGGACTATATTAATCACCTTCTTTGTAATCCCTTCACAGTCCCAGGTTTAGTGAAAAACTGCTGTAAACACAGGGGACACAGCTTAACAATGCAACTTTTAATTACTGTTTTCTTTTTTCTTAACCTACTAATAGTTTGTTGATCTGATAAGCAAGAGTGGGCGGGTGAGAAAAACCGAATTGGGTTTAGTCAATCACTGCACTGCATGCAAACAAGAAACGTGTCACACTTGTGACGTCGGGCATTCATATAGGAAGAACGCGGTGTGTAACACTGTGTACACCTCAAATACCACCCCAACCCACTCCCTGTAGTGAATCCTCTGTTTAGAACACCAAAGATAAGGACTAGATACTACTTTCTCTTTTTCGTATAATCTTGTAGACACTTACTTGATGATTTTTAACTTTTTATTTCTAAATGAGACGAAATGCTGATGTATCCTTTCATTCAGCTAACAAACTAGAAAAGGTTATGTTCATTTTTCAAAAAGGGAAGTAAGCAAACAAATATTGCCAACTCTTCTATTTATGGATATCACACATATCAGCAGGAGTAATAAATTTACTCACAGCACTTGTTTTCAGGACAACACTTCATTTTCAGGAAATCTACTTCCTACAGAGCCAAAATGCCATTTAGCAATAAATAACACTTGTCAGCCTCAGAGCATTTAAGGAAACTAGACAAGTAAAATTATCCTCTTTGTAATTTAATGAAAAGGTACAACAGAATAATGCATGATGAACTCACCTAATTATGAGGTGGGAGGAGCGAAATCTAAATTTCTTTTGCTATAGTTATACATCAATTTAAAAAGCAAAAAAAAAAAAGGGGGGGGCAATCTCTCTCTGTGTCTTTCTCTCTCTCTCTTCCTCTCCCTCTCTCTTTTCATTGTGTATCAGTTTCCATGAAAGACCTGAATACCACTTACCTCAAATTAAGCATATGTGTTACTTCAAGTAATACGTTTTGACATAAGATGGTTGACCAAGGTGCTTTTCTTCGGCTTGAGTTCACCATCTCTTCATTCAAACTGCACTTTTAGCCAGAGATGCAATATATCCCCACTACTCAATACTACCTCTGAATGTTACAACGAATTTACAGTCTAGTACTTATTACATGCTGCTATACACAAGCAATGCAAGAAAAAAACTTACTGGGTAGGTGATTCTAATCATCTGCAGTTCTTTTTGTACACTTAATTACAGTTAAAGAAGCAATCTCCTTACTGTGTTTCAGCATGACTATGTATTTTTCTATGTTTTTTTAATTAAAAATTTTTAAAATACTTGTTTCAGCTTCTCTGCTAGATTTCTACATTAACTTGAAAATTTTTTAACCAAGTCGCTCCTAGGTTCTTAAGGATAATTTTCCTCAATCACACTACACATCACACAAGATTTGACTGTAATATTTAAATATTACCCTCCAAGTCTGTACCTCAAATGAATTCTTTAAGGAGATGGACTAATTGACTTGCAAAGACCTACCTCCAGACTTCAAAAGGAATGAACTTGTTACTTGCAGCATTCATTTGTTTTTTCAATGTTTGAAATAGTTCAAACTGCAGCTAACCCTAGTCAAAACTATTTTTGTAAAAGACATTTGATAGAAAGGAACACGTTTTTACATACTTTTGCAAAATAAGTAAATAATAAATAAAATAAAAGCCAACCTTCAAAGAAACTTGAAGCTTTGTAGGTGAGATGCAACAAGCCCTGCTTTTGCATAATGCAATCAAAAATATGTGTTTTTAAGATTAGTTGAATATAAGAAAATGCTTGACAAATATTTTCATGTATTTTACACAAATG
TGATTTTTGTAATATGTCTCAACCAGATTTATTTTAAACGCTTCTTATGTAGAGTTTTTATGCCTTTCTCTCCTAGTGAGTGTGCTGACTTTTTAACATGGTATTATCAACTGGGCCAGGAGGTAGTTTCTCATGACGGCTTTTGTCAGTATGGCTTTTAGTACTGAAGCCAAATGAAACTCAAAACCATCTCTCTTCCAGCTGCTTCAGGGAGGTAGTTTCAAAGGCCACATACCTCTCTGAGACTGGCAGATCGCTCACTGTTGTGAATCACCAAAGGAGCTATGGAGAGAATTAAAACTCAACATTACTGTTAACTGTGCGTTAAATAAGCAAATAAACAGTGGCTCATAAAAATAAAAGTCGCATTCCATATCTTTGGATGGGCCTTTTAGAAACCTCATTGGCCAGCTCATAAAATGGAAGCAATTGCTCATGTTGGCCAAACATGGTGCACCGAGTGATTTCCATCTCTGGTAAAGTTACACTTTTATTTCCTGTATGTTGTACAATCAAAACACACTACTACCTCTTAAGTCCCAGTATACCTCATTTTTCATACTGAAAAAAAAAGCTTGTGGCCAATGGAACAGTAAGAACATCATAAAATTTTTATATATATAGTTTATTTTTGTGGGAGATAAATTTTATAGGACTGTTCTTTGCTGTTGTTGGTCGCAGCTACATAAGACTGGACATTTAACTTTTCTACCATTTCTGCAAGTTAGGTATGTTTGCAGGAGAAAAGTATCAAGACGTTTAACTGCAGTTGACTTTCTCCCTGTTCCTTTGAGTGTCTTCTAACTTTATTCTTTGTTCTTTATGTAGAATTGCTGTCTATGATTGTACTTTGAATCGCTTGCTTGTTGAAAATATTTCTCTAGTGTATTATCACTGTCTGTTCTGCACAATAAACATAACAGCCTCTGTGATCCCCATGTGTTTTGATTCCTGCTCTTTGTTACAGTTCCATTAAATGAGTAATAAAGTTTGGTCAAAAC"
I downloaded the VALIDATED xls file from miRecords and discarded the records that do not pertain to Homo sapiens. I also
dropped some columns that, as far as I can tell, do not carry data relevant to my goal.
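A minimal sketch of this pruning step in base R, on a toy table rather than the real miRecords file. Only "Target.gene_name" and "Target.gene_Refseq_acc" are column names taken from this message; "Species", "miRNA_id" and "Notes", and all the values, are hypothetical and chosen only for illustration.

```r
# Toy stand-in for the miRecords VALIDATED table (all values are made up).
# "Target.gene_name" and "Target.gene_Refseq_acc" are the column names quoted
# in this message; "Species", "miRNA_id" and "Notes" are hypothetical.
mirec <- data.frame(
  Species                = c("Homo sapiens", "Mus musculus", "Homo sapiens"),
  miRNA_id               = c("hsa-let-7c", "mmu-miR-1", "hsa-miR-130a"),
  Target.gene_name       = c("GENE_A", "GENE_B", "GENE_C"),
  Target.gene_Refseq_acc = c("NM_000001", "NM_000002", "NM_000003"),
  Notes                  = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Keep only human records; trimws() and tolower() guard against stray
# whitespace and inconsistent capitalisation of the species name.
is_human <- tolower(trimws(mirec$Species)) == "homo sapiens"

# Keep only the columns needed for the later mapping step.
keep_cols <- c("miRNA_id", "Target.gene_name", "Target.gene_Refseq_acc")
mirec_hs <- mirec[is_human, keep_cols, drop = FALSE]

print(mirec_hs)
```

With the real file, the same subsetting would apply once the spreadsheet has been read into a data frame and the actual species column has been identified.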
Then, through BioMart functions, I extracted the following fields: 'hgnc_symbol', 'ensembl_gene_id', 'external_gene_id', 'refseq_dna'.
The filters I used assume that
BioMart data named "hgnc_automatic_gene_name" is what miRecords calls "Target.gene_name" and
BioMart data named "refseq_dna" is what miRecords calls "Target.gene_Refseq_acc"
Please check my object name mapping and let me know whether I got it right or wrong.
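One way to sanity-check such a mapping offline is to join the two tables on the RefSeq accession alone and then see whether the gene names agree. The sketch below does this in base R on toy data: the column names are the ones quoted in this message, every value is made up, and the BioMart side is a hand-written data frame standing in for a real query result, not actual BioMart output.

```r
# Toy miRecords rows (made-up values; column names as quoted in this message).
mirec <- data.frame(
  Target.gene_name       = c("GENE_A", "GENE_B", "GENE_C"),
  Target.gene_Refseq_acc = c("NM_000001", "NM_000002", "NM_000003"),
  stringsAsFactors = FALSE
)

# Toy stand-in for a BioMart result (made-up values) with two of the
# extracted fields.
bm <- data.frame(
  hgnc_symbol = c("GENE_A", "GENE_X"),
  refseq_dna  = c("NM_000001", "NM_000002"),
  stringsAsFactors = FALSE
)

# Join on the RefSeq accession only; all.x = TRUE keeps miRecords rows
# that have no BioMart match (their hgnc_symbol becomes NA).
joined <- merge(mirec, bm,
                by.x = "Target.gene_Refseq_acc", by.y = "refseq_dna",
                all.x = TRUE)

# Classify each row: no match, matching names, or conflicting names.
joined$status <- ifelse(
  is.na(joined$hgnc_symbol), "no_biomart_hit",
  ifelse(toupper(joined$Target.gene_name) == toupper(joined$hgnc_symbol),
         "names_agree", "names_differ")
)

print(joined[, c("Target.gene_Refseq_acc", "Target.gene_name",
                 "hgnc_symbol", "status")])
print(table(joined$status))
```

If most rows come out as "names_agree", the assumed correspondence between the two naming schemes is at least plausible; a large share of "names_differ" rows would suggest that the chosen BioMart field is not the counterpart of "Target.gene_name".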
There are a few cases that my algorithm does not deal with yet.
That is, some records in the miRecords xls file contain non-standard miRNA identifiers that I do not understand. For instance:
"hsa-miR-15a/hsa-miR-16" (what does the "/" mean?)
"[miR-106b]" (what do the square brackets mean?)
Moreover, there are many redundant lines in my script. It is just a starting point.
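Until the meaning of the non-standard miRNA identifiers mentioned above is clear, such records can at least be set aside automatically. The base R sketch below only flags them and does not try to interpret the "/" or the brackets. The regular expression is an assumed description of a "standard" identifier (three-letter species prefix, then "miR" or "let", then a number with optional suffixes), not an official specification.

```r
# Identifiers quoted in this message: two regular ones and the two odd ones.
ids <- c("hsa-let-7c", "hsa-miR-130a", "hsa-miR-15a/hsa-miR-16", "[miR-106b]")

# Assumed shape of a standard identifier (an illustration, not a spec):
# species prefix, "miR" or "let", a number, an optional letter, an optional
# numeric suffix, and an optional arm marker ("-3p", "-5p" or "*").
std_pattern <- "^[a-z]{3}-(miR|let)-[0-9]+[a-z]?(-[0-9]+)?(-[35]p|\\*)?$"

is_standard  <- grepl(std_pattern, ids)
has_slash    <- grepl("/", ids, fixed = TRUE)
has_brackets <- grepl("^\\[.*\\]$", ids)

report <- data.frame(id = ids, is_standard, has_slash, has_brackets,
                     stringsAsFactors = FALSE)
print(report)

# Identifiers to set aside for manual review.
needs_review <- ids[!is_standard]
print(needs_review)
```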
It may be possible to get the 3'UTR sequences of VALIDATED targets by downloading and combining the information from miRecords and miRDB ...
but that would be harder because miRecords does not provide any interface library.
I have attached the pruned version of miRecords xls file and my crude script.
I look forward to your feedback.
Thank you very much,
Maura
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: My_miRec_Validated_Targets.txt
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20090703/b9355dc8/attachment.txt>