[BioC] getSequence ensembl biomaRt

James W. MacDonald jmacdon at med.umich.edu
Fri Aug 14 15:41:51 CEST 2009


If you simply want to get the DNA sequence, then you should use the 
BSgenome.Drerio.UCSC.danRer5 package:

 > suppressMessages(library(BSgenome.Drerio.UCSC.danRer5))
 > subseq(Drerio$chr15, 18357968,18360987)
   3020-letter "MaskedDNAString" instance (# for masking)
seq: 
CATATATCTTAAGCAGAGTGCACTGGACAGATCAGA...TAAAGGTTTTTTTCCCTGGTGACCTTCCACACCAAA
masks:
   maskedwidth maskedratio active names                               desc
1           0   0.0000000   TRUE AGAPS                      assembly gaps
2           0   0.0000000   TRUE   AMB           intra-contig ambiguities
3        1446   0.4788079  FALSE    RM                       RepeatMasker
4           0   0.0000000  FALSE   TRF Tandem Repeats Finder [period<=12]
all masks together:
   maskedwidth maskedratio
          1446   0.4788079
all active masks together:
   maskedwidth maskedratio
             0           0

You can convert to a string (small range converted here):

 > toString(subseq(Drerio$chr15, 18357968,18358000))
[1] "CATATATCTTAAGCAGAGTGCACTGGACAGATC"
 >
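
If you need this for several regions, a small wrapper may help. The
following is only a sketch: region_width() and get_region() are made-up
helper names (not part of BSgenome or biomaRt), and it assumes the BSgenome
data package is installed and that your coordinates refer to the same
assembly as the package (danRer5 here).

```r
# Hypothetical helpers for illustration; neither is part of BSgenome or biomaRt.

# Width of a 1-based, inclusive interval, which is how subseq() counts.
region_width <- function(start, end) {
  stopifnot(start >= 1, end >= start)
  end - start + 1
}

# Extract an interval from one chromosome of a BSgenome object as a plain
# character string. Assumes genome[[chrom]] works and that subseq() and
# toString() are on the search path (i.e. BSgenome/Biostrings are attached).
get_region <- function(genome, chrom, start, end) {
  width <- region_width(start, end)
  s <- toString(subseq(genome[[chrom]], start = start, end = end))
  stopifnot(nchar(s) == width)  # sanity check on the returned length
  s
}

region_width(18357968, 18360987)  # 3020, the length reported above

# Only attempt the genome lookup if the data package is installed.
if (requireNamespace("BSgenome.Drerio.UCSC.danRer5", quietly = TRUE)) {
  suppressMessages(library(BSgenome.Drerio.UCSC.danRer5))
  get_region(Drerio, "chr15", 18357968, 18358000)
}
```

Because the coordinates are inclusive at both ends, the width is
end - start + 1, which is a quick way to check that you got the region you
asked for.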

Best,

Jim

Mayra Eduardoff wrote:
> hi james,
> thanks, I know. My question is how to get a genomic DNA sequence 
> (where there may be no gene).
> Any ideas?
> kind regards
> mayra
> 
> On Thu, Aug 13, 2009 at 2:53 PM, James W. MacDonald 
> <jmacdon at med.umich.edu> wrote:
> 
>     Hi Mayra,
> 
> 
>     Mayra Eduardoff wrote:
> 
>         Hi Steffen
> 
> 
>         I want to retrieve a genomic sequence with biomaRt:
> 
> 
>         sessionInfo()
>         R version 2.9.1 (2009-06-26)
>         i386-pc-mingw32
> 
>         locale:
>         LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
> 
>         attached base packages:
>         [1] stats     graphics  grDevices utils     datasets  methods   base
> 
>         other attached packages:
>          [1] BSgenome_1.12.3         cureos_0.3              Biostrings_2.12.8
>          [4] IRanges_1.2.3           zfv2.db_1.0.0           RSQLite_0.7-1
>          [7] DBI_0.2-4               Agi4x44PreProcess_1.4.0 genefilter_1.24.2
>         [10] annotate_1.22.0         AnnotationDbi_1.6.1     venn_1.5
>         [13] multtest_2.1.1          vsn_3.12.0              Biobase_2.5.5
>         [16] xtable_1.5-5            limma_2.18.2            biomaRt_2.0.0
> 
> 
> 
>             mart <- useMart("ensembl")
>             mart <- useDataset(mart=mart, "drerio_gene_ensembl")
> 
> 
>         seq <- getSequence(chromosome = 15, start = 18357968, end = 18360987,
>                            mart = mart)
> 
>         Error in getSequence(chromosome = 15, start = 18357968, end = 18360987,  :
>          Please specify the type of sequence that needs to be retrieved when using
>          biomaRt in web service mode.  Choose either gene_exon, transcript_exon,
>          transcript_exon_intron, gene_exon_intron, cdna, coding,
>          coding_transcript_flank, coding_gene_flank, transcript_flank, gene_flank,
>          peptide, 3utr or 5utr
> 
>         Apart from the fact that I want a genomic region, even if I
>         specify type it doesn't seem to work:
> 
>         seq <- getSequence(chromosome = 15, start = 18357968, end = 18360987,
>                            type = "gene_exon", mart = mart)
>         Error in getSequence(chromosome = 15, start = 18357968, end = 18360987,  :
>          Please specify the type of sequence that needs to be retrieved when using
>          biomaRt in web service mode.  Choose either gene_exon, transcript_exon,
>          transcript_exon_intron, gene_exon_intron, cdna, coding,
>          coding_transcript_flank, coding_gene_flank, transcript_flank, gene_flank,
>          peptide, 3utr or 5utr
> 
> 
>         or as in the documentation (although it doesn't make any sense to
>         me to specify both seqType and type...)
> 
> 
>     You have to specify seqType and type because the sequences don't
>     come back in the same order you requested, so the type argument is
>     used to label the sequences.
> 
>     Also, I don't see any way to get inter-genic sequences. For instance:
> 
>      > getSequence(15,18357968,18360987,seqType="cdna", mart=mart,
>     type="ensembl_transcript_id")
>     [1] cdna                  ensembl_transcript_id
>     <0 rows> (or 0-length row.names)
> 
>     Because this portion of the zebrafish genome contains no known
>     genes. However, if I pick a region that does contain a gene:
> 
>      > getSequence(15,18723006,18741517,seqType="cdna", mart=mart,
>     type="ensembl_transcript_id")
> 
>                                                                        cdna
>     1
>     AGGAGCCGCTCAGACCACACCAGTGCAGGGTCAGAACCTGGTGACAAATAATGTCTCAGTGGTGGAGGGCGAGACGGCCATCATCAGCTGCCGGGTGAAAAACAACGACGACTCCGTCATCCAACTGCTCAACCCCAACCGGCAGACTATCTACTTCAGAGACGTTAGACCTTTGAAGGACAGTCGGTTTCAGCTGGTAAACTTCTCCGACAACGAGCTCTTGGTGTCCCTGTCCAACGTGTCTCTGTCGGACGAGGGCCGCTACGTGTGTCAACTCTACACGGATCCACCGCAAGAAGCCTACGCCGACATCACTGTACTGGTTCCACCAGGCAACCCAATCTTAGAGTCCCGCGAGGAAATCGTGAGCGAGGGGAATGAGACCGAGATAACCTGCACCGCCATGGGCAGCAAACCTGCTTCCACCATCAAATGGATGAAAGGCGACCAACCACTGCAAGGTGAGGCGACTGTGGAGGAGTTATACGACAGGATGTTCACTGTCACCAGCCGGCTCAGGCTCACCGTCTCTAAGGAGGACGATGGAGTGGCCGTCATCTGCATCATTGACCATCCAGCCGTGAAGGACTTCCAGGCCCAGAAATACCTGGAAGTGCAGTATAAACCAGAAGTGAAGATTGTGGTGGGATTCCCAGAGGGTTTGACCAGAGAAGGAGAAAATCTCGAGCTGACATGCAAAGCTAAAGGAAAACCGCAGCCTCATCAAATTAACTGGCTCAAAGTGGATGATGATTTCCCCTCCCACGCCTTGGTAACTGGCTCTGATCTCTTCATCGAAAACCTTAACAAGTCCTACAACGGAACGTACCGCTGTGTGGCATCTAACTTAGTGGGAGAAGCCTACGATGATTACATCCTTTATGTATACGATTCAAGAGCAGATGGAGCGCCACAGAAAATTGATCATGCCGTCATCGGCGGAGTTGTCGCAGTGGTTGTGTTCGCCATGCTTTGTCTCCTGA
>     TTGTTC
>     TTGGCCGATATTTCGCCAGACACAAAGGGACCTACTTCACCCACGAAGCTAAAGGAGCGGATGACGCGGCGGACGCCGACACTGCCATCATCAACGCAGAGGGCGGACACAACAATTCGGATGACAAGAAGGAATACTACATTTAA
>      ensembl_transcript_id
>     1    ENSDART00000062603
> 
>     Best,
> 
>     Jim
> 
> 
> 
> 
> 
>         seq <- getSequence(chromosome = 15, start = 18357968, end = 18360987,
>                            type = "entrez", seqType = "cdna", mart = mart)
>         Error in getBM(c(seqType, type), filters = c("chromosome_name", "start",  :
>          Invalid attribute(s): entrez
>          Please use the function 'listAttributes' to get valid attribute names
> 
> 
> 
> 
>         I can't load in mysql mode anymore either:
>          mart <- useMart("ensembl", mysql=TRUE)
>         Error: mysql access to Ensembl is no longer available through this package
>         the web service mode supports all queries.  If mysql is needed a separate
>         package will become available with limited mysql query support.
> 
> 
>         I would be very grateful for your help!
> 
> 
>         kind regards,
> 
>         Mayra
> 
> 
>     -- 
>     James W. MacDonald, M.S.
>     Biostatistician
>     Douglas Lab
>     University of Michigan
>     Department of Human Genetics
>     5912 Buhl
>     1241 E. Catherine St.
>     Ann Arbor MI 48109-5618
>     734-615-7826
> 
> 
> 
> 
> -- 
> Mayra Eduardoff
> Institute of Molecular Biology
> University of Innsbruck
> Viktor-Franz Hess Haus
> Technikerstrasse 25
> 6020 Innsbruck
> Tel: +43 512 507 6286
> email: mayra.eduardoff at student.uibk.ac.at
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826


