[BioC] biomaRt 'getSequence' question

Steffen Durinck SDurinck at lbl.gov
Fri Feb 8 01:08:20 CET 2008


Hi Galina,

Yes, this is possible. However, you can only retrieve one sequence at a time this way, and you'll need RMySQL installed.
Here's how you do this:

library(biomaRt)
ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl", mysql=TRUE)
getSequence(chromosome = 10, start=200000, end = 200010, mart = ensembl)

You'll get:

  chromosome start    end    sequence
1         10 2e+05 200010 TGTGTTCCCCT
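
Since only one region can be fetched per call, several regions can be handled with a small loop. The following is a minimal sketch, not part of biomaRt: `fetch_regions` and `stub_fetch` are hypothetical names, and `stub_fetch` is an offline stand-in that returns placeholder bases so the example runs without a database connection.

```r
# Hypothetical helper: calls fetch_one() once per row of `regions`
# and stacks the one-row results into a single data frame.
fetch_regions <- function(regions, fetch_one) {
  rows <- lapply(seq_len(nrow(regions)), function(i) {
    fetch_one(regions$chromosome[i], regions$start[i], regions$end[i])
  })
  do.call(rbind, rows)
}

# Offline stand-in for the real query (assumption: coordinates are inclusive,
# as in the 200000..200010 example above, which returned 11 bases).
stub_fetch <- function(chromosome, start, end) {
  data.frame(chromosome = chromosome, start = start, end = end,
             sequence = strrep("N", end - start + 1),
             stringsAsFactors = FALSE)
}

regions <- data.frame(chromosome = c(10, 10),
                      start = c(200000, 300000),
                      end   = c(200010, 300004))

result <- fetch_regions(regions, stub_fetch)
print(result)
```

To query for real, `stub_fetch` would be swapped for a function that wraps the `getSequence(chromosome = ..., start = ..., end = ..., mart = ensembl)` call shown above.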

Cheers,
Steffen



----- Original Message -----
From: "Glazko, Galina" <Galina_Glazko at URMC.Rochester.edu>
Date: Thursday, February 7, 2008 4:02 pm
Subject: RE: [BioC] biomaRt 'getSequence' question
To: Steffen Durinck <SDurinck at lbl.gov>

> Steffen,
> 
> Thank you very much!
> But I also have chromosomal coordinates.
> Is it possible to specify just the coordinates and the chromosome 
> number, instead of a gene ID, and then retrieve the entire sequence? 
> Is there a 'seqType' appropriate for this?
> Thank you!
> 
> Best regards
> Galina
> 
> 
> ________________________________
> 
> From: Steffen Durinck [mailto:SDurinck at lbl.gov]
> Sent: Thu 2/7/2008 5:57 PM
> To: Glazko, Galina
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] biomaRt 'getSequence' question
> 
> 
> 
> Hi Galina,
> 
> With biomaRt you can currently specify only one flank, either upstream 
> or downstream, in a single query.  So you'll need at least two queries 
> to do this.  If you do ?getSequence, the help page will tell you 
> that seqType "gene_exon_intron" gives the exons + introns of a 
> gene.  Note that if you retrieve seqType "gene_exon_intron", you are 
> already retrieving the 5' and 3' UTRs flanking the coding region.  
> If you also want to include the promoter region in this query, you 
> could set upstream=4000.  If you need sequences downstream of the 
> transcribed region, you'll have to do a second query and match up 
> both query results.
> 
> Cheers,
> Steffen
> 
> ----- Original Message -----
> From: "Glazko, Galina" <Galina_Glazko at urmc.rochester.edu>
> Date: Thursday, February 7, 2008 12:48 pm
> Subject: [BioC] biomaRt 'getSequence' question
> To: bioconductor at stat.math.ethz.ch
> 
> > Dear all,
> >
> >
> >
> > I have a list of Ensembl gene IDs and I need to get the corresponding
> > sequences together with 5' upstream (4000 bp), 3' downstream (4000 bp)
> > and all introns.
> >
> > I know that I probably can do this using a combination of commands:
> >
> >
> >
> > Tmp1 <- getSequence(id="ENSG00000128714", type="ensembl_gene_id",
> >                     seqType="coding_gene_flank", upstream=4000, mart=human)
> >
> > Tmp2 <- getSequence(id="ENSG00000128714", type="ensembl_gene_id",
> >                     seqType="coding_gene_flank", downstream=4000, mart=human)
> >
> > Tmp3 <- getSequence(id="ENSG00000128714", type="ensembl_gene_id",
> >                     seqType="cdna", mart=human)
> >
> >
> >
> > and then concatenate Tmp1, Tmp2 and Tmp3, but I am not sure that the
> > 'cdna' seqType will give me introns...
> >
> > Also, I hope that there is a simpler way to get all these sequences
> > using just one command with the right 'seqType' specification.
> >
> >
> >
> > Could someone please clarify this for me?
> >
> > Thank you!
> >
> >
> >
> > Best regards
> >
> > Galina
> >
> >
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> 
>
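
For the earlier part of the thread (two queries whose results are then matched up), here is a minimal sketch of the matching step. It assumes each `getSequence` result is a data frame with one sequence column named after the seqType and one ID column named after the `type` argument. The stub data frames, the gene IDs `ENSG_A` and `ENSG_B`, and the `gene_flank` column name are illustrative placeholders, not real query output.

```r
# Stub result standing in for a gene_exon_intron query (with upstream=4000).
gene_part <- data.frame(
  gene_exon_intron = c("AAACCC", "GGGTTT"),
  ensembl_gene_id  = c("ENSG_A", "ENSG_B"),
  stringsAsFactors = FALSE
)

# Stub result standing in for a second, downstream-flank query.
# Rows are deliberately in a different order than in gene_part.
down_part <- data.frame(
  gene_flank      = c("TTTT", "CCCC"),
  ensembl_gene_id = c("ENSG_B", "ENSG_A"),
  stringsAsFactors = FALSE
)

# Match the two results by gene ID rather than relying on row order.
combined <- merge(gene_part, down_part, by = "ensembl_gene_id")

# Append the downstream flank to the gene sequence (assumption: both
# sequences are reported in the same 5'-to-3' orientation).
combined$full_sequence <- paste0(combined$gene_exon_intron, combined$gene_flank)
print(combined)
```

Merging on the ID column is the safer choice here because two separate queries are not guaranteed to return genes in the same order.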


