[BioC] biomaRt 'getSequence' question
Steffen Durinck
SDurinck at lbl.gov
Fri Feb 8 01:08:20 CET 2008
Hi Galina,
Yes, this is possible. However, you can only retrieve one sequence at a time this way, and you'll need RMySQL installed.
Here's how you do this:
library(biomaRt)
ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl", mysql=TRUE)
getSequence(chromosome = 10, start=200000, end = 200010, mart = ensembl)
You'll get:
  chromosome start    end    sequence
1         10 2e+05 200010 TGTGTTCCCCT
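A small sketch for sanity-checking a result like the one above, assuming it comes back as an ordinary data frame. The `res` object here is a hypothetical name and is built by hand from the printed output rather than fetched, so no database connection is needed. The coordinates appear to be inclusive (200000 to 200010 gives 11 bases), and the start only prints as 2e+05 because of R's default number formatting.

```r
# Hand-built copy of the printed result; 'res' is a hypothetical name,
# not something returned by a live getSequence() call.
res <- data.frame(chromosome = 10, start = 200000, end = 200010,
                  sequence = "TGTGTTCCCCT", stringsAsFactors = FALSE)

# With inclusive coordinates the expected length is end - start + 1.
expected_len <- res$end - res$start + 1
stopifnot(nchar(res$sequence) == expected_len)

# Show the start coordinate without scientific notation.
start_label <- format(res$start, scientific = FALSE)
print(start_label)
```

The same length check can be applied to any region you request, which is a quick way to catch an off-by-one in the start or end value.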
Cheers,
Steffen
----- Original Message -----
From: "Glazko, Galina" <Galina_Glazko at URMC.Rochester.edu>
Date: Thursday, February 7, 2008 4:02 pm
Subject: RE: [BioC] biomaRt 'getSequence' question
To: Steffen Durinck <SDurinck at lbl.gov>
> Steffen,
>
> thank you very much!
> But I also have chromosomal coordinates.
> Is it possible to just indicate the coordinates and chromosome
> number instead of a gene ID, and then retrieve the entire sequence?
> Is there a 'seqType' appropriate for this?
> thank you!
>
> best regards
> Galina
>
>
> ________________________________
>
> From: Steffen Durinck [mailto:SDurinck at lbl.gov]
> Sent: Thu 2/7/2008 5:57 PM
> To: Glazko, Galina
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] biomaRt 'getSequence' question
>
>
>
> Hi Galina,
>
> With biomaRt you can currently only specify either an upstream or
> downstream flank in one query, so you'll need at least two queries
> to do this. If you do ?getSequence, the help page will tell you
> that seqType "gene_exon_intron" gives the exons + introns of a
> gene. Note that if you retrieve seqType gene_exon_intron, you are
> already retrieving the 5' and 3' UTRs flanking the coding region.
> If you also want to include the promoter region in this query, you
> could set upstream=4000. If you need sequences downstream of the
> transcribed region, you'll have to do a second query and match up
> both query results.
>
> Cheers,
> Steffen
>
> ----- Original Message -----
> From: "Glazko, Galina" <Galina_Glazko at urmc.rochester.edu>
> Date: Thursday, February 7, 2008 12:48 pm
> Subject: [BioC] biomaRt 'getSequence' question
> To: bioconductor at stat.math.ethz.ch
>
> > Dear all,
> >
> >
> >
> > I have a list of Ensembl gene IDs and I need to get the corresponding
> > sequences together with 5' upstream (4000 bp), 3' downstream (4000 bp)
> > and all introns.
> >
> > I know that I probably can do this using a combination of commands:
> >
> >
> >
> > Tmp1<-getSequence(id="ENSG00000128714",type="ensembl_gene_id",seqType="coding_gene_flank",upstream=4000,mart=human)
> >
> > Tmp2<-getSequence(id="ENSG00000128714",type="ensembl_gene_id",seqType="coding_gene_flank",downstream=4000,mart=human)
> >
> > Tmp3<-getSequence(id="ENSG00000128714",type="ensembl_gene_id",seqType="cdna",mart=human)
> >
> >
> >
> > and then concatenate Tmp1, Tmp2 and Tmp3, but I am not sure that the
> > 'cdna' seqType will give me introns...
> >
> > Also, I hope that there is a simpler way to get all these sequences
> > using just one command with the right 'seqType' specification.
> >
> >
> >
> > Could someone please clarify this for me?
> >
> > Thank you!
> >
> >
> >
> > Best regards
> >
> > Galina
> >
> >