[BioC] Retrieving Upstream Sequences With biomaRt

Peter Robinson Peter.Robinson at t-online.de
Tue Feb 27 07:11:27 CET 2007


On Mon, Feb 26, 2007 at 09:08:50AM -0500, Steffen Durinck wrote:
> Hi Peter,
> 
> As many of you noticed recently, the getSequence function is implemented
> differently in MySQL mode and in the default webservice mode.
> In MySQL mode you can only retrieve sequences by chromosomal
> coordinates, so you can retrieve upstream sequences if you have the
> exact positions of the sequences you want.
> In webservice mode there are more options; however, upstream sequences
> are not yet available. One can retrieve 5' UTR, 3' UTR, protein and
> cDNA sequences based on a set of identifiers or, if a chromosomal
> location is given, any annotated sequence of the requested type (e.g.
> 5' UTR) between these positions will be returned.
> In webservice mode the getSequence function should actually be able to
> retrieve more types of sequences, such as exons only or upstream regions,
> but this requires some more development, which I will try to get working
> as soon as possible now that there is a clear interest in the
> getSequence function of biomaRt.
> 
> best,
> Steffen
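
A minimal sketch of the coordinate arithmetic behind the MySQL-mode approach described above, where upstream sequence is fetched by exact chromosomal positions. It only computes the window; it does not call biomaRt. `upstream_window` is a hypothetical helper, the `strand` column (1 or -1) is an assumption about how gene orientation is stored, and the gene coordinates are made up for illustration.

```r
# Hypothetical helper (not part of biomaRt): compute the genomic window
# covering `upstream` bases immediately before each gene.
upstream_window <- function(start, end, strand, upstream = 2000) {
  stopifnot(length(start) == length(end), length(start) == length(strand))
  forward <- strand == 1
  # Forward strand: the window ends one base before the gene start.
  # Reverse strand: the gene begins at its highest coordinate, so the
  # window starts one base after `end`.
  win_start <- ifelse(forward, start - upstream, end + 1)
  win_end   <- ifelse(forward, start - 1, end + upstream)
  # Clamp to 1 so a gene near the chromosome start cannot yield a
  # zero or negative coordinate.
  data.frame(start = pmax(win_start, 1), end = win_end)
}

# Made-up coordinates, for illustration only.
genes <- data.frame(
  chromosome = c("20", "11"),
  start      = c(5000, 90000),
  end        = c(8000, 95000),
  strand     = c(1, -1)
)

win <- upstream_window(genes$start, genes$end, genes$strand, upstream = 2000)
print(cbind(chromosome = genes$chromosome, win))
```

The resulting chromosome, start and end values are what a coordinate-based sequence query would need. For reverse-strand genes the retrieved sequence would presumably still have to be reverse-complemented to read in the gene's own orientation.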


Steffen, thanks for your work on this! best, Peter


> 
> 
> Peter Robinson wrote:
> > On Tue, Feb 20, 2007 at 12:11:02PM -0000, Stephen Henderson wrote:
> >   
> >> The first code that you show doesn't work--for many reasons.
> >>
> >>     
> >>> library(biomaRt)
> >>> ens<-useMart("ensembl",dataset="hsapiens_gene_ensembl", mysql=TRUE)
> >>> entrez <- c("100","330")
> >>> gene <- getGene(id=entrez, type="entrezgene", mart=ens)
> >>> getSequence(chromosome = gene$chromosome, start = gene$start - 2000,
> >>>             end = gene$end + 1000, mart=ens)
> >>
> >>
> >> 1. gene$start and gene$end don't exist.
> >> 2. seqType is not specified.
> >> 3. seqType can't be specified as the type you want.
> >>
> >>
> >>     
> >>> However, when I try the following line:
> >>> gene <- getGene(id=entrez, type="entrezgene", mart=ens)
> >>>       
> >> That appears to be a problem with your database installation, because
> >> if you do it over the web (i.e. drop mysql=TRUE):
> >>
> >> ens<-useMart("ensembl",dataset="hsapiens_gene_ensembl")
> >> entrez <- c("100","330")
> >> gene <- getGene(id=entrez, type="entrezgene", mart=ens)
> >>
> >> it works fine.
> >>
> >>     
> >>> The examples in the vignette all seem to work. What is wrong here?
> >>> Where is the file mentioned in the error message supposed to live?
> >>
> >>     
> >
> > Hmm..., I take it that the API of biomaRt has changed quite a bit. I got the code from a previous message to this list (I think), but perhaps it is easier to ask how to do things properly than to ask how to fix the code. Is there a way of retrieving upstream sequences with biomaRt?
> >
> > Thanks, Peter
> >
> >
> >   
> >> Stephen Henderson
> >> Wolfson Inst. for Biomedical Research
> >> Cruciform Bldg., Gower Street
> >> University College London
> >> United Kingdom, WC1E 6BT
> >> +44 (0)207 679 6827
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at stat.math.ethz.ch
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>     
> 
> 
> -- 
> Steffen Durinck, Ph.D.
> 
> Oncogenomics Section
> Pediatric Oncology Branch
> National Cancer Institute, National Institutes of Health
> URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
> 
> Phone: 301-402-8103
> Address:
> Advanced Technology Center,
> 8717 Grovemont Circle
> Gaithersburg, MD 20877
> 


