[BioC] Retrieving Upstream Sequences With biomaRt

Steffen Durinck durincks at mail.nih.gov
Mon Feb 26 15:08:50 CET 2007


Hi Peter,

As many of you have noticed recently, the implementation of the getSequence 
function differs between MySQL mode and the default webservice mode.
In MySQL mode you can only retrieve sequences based on chromosomal 
coordinates, so you can retrieve upstream sequences if you know the exact 
positions of the sequences you want.
In webservice mode there are more options; however, upstream sequences are 
not yet available.  One can retrieve 5'utr, 3'utr, protein and 
cdna sequences based on a set of identifiers, or, if a chromosomal location 
is given, any annotated sequence of the requested type (e.g. 5'utr) between 
these positions will be returned.
In webservice mode the getSequence function should be able to 
retrieve more types of sequences, such as exons only or upstream regions, 
but this requires some more development, which I will try to get working 
as soon as possible now that there is clear interest in the 
getSequence function of biomaRt.
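
Since the MySQL route needs exact positions, here is a rough sketch of how 
one might work out upstream windows from gene coordinates before asking for 
the sequence. It uses base R only. The data frame, its column names, the 
coordinates and the helper upstream_window() are all made up for 
illustration and are not part of biomaRt; the strand coding (1 / -1) is an 
assumption as well.

```r
# Hypothetical stand-in for an annotation query result.
# Column names and coordinates are invented for illustration only.
genes <- data.frame(
  id         = c("geneA", "geneB"),
  chromosome = c("1", "2"),
  start      = c(10000, 50000),
  end        = c(12000, 53000),
  strand     = c(1, -1),
  stringsAsFactors = FALSE
)

# Illustrative helper (not a biomaRt function): returns the window of
# `width` bases immediately upstream of each gene, taking strand into account.
upstream_window <- function(start, end, strand, width = 2000) {
  forward <- strand == 1
  # Forward strand: upstream lies before the gene start (clamped at base 1).
  # Reverse strand: upstream lies after the gene end.
  up_start <- ifelse(forward, pmax(start - width, 1), end + 1)
  up_end   <- ifelse(forward, start - 1, end + width)
  data.frame(up_start = up_start, up_end = up_end)
}

windows <- cbind(genes[, c("id", "chromosome", "strand")],
                 upstream_window(genes$start, genes$end, genes$strand))
print(windows)
```

The resulting chromosome, up_start and up_end values are the kind of 
positions one would then hand to getSequence in MySQL mode, using the 
chromosome/start/end arguments seen in the quoted call below (the exact 
signature may differ between biomaRt versions). If the sequence comes back 
on the forward strand, windows for reverse-strand genes would still need to 
be reverse-complemented.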

best,
Steffen


Peter Robinson wrote:
> On Tue, Feb 20, 2007 at 12:11:02PM -0000, Stephen Henderson wrote:
>   
>> The first code that you show doesn't work--for many reasons.
>>
>>     
>>> library(biomaRt)
>>> ens<-useMart("ensembl",dataset="hsapiens_gene_ensembl", mysql=TRUE)
>>> entrez <- c("100","330")
>>> gene <- getGene(id=entrez, type="entrezgene", mart=ens)
>>> getSequence(chromosome = gene$chromosome, start = gene$start - 2000,
>>>             end = gene$end + 1000, mart=ens)
>>
>>
>> 1. gene$start and gene$end don't exist.
>> 2. seqType is not specified.
>> 3. seqType can't be specified as the type you want.
>>
>>
>>     
>>> However, when I try the following line:
>>> gene <- getGene(id=entrez, type="entrezgene", mart=ens)
>>>       
>> That appears to be a problem with your database installation, because if
>> you do it over the web (i.e. drop mysql=TRUE):
>>
>> ens<-useMart("ensembl",dataset="hsapiens_gene_ensembl")
>> entrez <- c("100","330")
>> gene <- getGene(id=entrez, type="entrezgene", mart=ens)
>>
>> it works fine.
>>
>>     
>>> The examples in the vignette all seem to work. What is wrong here?
>>>       
>> Where is the file mentioned in the error message supposed to live?
>>
>>     
>
> Hmm..., I take it that the API of biomaRt has changed quite a bit. I got the code from a previous message to this list (I think), but perhaps it is easier to ask how to do things properly than to ask how to fix the code. Is there a way of retrieving upstream sequences with biomaRt?
>
> Thanks, Peter
>
>
>   
>> Stephen Henderson
>> Wolfson Inst. for Biomedical Research
>> Cruciform Bldg., Gower Street
>> University College London
>> United Kingdom, WC1E 6BT
>> +44 (0)207 679 6827
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>     
>


-- 
Steffen Durinck, Ph.D.

Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/

Phone: 301-402-8103
Address:
Advanced Technology Center,
8717 Grovemont Circle
Gaithersburg, MD 20877


