[BioC] Retrieving Upstream Sequences With biomaRt
Steffen Durinck
durincks at mail.nih.gov
Mon Feb 26 15:08:50 CET 2007
Hi Peter,
As many of you have noticed recently, the getSequence function is
implemented differently in MySQL mode and in the default webservice mode.
In MySQL mode you can only retrieve sequences based on chromosomal
coordinates, so this should let you retrieve upstream sequences if
you have the exact positions of the sequences you want.
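As a rough sketch of the coordinate-based approach, the upstream window can be computed from the gene coordinates before any sequence is requested. The helper name upstream_window, the strand column (1 / -1) and the example coordinates below are made up for illustration and are not part of biomaRt; the code only does the arithmetic and does not query a mart.

```r
# Hypothetical helper (not a biomaRt function): compute a strand-aware
# window of `upstream` bases directly in front of each gene.
# Assumes 1-based inclusive coordinates and strand coded as 1 or -1.
upstream_window <- function(start, end, strand, upstream = 2000) {
  stopifnot(length(start) == length(end), length(start) == length(strand))
  plus <- strand == 1
  # Plus strand: the window ends just before the gene start.
  # Minus strand: the window begins just after the gene end.
  win_start <- ifelse(plus, start - upstream, end + 1)
  win_end   <- ifelse(plus, start - 1, end + upstream)
  # Clamp at position 1 so the window never runs off the chromosome start.
  data.frame(start = pmax(win_start, 1), end = win_end)
}

# Illustrative, invented coordinates (not real gene positions).
genes <- data.frame(chromosome = c("1", "2"),
                    start      = c(5000, 9000),
                    end        = c(7000, 12000),
                    strand     = c(1, -1))

win <- upstream_window(genes$start, genes$end, genes$strand, upstream = 2000)
print(cbind(chromosome = genes$chromosome, win))
```

The resulting chromosome, start and end values are the kind of exact positions that a coordinate-based getSequence call expects. For genes on the minus strand the returned sequence would still need to be reverse-complemented to read in the direction of transcription.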
In webservice mode there are more options; however, upstream sequences are
not yet available. One can retrieve 5' UTR, 3' UTR, protein and
cDNA sequences based on a set of identifiers, or, if a chromosomal location
is given, any annotated sequence of the requested type (e.g. 5' UTR)
between these positions will be returned.
In webservice mode the getSequence function should actually be able to
retrieve more types of sequences, such as exons only or upstream regions,
but this requires some more development. I will try to get this working
as soon as possible, now that there is a clear interest in the
getSequence function of biomaRt.
best,
Steffen
Peter Robinson wrote:
> On Tue, Feb 20, 2007 at 12:11:02PM -0000, Stephen Henderson wrote:
>
>> The first code that you show doesn't work--for many reasons.
>>
>>
>>> library(biomaRt)
>>> ens<-useMart("ensembl",dataset="hsapiens_gene_ensembl", mysql=TRUE)
>>> entrez <- c("100","330")
>>> gene <- getGene(id=entrez, type="entrezgene", mart=ens)
>>> getSequence(chromosome = gene$chromosome, start = gene$start - 2000,
>>>             end = gene$end + 1000, mart=ens)
>>
>>
>> 1. gene$start and gene$end don't exist.
>> 2. seqType is not specified.
>> 3. seqType can't be specified as the type you want.
>>
>>
>>
>>> However, when I try the following line:
>>> gene <- getGene(id=entrez, type="entrezgene", mart=ens)
>>>
>> that appears to be a problem with your database installation as if you
>> do it over the web (i.e. drop mysql=TRUE):
>>
>> ens<-useMart("ensembl",dataset="hsapiens_gene_ensembl")
>> entrez <- c("100","330")
>> gene <- getGene(id=entrez, type="entrezgene", mart=ens)
>>
>> it works fine.
>>
>>
>>> The examples in the vignette all seem to work. What is wrong here?
>>> Where is the file mentioned in the error message supposed to live?
>>>
>>
>>
>
> Hmm..., I take it that the API of biomaRt has changed quite a bit. I got the code from a previous message to this list (I think), but perhaps it is easier to ask how to do things properly than to ask how to fix the code. Is there a way of retrieving upstream sequences with biomaRt?
>
> Thanks, Peter
>
>
>
>> Stephen Henderson
>> Wolfson Inst. for Biomedical Research
>> Cruciform Bldg., Gower Street
>> University College London
>> United Kingdom, WC1E 6BT
>> +44 (0)207 679 6827
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
--
Steffen Durinck, Ph.D.
Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
Phone: 301-402-8103
Address:
Advanced Technology Center,
8717 Grovemont Circle
Gaithersburg, MD 20877