[BioC] retrieving genomic sequences with biomaRt

J.delasHeras at ed.ac.uk J.delasHeras at ed.ac.uk
Tue Oct 10 20:58:23 CEST 2006


Hi Steffen,

that is more or less what I tried, but I get the following error after 
the call to 'getSequence'

"Error in inherits(x, "factor") : argument "seqType" is missing, with 
no default"

The only thing I seem to do different is that I didn't indicate 
mysql=TRUE because I haven't installed the package yet... I will, 
'though, but I don't suppose that's the cause of the error?

Thanks for your help, Steffen!

Jose

Quoting Steffen Durinck <durincks at mail.nih.gov>:

> Hi Jose,
>
> This should be possible with biomaRt, I would recommend using it in
> MySQL mode as the web service sometimes hangs for sequences.
>
> Which type of identifiers do you have?
>
> As an example with Entrez Gene identifiers you could do:
>
> library(biomaRt)
> ensmart=useMart("ensembl",dataset="hsapiens_gene_ensembl", mysql=TRUE)
> entrez = c("100","330")
> gene = getGene(id=entrez, type="entrezgene", mart=ensmart)
> getSequence(chromosome = gene$chromosome, start = gene$start - 2000, end
> = gene$end + 1000, mart=ensmart)
>
> Note that you'll have to make it a little bit more complicated than this
> as you have to take into account the strand on which the genes are
> located.  The strand is also returned by the getGene function.
>
> best,
> Steffen
>
> J.delasHeras at ed.ac.uk wrote:
>> Hi,
>>
>> I am putting a little script together to look at promoter regions, 
>> search for CpG islands etc, given a list of gene ids.
>>
>> I thought I would be able to use biomaRt to retrieve the sequence of 
>> interest, but I just can't find how right now... either I'm too 
>> tired (possible) or I'm missing something and it's just not 
>> possible. In particular 'getSequence' looked promising, but no luck 
>> so far.
>>
>> What I want to do is:
>> 1) I have a gene id, from it I want to locate the chromosome 
>> location of the gene, the start of transcription.
>> 2) knowing the transcriptional start, I want to retrieve the genomic 
>> sequence around that area... say 2000bp upstream and 1000 downstream.
>>
>> Anybody knows if this is possible and/or could give me any pointers?
>>
>> Thank you!
>>
>> Jose
>>
>>
>
>
> -- 
> Steffen Durinck, Ph.D.
>
> Oncogenomics Section
> Pediatric Oncology Branch
> National Cancer Institute, National Institutes of Health
> URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
>
> Phone: 301-402-8103
> Address:
> Advanced Technology Center,
> 8717 Grovemont Circle
> Gaithersburg, MD 20877
>



-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK



More information about the Bioconductor mailing list