[BioC] retrieving upstream/intronic sequences using biomaRt

Robert Gentleman rgentlem at fhcrc.org
Tue Sep 19 20:38:20 CEST 2006


Depending on what you know (start positions, etc.), Biostrings is a 
reasonable tool for extracting these sequences efficiently. A number 
of genomes are available now, and more can be added.
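
As a rough sketch of the Biostrings route, assuming the chromosome is already 
loaded as a DNAString and the start position and strand of the gene are known, 
something like the following could pull out an upstream window. The function 
name upstream_seq, the toy sequence and the coordinates are made up for 
illustration; only DNAString, subseq and reverseComplement come from Biostrings.

```r
library(Biostrings)

# Hypothetical helper: return `width` bases immediately upstream of a
# 1-based start position `tss` on the given strand of `chrom`.
upstream_seq <- function(chrom, tss, strand, width = 10L) {
  if (strand == "+") {
    # Upstream on the plus strand means lower coordinates.
    start <- max(1L, tss - width)
    end <- tss - 1L
    subseq(chrom, start = start, end = end)
  } else {
    # Upstream on the minus strand means higher coordinates,
    # reported as the reverse complement.
    start <- tss + 1L
    end <- min(length(chrom), tss + width)
    reverseComplement(subseq(chrom, start = start, end = end))
  }
}

# Toy 20-base "chromosome" standing in for a real genome sequence.
chrom <- DNAString("AAACCCGGGTTTACGTACGT")

plus_up  <- as.character(upstream_seq(chrom, tss = 10, strand = "+", width = 5))
minus_up <- as.character(upstream_seq(chrom, tss = 10, strand = "-", width = 5))
print(plus_up)
print(minus_up)
```

With a real genome, chrom would be one chromosome taken from one of the 
available genome data packages rather than a toy string.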

Steffen Durinck wrote:
> No, I don't think there is a package to find motifs in the current 
> repository. 
> It would be nice to have one.
> 
> best,
> Steffen
> 
> Henrik Hornshøj Jensen wrote:
>> Thanks for your help, although I was thinking of a bioconductor package.
>>
>> Regards,
>> Henrik
>>  
>>
>>
>> -----Original Message-----
>> From: Krys Kelly [mailto:kak28 at cam.ac.uk] 
>> Sent: Tuesday, September 19, 2006 12:35 PM
>> To: Henrik Hornshøj Jensen; bioconductor at stat.math.ethz.ch
>> Subject: RE: [BioC] retrieving upstream/intronic sequences using biomaRt
>>
>> Hi Henrik,
>>
>> A package?  The more one looks, the more one finds!  The attached spreadsheet is very much a work in progress and a bit messy and incomplete.
>> We started it after a very quick review of the literature, so it is also far from comprehensive.  However, it will probably give you more than enough information to get started. 
>>
>> This review should be helpful:
>>
>> Tompa et al. (2005) Assessing computational tools for the discovery of transcription factor binding sites. Nature Biotechnology 23(1):137-144. 
>>
>> The three 'old-timer' programs that everyone seems to use are AlignACE, MEME and Consensus. We have also been using Weeder, Sombrero and NestedMICA.
>> Be aware that some of the programs (e.g. AlignACE) can give quite different answers on different runs even with the same parameters. And the different programs can give very different answers. I am aware that a number of people (including ourselves) use several of the programs and take the motifs that turn up in most of the programs for further study.
>>
>> There are also programs that search for known motifs (e.g. MAST (companion to MEME), MSCAN, SiteSeer). Two well-known databases of Transcription Factor Binding Sites are TRANSFAC and JASPAR.
>>
>> Hope this helps.
>>
>> Krys
>>
>>
>> Dr Krystyna A Kelly
>> University of Cambridge
>> Department of Pathology
>> Molteno Building, Tennis Court Road
>> Cambridge CB2 1QP
>> Tel:    01223 333331
>> Email: kak28 at cam.ac.uk
>>  
>>
>> -----Original Message-----
>> From: bioconductor-bounces at stat.math.ethz.ch
>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Henrik Hornshøj Jensen
>> Sent: 19 September 2006 10:33
>> To: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] retrieving upstream/intronic sequences using biomaRt
>>
>> Any of you guys know a package that will predict regulatory sites in upstream regions?
>>
>> Regards,
>> Henrik
>>  
>>
>>
>> -----Original Message-----
>> From: bioconductor-bounces at stat.math.ethz.ch
>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Steffen Durinck
>> Sent: Wednesday, September 13, 2006 2:25 PM
>> To: Shamit Soneji
>> Cc: BioC
>> Subject: Re: [BioC] retrieving upstream/intronic sequences using biomaRt
>>
>> Hi Shamit,
>>
>> Yes, with biomaRt you can get the upstream sequences but currently not the intronic sequences.
>>  Try:
>>
>> library(biomaRt)
>> ensmart = useMart("ensembl",dataset="hsapiens_gene_ensembl")
>> getSequence(id="ENSG00000139618", type="ensembl", mart=ensmart, seqType="5utr")
>>
>> Cheers,
>> Steffen
>>
>>
>> Shamit Soneji wrote:
>>   
>>> Is it possible using biomaRt (or any other R/BioC means) to download 
>>> the upstream and intron sequences for any given ensembl ID?
>>>
>>> I know this can be done just using straight biomart, but a facility 
>>> like this from R would be very useful if one wants to search for TF 
>>> binding sites.
>>>
>>> Many thanks
>>>
>>> Shamit
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: 
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>   
>>>     
>>
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


