[BioC] retrieving upstream/intronic sequences using biomaRt
Robert Gentleman
rgentlem at fhcrc.org
Tue Sep 19 20:38:20 CEST 2006
Depending on what you know (start positions, etc.), Biostrings is a
reasonable tool for extracting these sequences efficiently; a number
of genomes are available now, and more can be added.
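Knowing the transcription start site (TSS) and strand reduces "the upstream sequence" to a fixed window on the chromosome, reverse-complemented for minus-strand genes. The sketch below shows that logic only. It assumes 1-based inclusive coordinates and a chromosome held as a plain character string. The helper name `upstream_seq` and the toy chromosome are hypothetical illustrations, not part of Biostrings, and the sketch uses base R so it runs without extra packages.

```r
# Hypothetical helper (not a Biostrings function): return the `width` bases
# immediately upstream of a TSS, read 5' -> 3' on the gene's own strand.
# Assumes 1-based inclusive coordinates and a chromosome held as a plain string.
upstream_seq <- function(chrom, tss, strand, width) {
  n <- nchar(chrom)
  if (strand == "+") {
    # Plus strand: the upstream window lies to the left of the TSS
    from <- max(1, tss - width)
    to <- tss - 1
  } else {
    # Minus strand: the upstream window lies to the right of the TSS
    from <- tss + 1
    to <- min(n, tss + width)
  }
  if (to < from) return("")  # no upstream bases left within the chromosome
  region <- substr(chrom, from, to)
  if (strand == "+") return(region)
  # Reverse-complement so the minus-strand result also reads 5' -> 3'
  comp <- chartr("ACGTacgt", "TGCAtgca", region)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

chrom <- "AACCGGTTACGT"  # toy 12-bp "chromosome"
upstream_seq(chrom, tss = 7, strand = "+", width = 4)  # "CCGG"
upstream_seq(chrom, tss = 4, strand = "-", width = 3)  # "ACC"
```

Windows that run past either end of the chromosome are silently clipped. With Biostrings, the same steps would use `subseq()` and `reverseComplement()` on a `DNAString`, which scales better to whole chromosomes.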
Steffen Durinck wrote:
> No, I don't think there is a package to find motifs in the current
> repository.
> It would be nice to have one.
>
> best,
> Steffen
>
> Henrik Hornshøj Jensen wrote:
>> Thanks for your help, although I was thinking of a Bioconductor package.
>>
>> Regards,
>> Henrik
>>
>>
>>
>> -----Original Message-----
>> From: Krys Kelly [mailto:kak28 at cam.ac.uk]
>> Sent: Tuesday, September 19, 2006 12:35 PM
>> To: Henrik Hornshøj Jensen; bioconductor at stat.math.ethz.ch
>> Subject: RE: [BioC] retrieving upstream/intronic sequences using biomaRt
>>
>> Hi Henrik,
>>
>> A package? The more one looks, the more one finds! The attached spreadsheet is very much a work in progress and a bit messy and incomplete.
>> We started it after a very quick review of the literature, so it is also far from comprehensive. However, it will probably give you more than enough information to get started.
>>
>> This review should be helpful:
>>
>> Tompa M, et al. (2005). Assessing computational tools for the discovery of transcription factor binding sites. Nature Biotechnology 23(1):137-144.
>>
>> The three 'old-timer' programs that everyone seems to use are AlignACE, MEME and Consensus. We have also been using Weeder, SOMBRERO and NestedMICA.
>> Be aware that some of the programs (e.g. AlignACE) can give quite different answers on different runs, even with the same parameters, and different programs can give very different answers from one another. A number of people (ourselves included) therefore run several of the programs and take forward for further study the motifs that turn up in most of them.
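Taking "the motifs that turn up in most of the programs" is essentially a vote across tools. A minimal sketch follows. It assumes each program's output has already been reduced to a character vector of motif consensus strings. The program results shown are invented placeholders, not real output. In practice, similar but non-identical motifs from different tools would first need to be matched to each other, which this sketch ignores.

```r
# Placeholder results: motif consensus strings "reported" by each program.
# These values are invented for illustration only.
results <- list(
  AlignACE = c("TGACTCA", "CACGTG", "GATAAG"),
  MEME     = c("TGACTCA", "CACGTG"),
  Weeder   = c("TGACTCA", "TTGCGCAA")
)

# Keep motifs reported by at least `min_programs` programs (default: a majority).
# unique() stops a program that lists the same motif twice from voting twice.
consensus_motifs <- function(results, min_programs = ceiling(length(results) / 2)) {
  votes <- table(unlist(lapply(results, unique)))
  sort(names(votes)[votes >= min_programs])
}

consensus_motifs(results)                    # "CACGTG" "TGACTCA"
consensus_motifs(results, min_programs = 3)  # "TGACTCA"
```

Raising `min_programs` trades sensitivity for robustness against the run-to-run variability described above.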
>>
>> There are also programs that search for known motifs, e.g. MAST (the companion to MEME), MSCAN and SiteSeer. Two well-known databases of transcription factor binding sites are TRANSFAC and JASPAR.
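Searching for a known motif usually means scoring sequence windows against a position weight matrix (PWM), the representation used by databases such as JASPAR and TRANSFAC. The base-R sketch below shows only that core idea. It converts counts to log2-odds scores against a uniform background with a pseudocount, slides the matrix along the forward strand, and reports windows above a threshold. It is not MAST's or any other named tool's algorithm. The count matrix, pseudocount, background and threshold are illustrative assumptions.

```r
# Hypothetical count matrix for a 4-bp motif (rows = A, C, G, T; consensus ACGT).
# Invented for illustration; a real one would come from e.g. JASPAR.
counts <- matrix(c(
  8, 0,  0, 1,  # A
  0, 9,  0, 0,  # C
  1, 1, 10, 0,  # G
  1, 0,  0, 9   # T
), nrow = 4, byrow = TRUE, dimnames = list(c("A", "C", "G", "T"), NULL))

# Log2-odds PWM against a uniform background; the pseudocount keeps
# unseen bases at a finite (strongly negative) score.
make_pwm <- function(counts, pseudo = 0.5, bg = 0.25) {
  probs <- sweep(counts + pseudo, 2, colSums(counts + pseudo), "/")
  log2(probs / bg)
}

# Score every window on the forward strand only; return start positions
# and scores of windows scoring at least `threshold`.
scan_pwm <- function(dna, pwm, threshold) {
  bases <- strsplit(toupper(dna), "")[[1]]
  w <- ncol(pwm)
  if (length(bases) < w) return(data.frame(start = integer(0), score = numeric(0)))
  starts <- seq_len(length(bases) - w + 1)
  scores <- vapply(starts, function(s) {
    idx <- match(bases[s:(s + w - 1)], rownames(pwm))
    if (anyNA(idx)) return(-Inf)  # skip windows containing N or other symbols
    sum(pwm[cbind(idx, seq_len(w))])
  }, numeric(1))
  hits <- scores >= threshold
  data.frame(start = starts[hits], score = scores[hits])
}

pwm <- make_pwm(counts)
scan_pwm("TTACGTAAACGTCC", pwm, threshold = 4)  # hits start at 3 and 9 (both ACGT)
```

A fuller scan would also score the reverse complement and pick the threshold from the score distribution rather than a fixed value.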
>>
>> Hope this helps.
>>
>> Krys
>>
>>
>> Dr Krystyna A Kelly
>> University of Cambridge
>> Department of Pathology
>> Molteno Building, Tennis Court Road
>> Cambridge CB2 1QP
>> Tel: 01223 333331
>> Email: kak28 at cam.ac.uk
>>
>>
>> -----Original Message-----
>> From: bioconductor-bounces at stat.math.ethz.ch
>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Henrik Hornshøj Jensen
>> Sent: 19 September 2006 10:33
>> To: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] retrieving upstream/intronic sequences using biomaRt
>>
>> Do any of you know of a package that will predict regulatory sites in upstream regions?
>>
>> Regards,
>> Henrik
>>
>>
>>
>> -----Original Message-----
>> From: bioconductor-bounces at stat.math.ethz.ch
>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Steffen Durinck
>> Sent: Wednesday, September 13, 2006 2:25 PM
>> To: Shamit Soneji
>> Cc: BioC
>> Subject: Re: [BioC] retrieving upstream/intronic sequences using biomaRt
>>
>> Hi Shamit,
>>
>> Yes, with biomaRt you can get the upstream sequences but currently not the intronic sequences.
>> Try:
>>
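>> library(biomaRt)
>> ensmart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
>> # 5' UTR of one Ensembl gene; current biomaRt expects the filter name
>> # "ensembl_gene_id" here (the original 2006 post used type = "ensembl")
>> getSequence(id = "ENSG00000139618", type = "ensembl_gene_id",
>>             seqType = "5utr", mart = ensmart)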
>>
>> Cheers,
>> Steffen
>>
>>
>> Shamit Soneji wrote:
>>
>>> Is it possible using biomaRt (or any other R/BioC means) to download
>>> the upstream and intron sequences for any given ensembl ID?
>>>
>>> I know this can be done using BioMart directly, but a facility
>>> like this from R would be very useful if one wants to search for TF
>>> binding sites.
>>>
>>> Many thanks
>>>
>>> Shamit
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org