[Bioc-devel] A useful addition to MotifDb package?

Diego Diez diego10ruiz at gmail.com
Thu Oct 11 05:13:32 CEST 2012

Hi all,

I am very interested in this indeed! I have a package (which I plan to
submit for the next Bioc release) that uses a MySQL database of PWMs
(Jaspar, Uniprobe), so MotifDb came as a surprise, although a good one
(more Bioconductor integration). I was also looking for ways to
integrate the goals of my package into an R-only workflow (currently I
use MEME for motif matching), so the new Bioc workflow really helps me
accomplish this.

Regarding the limits for promoters, I have found widely varying choices
in different papers. For example, for the same cell type (embryonic stem
cells) some authors used +/-5 kb around the TSS (as Steve pointed out
before) and others used -8 kb/+2 kb. So there is definitely no consensus
on that. Providing some example boundaries can be useful for new users,
as long as it is clearly stated that these are not fixed. And for the
functions themselves, having no default is probably the best option.
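To make the "no default" idea concrete, here is a minimal sketch. It is
written in Python rather than R purely for illustration; the names
promoter_interval and Interval are hypothetical and not part of any
Bioconductor API. It assumes 1-based, inclusive coordinates and the
convention I understand promoters() to follow: a range of width
upstream + downstream that starts `upstream` bases before the TSS and
includes the TSS, flipped for minus-strand genes. No clipping to
chromosome bounds is done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def width(self) -> int:
        return self.end - self.start + 1


def promoter_interval(tss: int, strand: str, *, upstream: int, downstream: int) -> Interval:
    """Hypothetical helper: promoter region around a TSS.

    'upstream' and 'downstream' are keyword-only and have NO defaults,
    so every caller must state the boundaries explicitly.
    Assumes 1-based inclusive coordinates; does not clip at chromosome ends.
    """
    if upstream < 0 or downstream < 0:
        raise ValueError("upstream and downstream must be non-negative")
    if strand == "+":
        # Region extends to the left of the TSS and includes it.
        return Interval(tss - upstream, tss + downstream - 1)
    if strand == "-":
        # On the minus strand, "upstream" means higher coordinates.
        return Interval(tss - downstream + 1, tss + upstream)
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


if __name__ == "__main__":
    tss = 100_000
    # The two choices seen in the ES cell papers mentioned above:
    print(promoter_interval(tss, "+", upstream=5000, downstream=5000))
    print(promoter_interval(tss, "+", upstream=8000, downstream=2000))
    print(promoter_interval(tss, "-", upstream=8000, downstream=2000))
```

Because upstream and downstream are keyword-only with no defaults,
calling promoter_interval(100_000, "+") raises a TypeError instead of
silently using a boundary that may not suit the organism or cell type.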


On Thu, Oct 11, 2012 at 3:56 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
> Hi Steve, Paul,
> On 10/09/2012 09:03 AM, Steve Lianoglou wrote:
>> Hi Paul,
>> On Tue, Oct 9, 2012 at 11:29 AM, Paul Shannon <pshannon at fhcrc.org> wrote:
>>> Hi Steve,
>>> Very timely, very helpful! Just yesterday I proposed to Martin, as a
>>> task for the coming sprint:
>>>   4) Add the new TF PWMs from ENCODE into MotifDb
>>> I had not yet gotten as far as locating the data at ebi.  Thanks!
>>> If you care to take a look, perhaps comment, this Bioc workflow became
>>> visible yesterday, but has not yet been generally announced:
>>>    http://www.bioconductor.org/help/workflows/gene-regulation-tfbs/
>> Interesting.
>> I'll have to take a closer look at it later. I (really) quickly
>> skimmed the first quarter of it -- here is a rather minor comment:
>> Under the "Sequence Search" section, the numbers for "loosely"
>> defining the promoter bounds are 1k-3k upstream and 100-300 downstream of
>> the TSS. I think these numbers aren't too controversial if you're
>> talking about yeast (which the workflow seems to be about), but it
>> might not hurt to specify that these numbers may not be appropriate in
>> all contexts -- as another point of reference, the paper I linked to uses 5k
>> upstream/downstream of the TSS for "proximal regulatory regions" of genes.
> So I wonder whether it would be better not to provide default values
> for the 'upstream' and 'downstream' arguments of the promoters()
> extractor. Whatever we do, getPromoterSeq() and promoters() should
> probably behave the same way (right now promoters() has default values
> of 2000 and 200, while getPromoterSeq() has no default values).
> Thanks,
> H.
>> ...
>> I will look at this more closely later, though -- it looks very helpful.
>> Nice work!
>> -steve
> --
> Hervé Pagès
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
