[BioC] Motif search -- access to JASPAR, MotIV package, more TF-PWM relationships?

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Apr 25 16:12:57 CEST 2012


Hi,

To follow up on the MEME discussion: a BioStar post just pointed me to an
updated scoring metric in Tomtom, which is available in the latest
release of the MEME software suite:

http://bioinformatics.oxfordjournals.org/content/27/12/1603.full

Perhaps wrapping parts of the MEME suite into an R library would be useful, no?
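Here is a rough sketch of what a first piece of such a wrapper could look like. It is hypothetical and not an existing package. The R code writes a named list of PWMs to MEME's plain-text "minimal motif" format, which the MEME Suite tools read. It calls Tomtom through system2() only if a tomtom executable is on the PATH. The sketch makes these assumptions:

- Each PWM is a 4-row probability matrix with rownames A, C, G, T and columns summing to 1.
- The background is uniform by default.
- writeMemeMotifs() and runTomtom() are made-up names.
- Only Tomtom's basic "tomtom -oc <outdir> <query> <target>" form is used. Check the Tomtom documentation for your MEME version before adding options.

```r
## Sketch only: hypothetical helpers, not part of MotIV or the MEME Suite.

## Write a named list of PWMs (4 x w matrices with rownames A, C, G, T and
## columns summing to 1) to a file in MEME "minimal motif" text format.
writeMemeMotifs <- function(pwms, file,
                            bg = c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)) {
  header <- c("MEME version 4", "",
              "ALPHABET= ACGT", "",
              "strands: + -", "",
              "Background letter frequencies",
              paste(names(bg), bg, collapse = " "), "")
  body <- unlist(lapply(names(pwms), function(name) {
    m <- pwms[[name]][c("A", "C", "G", "T"), , drop = FALSE]
    # one output row per motif position, values in A C G T order
    rows <- apply(m, 2, function(col) paste(sprintf("%.6f", col), collapse = " "))
    c(paste("MOTIF", name),
      sprintf("letter-probability matrix: alength= 4 w= %d", ncol(m)),
      rows, "")
  }))
  writeLines(c(header, body), file)
  invisible(file)
}

## Run Tomtom on two MEME-format files, but only if tomtom is on the PATH.
runTomtom <- function(query, target, outdir = tempfile("tomtom_")) {
  exe <- Sys.which("tomtom")
  if (!nzchar(exe)) {
    warning("tomtom executable not found on PATH; nothing was run")
    return(invisible(NULL))
  }
  status <- system2(exe, shQuote(c("-oc", outdir, query, target)))
  if (!identical(as.integer(status), 0L)) warning("tomtom exited with status ", status)
  invisible(outdir)  # Tomtom writes its result files into this directory
}

## Example: a toy two-position motif written to a temporary file.
pwm <- matrix(c(0.90, 0.05, 0.03, 0.02,
                0.10, 0.10, 0.70, 0.10),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
motifFile <- writeMemeMotifs(list(toy = pwm), tempfile(fileext = ".meme"))
## runTomtom(motifFile, "known_motifs.meme")  # hypothetical database file
```

Writing plain MEME-format files keeps the R side independent of any particular MEME release. Reading Tomtom's results back into R would be the natural next step, but its output file names and layout may differ between versions.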

You might also find the FIRE (and FIRE-pro) suite of tools useful for
motif discovery:

http://physiology.med.cornell.edu/faculty/elemento/lab/software.shtml

Related to that, S. Tavazoie gave a talk at the recent CSHL/sysbio
meeting and presented TEISER, which seems pretty cool if you're
looking for structural motifs:

https://tavazoielab.c2b2.columbia.edu/TEISER/

-steve

On Wed, Apr 25, 2012 at 9:44 AM, Zhu, Lihua (Julie)
<Julie.Zhu at umassmed.edu> wrote:
> Paul,
>
> Thanks for the positive feedback on FlyFactorSurvey! The motifs in this
> database are generated using the bacterial one-hybrid method (B1H and
> B1H-seq). All the public motifs can be downloaded freely. It would be useful
> to have a Bioc data package containing curated, current motifs from all
> organisms (where available) that interfaces with MotIV.
>
> MEME works very well for finding motifs in B1H-seq data (Christensen et
> al., Nucleic Acids Research 2011, Vol. 39, No. 12, e83), although only a
> limited number of motif discovery tools were compared in the paper. We are
> currently investigating whether motif discovery can be improved with
> B1H-seq data.
>
> As I understand it, MEME is for de novo motif discovery; TOMTOM and STAMP
> are for testing whether a motif returned by a motif finder is significantly
> similar to a known motif; and clover is for searching for known motifs in a
> given set of sequences. We are thinking of adding clover to our website.
>
> I am looking forward to your collated survey results.
>
> Best regards,
>
> Julie
>
>
> On 4/24/12 11:02 PM, "Paul Shannon" <pshannon at fhcrc.org> wrote:
>
>> Hi Julie,
>>
>> FlyFactorSurvey looks great.   Would that we had such a resource (curated,
>> current, and growing) for all organisms!
>>
>> A few questions, if I may:
>>
>>   1) What role with respect to FlyFactorSurvey do you picture us taking here
>> at BioC?  How can we help?
>>
>>   2) Your website (http://pgfe.umassmed.edu/TFDBS) recommends meme and TOMTOM
>> for motif comparison.  Do you use them yourself?  If so, can you tell us about
>> their strengths and weaknesses?  How do they compare to clover?
>> (http://zlab.bu.edu/clover/)
>>
>> In that same spirit -- trying to find out more about this topic -- here are
>> some more questions:
>>
>>   3) The JASPAR database seems to be mostly unchanged since 2009.
>>      (http://jaspar.genereg.net/html/DOWNLOAD). Does anyone know their update
>> policy?
>>
>>   4) Is TRANSFAC only for license holders?
>>
>>   5) Are there any other organism-specific gems like FlyFactorSurvey to be
>> discovered out on the web?
>>
>> Thanks!
>>
>>  - Paul
>>
>> On Apr 24, 2012, at 3:16 PM, Zhu, Lihua (Julie) wrote:
>>
>>> Paul,
>>>
>>> Thanks so much for the comprehensive summary of the existing capabilities
>>> of Bioc and other resources for motif discovery and matching!
>>>
>>> Here is my response to your great initiative to collect use cases and open
>>> data resources.
>>>
>>> Here is an open data source for Drosophila which we developed:
>>> http://pgfe.umassmed.edu/TFDBS/
>>> http://nar.oxfordjournals.org/content/early/2010/11/19/nar.gkq858.full
>>>
>>> As you pointed out, there are several excellent Bioconductor packages
>>> available for the two common kinds of motif problems, i.e., de novo motif
>>> discovery and matching motifs to known motifs. It would be useful to have
>>> more motif databases available for motif comparison programs such as MotIV.
>>> In addition, we use clover to search for known motifs in a given set of
>>> sequences.
>>>
>>> Many thanks for sharing your insights!
>>>
>>> Best regards,
>>>
>>> Julie
>>>
>>>
>>> On 4/24/12 3:02 PM, "Paul Shannon" <pshannon at fhcrc.org> wrote:
>>>
>>>> The recent flurry of interest in sequence motifs here on the bioc list
>>>> suggests to us that maybe we at Bioconductor could strengthen our
>>>> infrastructure for this kind of work. If this work interests you, either as
>>>> a package creator or as a package user, please suggest ideas or use cases.
>>>> What do you need? I will collect and collate the responses. We hope to
>>>> identify places where Bioc can help out.
>>>>
>>>> For background: we already have a number of packages (rGADEM, MotIV, cosmo,
>>>> BCRANK, motifRG) which address, with different strengths, what I believe to
>>>> be the two aspects of the motif problem:
>>>>
>>>>  1) Detecting enriched motifs in DNA sequence or in ChIP-seq data (rGADEM,
>>>>     cosmo, motifRG, BCRANK)
>>>>  2) Matching these enriched motifs to known motifs, and predicting which
>>>>     binding molecules they belong to (MotIV)
>>>>
>>>> In the past, a lot of sequence motif/binding work has addressed the search
>>>> for transcription factor binding sites and their cognate transcription
>>>> factors. miRNAs, phosphorylation and methylation all pose related problems.
>>>> Is there practical support we can offer here as well?
>>>>
>>>> In addition to Bioc packages, there are of course many worthwhile websites
>>>> and external tools: JASPAR, MEME, STAMP (and TRANSFAC, for those with a
>>>> license). Nooshin mentioned the Arabidopsis-specific 'AthaMap'
>>>> (http://www.athamap.de). Are there other open data repositories like this
>>>> for other organisms? C. elegans, as Julie requested?
>>>>
>>>> Questions, suggestions, use cases and data sources are all welcome.
>>>>
>>>> Thanks!
>>>>
>>>> - Paul
>>>>
>>>>
>>>>
>>>>
>>>> On Apr 24, 2012, at 10:47 AM, Zhu, Lihua (Julie) wrote:
>>>>
>>>>> Eloi,
>>>>>
>>>>> I would like to use MotIV for a C. elegans dataset. What data source would
>>>>> you recommend for motifMatch? Many thanks for your help!
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Julie
>>>>>
>>>>>
>>>>> On 4/24/12 1:28 PM, "Mercier Eloi" <emercier at chibi.ubc.ca> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am one of the developers of MotIV. I will be happy to help you if you
>>>>>> have any questions regarding the package.
>>>>>>
>>>>>> First, I want to mention that in the PLoS ONE paper we used PICS,
>>>>>> rGADEM and MotIV as a pipeline, but MotIV can also be used as a
>>>>>> stand-alone package. Some of the advanced functions won't be available
>>>>>> in that case, though.
>>>>>>
>>>>>> Since the PWMs in MotIV correspond to human TFs, you may have to use your
>>>>>> own list of PWMs. What MotIV needs is a simple list of matrices (run
>>>>>> head(jaspar) to view the format).
>>>>>> JASPAR's PWMs can be easily downloaded, but its plant collection seems to
>>>>>> contain only ~20 motifs. AthaMap, on the other hand, has more motifs, but
>>>>>> I did not manage to find an easy way to get them. Another place to look is
>>>>>> the AGRIS website (http://arabidopsis.med.ohio-state.edu/downloads.html).
>>>>>>
>>>>>> If you only want to identify your motifs and do not plan any further
>>>>>> analysis in R, I recommend STAMP (http://www.benoslab.pitt.edu/stamp).
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Eloi Mercier
>>>>>>
>>>>>>
>>>>>> On 12-04-24 07:36 AM, nooshin wrote:
>>>>>>> Thanks a lot for your suggestion. I will certainly have a look and let
>>>>>>> you know.
>>>>>>> Bests,
>>>>>>> Nooshin
>>>>>>>
>>>>>>>
>>>>>>> On 04/24/2012 04:15 PM, Tim Triche, Jr. wrote:
>>>>>>>> Ah, I see.  GSL is a useful library to have installed regardless.
>>>>>>>> Hope things work out.  I found your exchanges with Paul to be useful
>>>>>>>> reading, but obviously I was not reading closely enough, since Paul
>>>>>>>> started off his code sample with biocLite('MotIV').  Oops :-o
>>>>>>>>
>>>>>>>> Here is a paper that I found interesting, which does go into some
>>>>>>>> detail towards a "bulk" approach, from Gottardo's group:
>>>>>>>>
>>>>>>>> http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0016432
>>>>>>>>
>>>>>>>> Perhaps it will be useful to you as well; I would be curious to hear
>>>>>>>> if so.
>>>>>>>>
>>>>>>>> --t
>>>>>>>>
>>>>>>>> On Tue, Apr 24, 2012 at 7:00 AM, nooshin <n_omranian at yahoo.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>    Thanks, it's already been solved: it needs the GSL library, which is a
>>>>>>>>    bit problematic to install, but I have sorted it out.
>>>>>>>>
>>>>>>>>    But it includes only 5 matrices for Arabidopsis, both on the
>>>>>>>>    webpage and in the package!
>>>>>>>>    I'm downloading them manually from AthaMap.
>>>>>>>>
>>>>>>>>    Thanks again, and I'm looking forward to the 'bulk' approach.
>>>>>>>>
>>>>>>>>    Bests,
>>>>>>>>    Nooshin
>>>>>>>>
>>>>>>>>
>>>>>>>>    On 04/24/2012 03:16 PM, Tim Triche, Jr. wrote:
>>>>>>>>>    source("http://bioconductor.org/biocLite.R")
>>>>>>>>>    biocLite("MotIV")
>>>>>>>>>
>>>>>>>>>    ought to do the trick for you
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    On Tue, Apr 24, 2012 at 1:01 AM, nooshin <n_omranian at yahoo.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>        Hi Paul,
>>>>>>>>>
>>>>>>>>>        Thanks a lot.
>>>>>>>>>        I forgot to include bioc, since I only replied to you (not to
>>>>>>>>>        all).
>>>>>>>>>
>>>>>>>>>        I can't install the MotIV package to check. I searched Google
>>>>>>>>>        but couldn't find any solution! Do you have any suggestions for
>>>>>>>>>        installing this package?
>>>>>>>>>
>>>>>>>>>        Bests,
>>>>>>>>>        Nooshin
>>>>>>>>>
>>>>>>>>>        On 04/23/2012 06:35 PM, Paul Shannon wrote:
>>>>>>>>>> (redirecting this back to the Bioc list...)
>>>>>>>>>>
>>>>>>>>>> Hi Nooshin,
>>>>>>>>>>
>>>>>>>>>> The 'bulk' approach is not quite as ready as I predicted.
>>>>>>>>>> I might have something by the end of the week.
>>>>>>>>>>
>>>>>>>>>> As for mapping between PWMs and TFs, I have most often done
>>>>>>>>>> this with Tomtom from the MEME website.
>>>>>>>>>>
>>>>>>>>>> But I just discovered what looks like a good (maybe better)
>>>>>>>>>> approach: the Bioconductor MotIV package, which includes a 2010
>>>>>>>>>> version of JASPAR.
>>>>>>>>>> Try this:
>>>>>>>>>>
>>>>>>>>>>    source("http://bioconductor.org/biocLite.R")
>>>>>>>>>>    biocLite("MotIV")
>>>>>>>>>>    library(MotIV)
>>>>>>>>>>    browseVignettes("MotIV")
>>>>>>>>>>
>>>>>>>>>> The jaspar data in this package has 130 TF-PWM mappings, which appear
>>>>>>>>>> to be human. More must be known, and publicly available. The JASPAR
>>>>>>>>>> website has a 'JASPAR CORE Plantae' data set that
>>>>>>>>>>    - is probably what you are interested in
>>>>>>>>>>    - might be downloadable, and convertible to the form MotIV wants.
>>>>>>>>>>
>>>>>>>>>> Perhaps other readers of the list have other suggestions.
>>>>>>>>>>
>>>>>>>>>> If you have any questions on this, please include 'BioC' in your
>>>>>>>>>> reply, so that we can all get better at this!
>>>>>>>>>>
>>>>>>>>>>  - Paul
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Apr 23, 2012, at 6:53 AM, nooshin wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Paul,
>>>>>>>>>>>
>>>>>>>>>>> Many thanks for your comprehensive information and code!
>>>>>>>>>>> I have a question about extracting PWMs. How and where can I
>>>>>>>>>>> download the matrices for all TFs that have a PWM available? I only
>>>>>>>>>>> need them for Arabidopsis thaliana.
>>>>>>>>>>> Is there any R package to which I can give a TF and receive its PWM?
>>>>>>>>>>> Or any online database I can download them from? I have been
>>>>>>>>>>> struggling since Friday to find these matrices for the different
>>>>>>>>>>> A. thaliana TFs. It would be great if you could help me get them.
>>>>>>>>>>>
>>>>>>>>>>>> If you want to do this in bulk, Hervé has some lovely
>>>>>>>>>>>> code to make that efficient.
>>>>>>>>>>> Also, can I have this? :)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot in advance.
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Nooshin
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>        _______________________________________________
>>>>>>>>>        Bioconductor mailing list
>>>>>>>>>        Bioconductor at r-project.org<mailto:Bioconductor at r-project.org>
>>>>>>>>>        https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>>        Search the archives:
>>>>>>>>>
>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    --
>>>>>>>>>    A model is a lie that helps you see the truth.
>>>>>>>>>    Howard Skipper
>>>>>>>>>    <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
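Eloi's and Paul's notes in the thread above come down to one task. Third-party matrices from JASPAR CORE Plantae, AthaMap, AGRIS or FlyFactorSurvey must be put into the simple list-of-matrices form that MotIV uses.

Below is a minimal R sketch that makes these assumptions:

- Each downloaded motif is a plain, whitespace-separated count matrix with four rows in A, C, G, T order and one column per position. Other download formats, such as files with bracketed rows or a header line, need their own parsing.
- MotIV wants column-normalized frequencies with rownames A, C, G, T. Compare the result with head(jaspar) before passing it to MotIV.
- readCountMatrix(), readCountMatrixDir() and the pseudocount argument are hypothetical names.

```r
## Sketch only: hypothetical helpers for building a MotIV-style list of PWMs.

## Read one count matrix (4 rows in A, C, G, T order, one column per motif
## position) and return it as a column-normalized frequency matrix.
readCountMatrix <- function(file, pseudocount = 0) {
  counts <- as.matrix(read.table(file, header = FALSE))
  if (nrow(counts) != 4) stop("expected 4 rows (A, C, G, T) in ", file)
  counts <- counts + pseudocount          # optional smoothing of zero counts
  pwm <- sweep(counts, 2, colSums(counts), "/")
  dimnames(pwm) <- list(c("A", "C", "G", "T"), NULL)
  pwm
}

## Read every matching file in a directory into a named list of PWMs,
## using the file name without its extension as the motif name.
readCountMatrixDir <- function(dir, pattern = "\\.pfm$", ...) {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  pwms <- lapply(files, readCountMatrix, ...)
  names(pwms) <- sub(pattern, "", basename(files))
  pwms
}

## Example with a toy 3-position count matrix.
pfm <- tempfile(fileext = ".pfm")
writeLines(c("1 0 3",
             "1 0 0",
             "1 4 0",
             "1 0 1"), pfm)
toy <- readCountMatrix(pfm)
```

If a column can sum to zero in the source data, pass a small pseudocount, e.g. readCountMatrixDir(dir, pseudocount = 0.5), to avoid NaN frequencies.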



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


