[BioC] promoter prediction
Jing Huang
huangji at ohsu.edu
Mon Nov 19 06:02:50 CET 2012
Many thanks Paul.
Let me first follow your advice and do some investigating over the
holiday. Then I will be able to define the question more specifically.
Jing
On 11/18/12 8:09 PM, "Paul Shannon" <pshannon at fhcrc.org> wrote:
>Hi Jing,
>
>I am including the Bioconductor email list so that we will have a record
>of your question, and the answers we arrive at.
>
>On Nov 18, 2012, at 5:32 PM, Jing Huang wrote:
>
>> Hi Paul,
>>
>> I am wondering if this would be doable. I have a few genes that form a
>> complex. They have been seen to be overexpressed simultaneously in a
>> variety of tumors.
>>
>Do you hypothesize that their joint over-expression suggests that they
>have common regulators?
>
>> The package that you generated seems suited to predicting matches between
>> known transcription factors and genes. I would like to predict the
>> transcription factors that are unknown.
>
>One good approach here would be to find candidate regulatory regions for
>each of the members of your complex. Bioc now has a getPromoterSeq
>method, demonstrated at
>http://bioconductor.org/help/workflows/gene-regulation-tfbs/. The rGADEM
>package finds motifs de novo when given a number of sequences, but this
>can be an expensive and inconclusive search when your sequences are long,
>and if your genes are few.
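
As a rough sketch of the promoter-retrieval step described above (this is not code from the thread): the function below assumes the hg19 annotation packages named elsewhere in this discussion are installed. The name `promoter.seqs.for` and its default window sizes are hypothetical, chosen only for illustration. `transcriptsBy` and `getPromoterSeq` are GenomicFeatures functions as used in the linked workflow; newer Bioconductor releases may differ.

```r
# Sketch: fetch upstream promoter sequence for a set of Entrez gene IDs.
# promoter.seqs.for is a hypothetical helper name, not part of any package.
promoter.seqs.for <- function(entrez.ids, upstream = 1000, downstream = 0) {
  txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene
  genome <- BSgenome.Hsapiens.UCSC.hg19::Hsapiens
  # Transcript coordinates grouped by gene; list names are Entrez gene IDs.
  tx.by.gene <- GenomicFeatures::transcriptsBy(txdb, by = "gene")
  # Keep only the IDs that actually have annotated transcripts.
  known <- intersect(as.character(entrez.ids), names(tx.by.gene))
  GenomicFeatures::getPromoterSeq(tx.by.gene[known], genome,
                                  upstream = upstream, downstream = downstream)
}
# Usage (needs the packages above): promoter.seqs.for(c("4171", "4172"))
```

The result should hold one set of sequences per gene, with one sequence per annotated transcript, which can then be passed to a motif finder.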
>
>The ENCODE project, and John Stam's group at UW in particular, have
>produced a lot of new data, including DNase1 hypersensitivity regions and
>footprints, and H3K4me methylation profiles, and transcription factor
>binding sites. These can narrow your search considerably. In short, we
>now know much more than we used to about what and where the regulatory
>regions proximal to a gene seem to be. We have just begun prototyping a
>means to provide easy access in Bioconductor to these kinds of data.
>
>Once you have some candidate transcription factor binding sequences, the
>MotIV package (and the external program 'tomtom') can match them against
>known motifs in MotifDb, often identifying transcription factor candidates.
>
>If you could clarify your question a bit, provide an example --
>anonymizing the genes in your complex if need be -- we can try and find
>specific techniques for you to use.
>
>Please reply 'on-list' so that our discussion can be archived, and so
>that others with advice can chip in.
>
>
> - Paul
>
>
>>
>> Is there any way this is doable?
>>
>> Many many thanks
>>
>> Jing
>> On 10/8/12 8:38 PM, "Paul Shannon" <pshannon at fhcrc.org> wrote:
>>
>>> Hi Jing,
>>>
>>> This took WAY too long.
>>>
>>> But it is at last ready. Could you take a look? Give me comments?
>>>
>>> http://www.bioconductor.org/help/workflows/gene-regulation-tfbs/
>>>
>>> Thanks!
>>>
>>> - Paul
>>>
>>> On Jul 5, 2012, at 3:58 PM, Jing Huang wrote:
>>>
>>>> No hurry!
>>>>
>>>> Jing
>>>>
>>>> -----Original Message-----
>>>> From: Paul Shannon [mailto:pshannon at fhcrc.org]
>>>> Sent: Thursday, July 05, 2012 3:43 PM
>>>> To: Jing Huang
>>>> Cc: Paul Shannon
>>>> Subject: Re: promoter prediction
>>>>
>>>> Hi Jing,
>>>>
>>>> Should have something ready by the end of next week.
>>>>
>>>> Sorry it's taken so long!
>>>>
>>>> - Paul
>>>>
>>>> On Jul 5, 2012, at 3:41 PM, Jing Huang wrote:
>>>>
>>>>> Hi Paul,
>>>>>
>>>>> Are you still going to write the package for promoter prediction? I
>>>>> have been very busy with bench work and have not been able to study this.
>>>>>
>>>>> It would be nice if you could write the package and present it at the
>>>>> BioC12 meeting by the end of this month.
>>>>>
>>>>> Jing
>>>>>
>>>>> -----Original Message-----
>>>>> From: Paul Shannon [mailto:pshannon at fhcrc.org]
>>>>> Sent: Tuesday, June 12, 2012 12:53 PM
>>>>> To: Jing Huang
>>>>> Cc: Paul Shannon
>>>>> Subject: Re: promoter prediction
>>>>>
>>>>> Cool!
>>>>>
>>>>> On Jun 12, 2012, at 12:46 PM, Jing Huang wrote:
>>>>>
>>>>>> Figured it out on this one.
>>>>>>
>>>>>> Jing
>>>>>>
>>>>>> On 6/12/12 11:51 AM, "Paul Shannon" <pshannon at fhcrc.org> wrote:
>>>>>>
>>>>>>> It's an odd error.
>>>>>>>
>>>>>>> Try this:
>>>>>>>
>>>>>>> ?load
>>>>>>> ?save
>>>>>>>
>>>>>>> Once you understand them, ask yourself, hmmm, what could be wrong
>>>>>>> here?
>>>>>>>
>>>>>>> (I am trying to teach you to fish, rather than just GIVE you fish!)
>>>>>>>
>>>>>>> - Paul
>>>>>>>
>>>>>>> On Jun 12, 2012, at 11:48 AM, Jing Huang wrote:
>>>>>>>
>>>>>>>> Hi Paul,
>>>>>>>>
>>>>>>>> What does this mean?
>>>>>>>>
>>>>>>>>> if (!exists ('e2f3'))
>>>>>>>> + load ('symbolsToGeneIDs.RData', envir=.GlobalEnv)
>>>>>>>> Error: segfault from C stack overflow
>>>>>>>>
>>>>>>>> Many Thanks
>>>>>>>>
>>>>>>>> Jing
>>>>>>>>
>>>>>>>> From: Paul Shannon <pshannon at fhcrc.org>
>>>>>>>> To: Jing Huang <huangji at ohsu.edu>
>>>>>>>> Cc: Paul Shannon <pshannon at fhcrc.org>
>>>>>>>> Subject: Re: promoter prediction
>>>>>>>>
>>>>>>>> Hi Jing,
>>>>>>>>
>>>>>>>> Installing software is a good thing to learn. It's a
>>>>>>>> basic part of any bioinformatician's work!
>>>>>>>>
>>>>>>>> If you look at this page:
>>>>>>>>
>>>>>>>> http://meme.sdsc.edu/meme/meme-download.html
>>>>>>>>
>>>>>>>> You will see a link to 'installation instructions'. That would be a
>>>>>>>> good place to begin.
>>>>>>>>
>>>>>>>> I apologize, I forgot to include this file. Put it in your working
>>>>>>>> directory:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Treat each puzzle you encounter as an opportunity to learn!
>>>>>>>>
>>>>>>>> - Paul
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jun 12, 2012, at 9:08 AM, Jing Huang wrote:
>>>>>>>>
>>>>>>>>> Hi Paul,
>>>>>>>>>
>>>>>>>>> I am having trouble downloading MEME. I guess I am not sure what to
>>>>>>>>> download. In order to run MEME, it seems that Perl or Python is
>>>>>>>>> required? I don't have any knowledge of those.
>>>>>>>>>
>>>>>>>>> I have tried to run your scripts and run into errors:
>>>>>>>>>
>>>>>>>>>> if (!exists ('e2f3'))
>>>>>>>>> + load ('symbolsToGeneIDs.RData', envir=.GlobalEnv)
>>>>>>>>> Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
>>>>>>>>> In addition: Warning message:
>>>>>>>>> In readChar(con, 5L, useBytes = TRUE) :
>>>>>>>>>   cannot open compressed file 'symbolsToGeneIDs.RData', probable reason 'No such file or directory'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Not sure what this means. I am wondering what else needs to be
>>>>>>>>> installed on my computer.
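
The "No such file or directory" warning above means R could not find 'symbolsToGeneIDs.RData' in the current working directory. A minimal base-R sketch of a more defensive version of the same guard follows; `load.if.missing` is a hypothetical helper name, not something from the script discussed in this thread.

```r
# Sketch: load an .RData file only if a symbol is not yet defined,
# and report the working directory when the file cannot be found.
load.if.missing <- function(symbol, file, envir = .GlobalEnv) {
  if (exists(symbol, envir = envir)) return(invisible(TRUE))
  if (!file.exists(file)) {
    message("Cannot find '", file, "' in working directory ", getwd())
    return(invisible(FALSE))
  }
  load(file, envir = envir)
  # TRUE only if the file really defined the symbol we wanted.
  invisible(exists(symbol, envir = envir))
}

load.if.missing("e2f3", "symbolsToGeneIDs.RData")
```

If the message appears, either copy the file into the directory reported by `getwd()` or change directory with `setwd()` before sourcing the script.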
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Many thanks
>>>>>>>>>
>>>>>>>>> Jing
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From: Paul Shannon <pshannon at fhcrc.org>
>>>>>>>>> To: Jing Huang <huangji at ohsu.edu>
>>>>>>>>> Cc: Paul Shannon <pshannon at fhcrc.org>
>>>>>>>>> Subject: Re: promoter prediction
>>>>>>>>>
>>>>>>>>> Hi Jing,
>>>>>>>>>
>>>>>>>>> My boss has some other plans for me this week :} so I am sending this
>>>>>>>>> to you tonight, giving you (I think) plenty to work on, to study, and
>>>>>>>>> to comprehend.
>>>>>>>>>
>>>>>>>>> What I include below is all you need for finding enriched motifs in
>>>>>>>>> the promoters of your genes.
>>>>>>>>>
>>>>>>>>> What is NOT included is finding out the transcription factors which
>>>>>>>>> match those motifs. Learn all of what's here, then you will be ready
>>>>>>>>> for MotIV and my new MotifDb -- which should be ready to use by the
>>>>>>>>> end of the week.
>>>>>>>>>
>>>>>>>>> There is one file attached, a somewhat improvised R script. It runs,
>>>>>>>>> but it is not in a style you should emulate. But there's lots to
>>>>>>>>> learn if you study it, line by line, until everything makes complete
>>>>>>>>> sense to you. Please do that!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's how to run the script:
>>>>>>>>> 1) Install all the libraries mentioned in the file. For instance:
>>>>>>>>>      biocLite (c ('org.Hs.eg.db', 'BSgenome.Hsapiens.UCSC.hg19',
>>>>>>>>>                   'GenomicFeatures', 'TxDb.Hsapiens.UCSC.hg19.knownGene'))
>>>>>>>>> 2) Install meme; fix the path to meme in the script so that it
>>>>>>>>>    matches where the meme executable is on your computer.
>>>>>>>>> 3) source ('go.R'); run ('redo')
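
A small base-R sketch for checking steps 1 and 2 before sourcing the script. It assumes the meme executable is reachable on the PATH under the name 'meme'; the script itself may instead use a hard-coded path. Note that biocLite was the Bioconductor installer at the time of this message; later releases replaced it with BiocManager::install.

```r
# Packages named in step 1 of the instructions above.
pkgs <- c("org.Hs.eg.db", "BSgenome.Hsapiens.UCSC.hg19",
          "GenomicFeatures", "TxDb.Hsapiens.UCSC.hg19.knownGene")

# requireNamespace returns FALSE (without an error) for missing packages.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not.installed <- pkgs[!installed]
if (length(not.installed) > 0)
  message("Still to install: ", paste(not.installed, collapse = ", "))

# Sys.which returns an empty string when the executable is not on the PATH.
meme.path <- Sys.which("meme")
if (!nzchar(meme.path))
  message("No 'meme' executable found on the PATH")
```

When both checks pass silently, `meme.path` holds the location to put in the script.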
>>>>>>>>>
>>>>>>>>> meme takes maybe 20 minutes to run on my laptop.
>>>>>>>>>
>>>>>>>>> Having found these motifs, the next step is to use tomtom, or
>>>>>>>>> (better yet) the Bioconductor package MotIV and my new MotifDb.
>>>>>>>>> Be aware: the p-values of these enrichments are not very strong.
>>>>>>>>>
>>>>>>>>> Please study the script, run meme, and get really familiar with it
>>>>>>>>> all. Send me questions if you have them. Then run MotIV with the
>>>>>>>>> built-in jaspar matrices, comparing the enriched motifs meme found
>>>>>>>>> to the jaspar matrices.
>>>>>>>>>
>>>>>>>>> - Paul
>>>>>>>>>
>>>>>>>>> <PastedGraphic-1.png>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jun 8, 2012, at 2:48 PM, Jing Huang wrote:
>>>>>>>>>
>>>>>>>>>> Hi Paul,
>>>>>>>>>>
>>>>>>>>>> Here is the list, but for you only: MCM2, MCM3, MCM4, MCM5, MCM6,
>>>>>>>>>> MCM7, MCM8. The corresponding Entrez IDs are
>>>>>>>>>> 4171, 4172, 4173, 4174, 4175, 4176, 84515.
>>>>>>>>>>
>>>>>>>>>> I will play with meme as your email suggested.
>>>>>>>>>>
>>>>>>>>>> Have a nice weekend
>>>>>>>>>>
>>>>>>>>>> Jing
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Paul Shannon [mailto:pshannon at fhcrc.org]
>>>>>>>>>> Sent: Friday, June 08, 2012 2:40 PM
>>>>>>>>>> To: Jing Huang
>>>>>>>>>> Cc: Paul Shannon
>>>>>>>>>> Subject: Re: promoter prediction
>>>>>>>>>>
>>>>>>>>>> Well, two promoters are not enough of a sample in which to find
>>>>>>>>>> motif enrichments. I'll dredge up an example dataset from elsewhere.
>>>>>>>>>>
>>>>>>>>>> In preparation, you could install meme, and see if you can adapt
>>>>>>>>>> the 'get.promoter' function I sent you, for arabidopsis, to human.
>>>>>>>>>>
>>>>>>>>>> I will have a human demo ready mid-week next week.
>>>>>>>>>>
>>>>>>>>>> - Paul
>>>>>>>>>>
>>>>>>>>>> On Jun 8, 2012, at 2:36 PM, Jing Huang wrote:
>>>>>>>>>>
>>>>>>>>>>> I don't remember what the inputs were. Somebody posted a question
>>>>>>>>>>> about the package to our mailing group; I saw it and played with it
>>>>>>>>>>> a little bit.
>>>>>>>>>>>
>>>>>>>>>>> The list of genes is confidential. How about I give you only two of
>>>>>>>>>>> them, MCM2 and MCM3? The corresponding Entrez IDs are 4171 and 4172.
>>>>>>>>>>>
>>>>>>>>>>> I hope this is enough information.
>>>>>>>>>>>
>>>>>>>>>>> Jing
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Paul Shannon [mailto:pshannon at fhcrc.org]
>>>>>>>>>>> Sent: Friday, June 08, 2012 2:19 PM
>>>>>>>>>>> To: Jing Huang
>>>>>>>>>>> Cc: Paul Shannon
>>>>>>>>>>> Subject: Re: promoter prediction
>>>>>>>>>>>
>>>>>>>>>>> Hi Jing,
>>>>>>>>>>>
>>>>>>>>>>> Do you know what inputs are used for the package you are trying to
>>>>>>>>>>> remember? I cannot think what it would be.
>>>>>>>>>>>
>>>>>>>>>>> Also (I asked this before :}) do you have a list of specific
>>>>>>>>>>> co-regulated genes? Are they confidential? If not, please send me
>>>>>>>>>>> that list.
>>>>>>>>>>>
>>>>>>>>>>> - Paul
>>>>>>>>>>>
>>>>>>>>>>> On Jun 8, 2012, at 2:16 PM, Jing Huang wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Paul,
>>>>>>>>>>>>
>>>>>>>>>>>> I am still studying a few packages related to predicting shared
>>>>>>>>>>>> transcription factors, and waiting for your new advanced package to
>>>>>>>>>>>> be released.
>>>>>>>>>>>>
>>>>>>>>>>>> There is a BioC package that allows me to predict promoters. I have
>>>>>>>>>>>> played with it but don't remember the name of the package. Do you
>>>>>>>>>>>> know of such a package, by any chance?
>>>>>>>>>>>>
>>>>>>>>>>>> Many thanks
>>>>>>>>>>>>
>>>>>>>>>>>> Jing
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>