[BioC] PWMscoreStartingAt and ambiguous subject seqs

Janet Young jayoung at fhcrc.org
Tue May 22 22:34:50 CEST 2012


Hi again,

Yes, Zizhen agrees that averaging is a good choice. You're right: it would be good to know what other tools do (I wish I had a little more time to look into that, but I don't). Maybe others on the list have some experience?
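For reference, here is a minimal base-R sketch of the averaging Hervé proposes in the quoted message below. Each IUPAC code contributes the plain (unweighted) mean of the PWM entries for the bases it stands for, so an N contributes the column mean instead of 0. The helper name score_avg_ambig, the local iupac_map table and the toy 4 x 3 pwm are hypothetical illustrations, not Biostrings API. The sketch scores a single window and assumes a PWM whose rows are named A, C, G, T.

```r
# Hypothetical helper, NOT part of Biostrings: score one window of `seq`
# against `pwm`, letting each IUPAC ambiguity code contribute the unweighted
# mean of the PWM entries for the bases it represents.

# IUPAC nucleotide codes -> bases they stand for (assumed table, redefined
# locally so this sketch runs in base R without Biostrings)
iupac_map <- c(A = "A", C = "C", G = "G", T = "T",
               M = "AC", R = "AG", W = "AT", S = "CG", Y = "CT", K = "GT",
               V = "ACG", H = "ACT", D = "AGT", B = "CGT", N = "ACGT")

score_avg_ambig <- function(pwm, seq, start = 1L) {
  width <- ncol(pwm)
  chars <- strsplit(toupper(seq), "")[[1]]
  stopifnot(start >= 1L, start + width - 1L <= length(chars))
  window <- chars[start:(start + width - 1L)]
  stopifnot(all(window %in% names(iupac_map)))
  # Per-position contribution: mean over the bases the letter represents
  contrib <- vapply(seq_len(width), function(j) {
    bases <- strsplit(iupac_map[[window[j]]], "")[[1]]
    mean(pwm[bases, j])
  }, numeric(1))
  sum(contrib)
}

# Toy PWM (hypothetical values), one column per motif position
pwm <- matrix(c(0.10, 0.00, 0.02, 0.00,
                0.00, 0.08, 0.00, 0.04,
                0.05, 0.05, 0.01, 0.01),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

score_avg_ambig(pwm, "ACA")   # unambiguous window
score_avg_ambig(pwm, "NNN")   # all-N window
score_avg_ambig(pwm, "AYA")   # Y = C or T at position 2
```

With this toy pwm, an all-N window scores sum(colMeans(pwm)) (0.09 here) rather than 0, and the Y at position 2 contributes the mean of the C and T entries in that column. Weighting the average by background base frequencies instead of a plain mean would be an alternative design choice.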

Janet


On May 22, 2012, at 11:33 AM, Janet Young wrote:

> Hi Herve,
> 
> Thanks for checking that out; interesting. I'm a little naive about these things, but your idea of averaging sounds reasonable to me. I'm copying Zizhen here: she's worked with PWMs much more than I have and might have some helpful thoughts.
> 
> Janet
> 
> 
> On May 21, 2012, at 10:33 PM, Hervé Pagès wrote:
> 
>> Hi Janet,
>> 
>> On 05/21/2012 06:34 PM, Janet Young wrote:
>>> Hi there,
>>> 
>>> I'm using PWMscoreStartingAt from Biostrings - it's VERY useful for me - thanks!
>>> 
>>> Some of the sequences I'm scanning include ambiguities (some N, some Y, etc., using IUPAC codes). I'm really glad that PWMscoreStartingAt works on these sequences, but I'd like to understand how scores are calculated when an N (or another ambiguity code) is present. Would it be easy for you to add that to the documentation? (An email response would be fine too, but it seems useful to add it to the docs.)
>> 
>> It seems that IUPAC ambiguity codes are simply ignored in the
>> calculation of the score. For example with the 'pwm' used in the
>> man page for PWMscoreStartingAt():
>> 
>>> dim(pwm)
>> [1]  4 13
>>> PWMscoreStartingAt(pwm, DNAString("AAAAAAAAAAAAA"))
>> [1] 0.4960267
>>> sum(pwm["A", ])
>> [1] 0.4960267
>>> PWMscoreStartingAt(pwm, DNAString("NNNNNNNNNNNNN"))
>> [1] 0
>> 
>> This is probably not very satisfying. Maybe the contribution of an
>> ambiguity to the score should be the average of the contributions
>> of the individual bases represented by the ambiguity? I could implement
>> this if that sounds reasonable. Feedback on this is welcome, and, in
>> particular, it would be good to know how other tools handle this.
>> 
>> Thanks!
>> H.
>> 
>>> 
>>> thanks very much,
>>> 
>>> Janet
>>> 
>>> 
>>> -------------------------------------------------------------------
>>> 
>>> Dr. Janet Young
>>> 
>>> Tapscott and Malik labs
>>> 
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Avenue N., C3-168,
>>> P.O. Box 19024, Seattle, WA 98109-1024, USA.
>>> 
>>> tel: (206) 667 1471 fax: (206) 667 6524
>>> email: jayoung  ...at...  fhcrc.org
>>> 
>>> 
>>> -------------------------------------------------------------------
>>> 
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
>> 
>> -- 
>> Hervé Pagès
>> 
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>> 
>> E-mail: hpages at fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
> 


