[BioC] PWMscoreStartingAt and ambiguous subject seqs

Janet Young jayoung at fhcrc.org
Tue May 22 20:33:39 CEST 2012


Hi Herve,

Thanks for checking that out - interesting. I'm a little naive about these things, but your idea of averaging sounds reasonable to me. I'm copying Zizhen here - she's worked with PWMs much more than I have and might have some helpful thoughts.

Janet


On May 21, 2012, at 10:33 PM, Hervé Pagès wrote:

> Hi Janet,
> 
> On 05/21/2012 06:34 PM, Janet Young wrote:
>> Hi there,
>> 
>> I'm using PWMscoreStartingAt from Biostrings - it's VERY useful for me - thanks!
>> 
>> Some of the sequences I'm scanning include ambiguities (some N, some Y, etc.; they use IUPAC codes). I'm really glad that PWMscoreStartingAt works on these sequences, but I'd like to understand how scores are calculated when an N (or another ambiguity code) is present. Would it be easy for you to add that to the documentation? (Just an email response would be fine too, but it seems useful to have it in the docs.)
> 
> It seems that IUPAC ambiguity codes are simply ignored in the
> calculation of the score. For example with the 'pwm' used in the
> man page for PWMscoreStartingAt():
> 
> > dim(pwm)
> [1]  4 13
> > PWMscoreStartingAt(pwm, DNAString("AAAAAAAAAAAAA"))
> [1] 0.4960267
> > sum(pwm["A", ])
> [1] 0.4960267
> > PWMscoreStartingAt(pwm, DNAString("NNNNNNNNNNNNN"))
> [1] 0
> 
> This is probably not very satisfying. Maybe the contribution of an
> ambiguity to the score should be the average of the contributions
> of the individual bases represented by the ambiguity? I could implement
> this if that sounds reasonable. Feedback on this is welcome, and, in
> particular, it would be good to know how other tools handle this.
> 
> Thanks!
> H.
> 
>> 
>> thanks very much,
>> 
>> Janet
>> 
>> 
>> -------------------------------------------------------------------
>> 
>> Dr. Janet Young
>> 
>> Tapscott and Malik labs
>> 
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Avenue N., C3-168,
>> P.O. Box 19024, Seattle, WA 98109-1024, USA.
>> 
>> tel: (206) 667 1471 fax: (206) 667 6524
>> email: jayoung  ...at...  fhcrc.org
>> 
>> 
>> -------------------------------------------------------------------
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> 
> -- 
> Hervé Pagès
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
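
The two behaviours discussed in the quoted message can be illustrated with a small base-R sketch that does not use Biostrings. It assumes a toy PWM stored as a plain 4-row matrix with rows named A, C, G, T (hypothetical values, not the man-page 'pwm'), and a hypothetical helper `score_at()` (not a Biostrings function) that sums the entries of the bases found in a window of the subject. With `method = "ignore"`, an ambiguity code contributes 0, mirroring what PWMscoreStartingAt() appears to do in the NNNNNNNNNNNNN example; with `method = "average"`, it contributes the mean of the entries for the bases it stands for, which is the alternative Hervé suggests.

```r
# Toy 4 x 3 PWM (hypothetical values, not the one from the Biostrings man page).
pwm <- matrix(c(0.80, 0.10, 0.05, 0.05,
                0.10, 0.70, 0.10, 0.10,
                0.25, 0.25, 0.25, 0.25),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

# IUPAC nucleotide codes and the bases each one stands for.
iupac <- list(
  A = "A", C = "C", G = "G", T = "T",
  R = c("A", "G"), Y = c("C", "T"), S = c("C", "G"), W = c("A", "T"),
  K = c("G", "T"), M = c("A", "C"),
  B = c("C", "G", "T"), D = c("A", "G", "T"),
  H = c("A", "C", "T"), V = c("A", "C", "G"),
  N = c("A", "C", "G", "T")
)

# Hypothetical helper (not part of Biostrings): score the window of 'subject'
# that starts at 'start' and is ncol(pwm) letters wide.
#   method = "ignore":  an ambiguity code contributes 0
#   method = "average": an ambiguity code contributes the mean of the PWM
#                       entries for the bases it represents
score_at <- function(pwm, subject, start = 1L,
                     method = c("ignore", "average")) {
  method <- match.arg(method)
  chars <- strsplit(toupper(subject), "")[[1]]
  width <- ncol(pwm)
  stopifnot(start >= 1L, start + width - 1L <= length(chars))
  window <- chars[start:(start + width - 1L)]
  contrib <- vapply(seq_len(width), function(j) {
    bases <- iupac[[window[j]]]
    if (is.null(bases)) stop("not an IUPAC nucleotide code: ", window[j])
    if (length(bases) == 1L) {
      pwm[bases, j]                 # plain base: look up its entry
    } else if (method == "ignore") {
      0                             # ambiguity skipped
    } else {
      mean(pwm[bases, j])           # ambiguity averaged over its bases
    }
  }, numeric(1))
  sum(contrib)
}

score_at(pwm, "ACG")                      # 1.75
score_at(pwm, "NNN")                      # 0
score_at(pwm, "NNN", method = "average")  # 0.75
score_at(pwm, "RYN", method = "average")  # 1.075
```

Averaging treats every base covered by an ambiguity code as equally likely. That is only one possible convention, and the sketch says nothing about how Biostrings or other tools actually handle ambiguities.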


