[BioC] PWMscoreStartingAt and ambiguous subject seqs
Janet Young
jayoung at fhcrc.org
Tue May 22 20:33:39 CEST 2012
Hi Herve,
Thanks for checking that out - interesting. I'm a little naive about these things, but your idea of averaging sounds reasonable to me. I'm copying Zizhen here - she's worked with PWMs much more than I have, and might have some helpful thoughts.
Janet
On May 21, 2012, at 10:33 PM, Hervé Pagès wrote:
> Hi Janet,
>
> On 05/21/2012 06:34 PM, Janet Young wrote:
>> Hi there,
>>
>> I'm using PWMscoreStartingAt from Biostrings - it's VERY useful for me - thanks!
>>
>> Some of the sequences I'm scanning include ambiguities (some N, some Y, etc., using IUPAC codes). I'm really glad that PWMscoreStartingAt works on these sequences, but I'd like to understand how scores are calculated when an N (or any other ambiguity code) is present. Would it be easy for you to add that to the documentation? (An email response would be fine too, but it seems useful to add it to the docs.)
>
> It seems that IUPAC ambiguity codes are simply ignored in the
> calculation of the score. For example with the 'pwm' used in the
> man page for PWMscoreStartingAt():
>
> > dim(pwm)
> [1] 4 13
> > PWMscoreStartingAt(pwm, DNAString("AAAAAAAAAAAAA"))
> [1] 0.4960267
> > sum(pwm["A", ])
> [1] 0.4960267
> > PWMscoreStartingAt(pwm, DNAString("NNNNNNNNNNNNN"))
> [1] 0
>
> This is probably not very satisfying. Maybe the contribution of an
> ambiguity to the score should be the average of the contributions
> of the individual bases represented by the ambiguity? I could implement
> this if that sounds reasonable. Feedback on this is welcome, and, in
> particular, it would be good to know how other tools handle this.
>
> Thanks!
> H.
>
>>
>> thanks very much,
>>
>> Janet
>>
>>
>> -------------------------------------------------------------------
>>
>> Dr. Janet Young
>>
>> Tapscott and Malik labs
>>
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Avenue N., C3-168,
>> P.O. Box 19024, Seattle, WA 98109-1024, USA.
>>
>> tel: (206) 667 1471 fax: (206) 667 6524
>> email: jayoung ...at... fhcrc.org
>>
>>
>> -------------------------------------------------------------------
>>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
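
The averaging idea proposed in the thread (an ambiguity code contributes the mean of the scores of the bases it represents) can be sketched in base R as below. This is only an illustration under stated assumptions, not how PWMscoreStartingAt() is implemented: the toy 'pwm' values are made up, 'iupac' is a hand-written table of the standard IUPAC nucleotide codes (Biostrings ships the same information as IUPAC_CODE_MAP), and score_avg() is a hypothetical helper that scores a plain character string exactly as wide as the matrix.

```r
# Hypothetical 4 x 3 PWM (rows A, C, G, T); the values are made up.
pwm <- matrix(c(0.10, 0.20, 0.30,
                0.40, 0.10, 0.20,
                0.30, 0.50, 0.10,
                0.20, 0.20, 0.40),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Standard IUPAC nucleotide codes and the bases each one stands for.
iupac <- list(A = "A", C = "C", G = "G", T = "T",
              M = c("A", "C"), R = c("A", "G"), W = c("A", "T"),
              S = c("C", "G"), Y = c("C", "T"), K = c("G", "T"),
              V = c("A", "C", "G"), H = c("A", "C", "T"),
              D = c("A", "G", "T"), B = c("C", "G", "T"),
              N = c("A", "C", "G", "T"))

# Hypothetical helper: score a string whose length equals ncol(pwm).
# Each position contributes the mean PWM score over the bases that
# its (possibly ambiguous) letter represents.
score_avg <- function(pwm, s) {
  chars <- strsplit(s, "")[[1]]
  stopifnot(length(chars) == ncol(pwm), all(chars %in% names(iupac)))
  per_pos <- vapply(seq_along(chars),
                    function(i) mean(pwm[iupac[[chars[i]]], i]),
                    numeric(1))
  sum(per_pos)
}

score_avg(pwm, "AAA")  # no ambiguity: same as sum(pwm["A", ])
score_avg(pwm, "AYG")  # Y averages the C and T scores in column 2
score_avg(pwm, "NNN")  # N averages all four rows in every column
```

With this rule an all-N subject scores the sum of the column means instead of 0, so ambiguous positions are treated as uninformative rather than as the worst possible match.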