[BioC] PWMmatch: position weight matrix or position frequency matrix?
Hervé Pagès
hpages at fhcrc.org
Thu Feb 24 21:42:46 CET 2011
3 little additions... (see below)
On 02/24/2011 12:14 PM, Hervé Pagès wrote:
> Hi Zuzanna,
>
> On 02/22/2011 10:59 AM, Hervé Pagès wrote:
> [...]
>> Finally note that the Biostrings package doesn't provide a tool
>> to convert a position frequency matrix (that can be obtained with
>> consensusMatrix) into a position weight matrix.
>
> More on this and to clarify the role of the PWM() function mentioned
> by Val.
>
> PWM() can be used on a set of short sequences to compute the associated
> Position Weight Matrix using Wasserman & Sandelin's approach.
> As its name suggests, PWM() will always return a PWM, not a PFM.
> The 'type' argument controls the type of Position Weight Matrix that
> is returned.
> The 'prior.params' argument controls the Dirichlet conjugate prior.
> By this argument is set to c(A=0.25, C=0.25, G=0.25, T=0.25).
     ^^^
     by default
>
> In the example given by Val, PWM(sset, type="prob") returns a PWM
> that is just the PFM divided by a constant (this constant being
> the number of short sequences in the input).
Not true that this constant is the number of short sequences in
the input. However, it doesn't matter what this constant is...
> So, in that particular
> case, matchPWM() will give the same result whether you pass it the
> PFM or the PWM obtained with PWM(sset, type="prob"). (Multiplying
> the PWM by a constant doesn't affect the output of matchPWM).
>
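To illustrate the scaling argument, here is a minimal base-R sketch. It is not matchPWM() itself: score_windows() and hits() are hypothetical helpers, and the sketch assumes the cutoff is expressed as a fraction of the highest score the matrix can produce (as when min.score is given as a percentage) and that the constant is positive. The matrix is the PFM from the console session further down.

```r
# Hypothetical helper: score every window of `subject` against matrix `m`
# (rows named A, C, G, T; one column per motif position).
score_windows <- function(m, subject) {
  chars <- strsplit(subject, "")[[1]]
  w <- ncol(m)
  starts <- seq_len(length(chars) - w + 1)
  vapply(starts, function(s) {
    win <- chars[s:(s + w - 1)]
    # pick m[base, position] for each position of the window and sum
    sum(m[cbind(match(win, rownames(m)), seq_len(w))])
  }, numeric(1))
}

# Hypothetical helper: start positions of the windows whose score is at
# least `frac` times the highest score the matrix can produce.
hits <- function(m, subject, frac = 0.75) {
  max_score <- sum(apply(m, 2, max))
  which(score_windows(m, subject) >= frac * max_score)
}

# Counts filled column by column (A, C, G, T within each position).
pfm <- matrix(c(4, 1, 0, 0,
                2, 1, 1, 1,
                1, 1, 1, 2,
                1, 1, 2, 1),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

subject <- "AATGCCAATGAGTT"   # made-up subject sequence
hits_pfm    <- hits(pfm, subject)
hits_scaled <- hits(pfm / 5, subject)   # same matrix divided by a constant
identical(hits_pfm, hits_scaled)
```

Dividing the matrix by a positive constant divides every window score and the maximum score by the same constant, so a relative cutoff selects the same windows. With an absolute numeric cutoff this would no longer hold.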
> But this is only a particular situation. It's not true in general
> that PWM( , type="prob") will return a PWM that is just the
> PFM divided by a constant. For example it would not be the case
> anymore if you were using a 'prior.params' vector that contains
> values that are not all the same.
To be more concrete about this: just adding 1 sequence to 'sset'
makes things look very different:
> sset <- DNAStringSet(c("AGTT", "ATGC", "AACG", "AATG", "CCAA"))
> consensusMatrix(sset)[DNA_BASES, ]
  [,1] [,2] [,3] [,4]
A    4    2    1    1
C    1    1    1    1
G    0    1    1    2
T    0    1    2    1
> PWM(sset, type="prob")
         [,1]       [,2]       [,3]       [,4]
A  0.46428571 0.17857143 0.03571429 0.03571429
C  0.03571429 0.03571429 0.03571429 0.03571429
G -0.10714286 0.03571429 0.03571429 0.17857143
T -0.10714286 0.03571429 0.17857143 0.03571429
Note the negative weights!
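
One computation that reproduces the numbers above, as a base-R sketch. It is inferred from the output rather than taken from the Biostrings source, so treat it as an assumption: add the 'prior.params' values to the counts as pseudocounts, divide by the number of sequences plus the sum of the prior, then shift and rescale so that the lowest possible score of the matrix is 0 and the highest is 1. The name pwm_prob_sketch() is hypothetical.

```r
# Hypothetical reimplementation, inferred from the output shown above;
# not the actual Biostrings code.
pwm_prob_sketch <- function(pfm,
                            prior = c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)) {
  nseq <- sum(pfm[, 1])                    # number of input sequences
  # posterior probabilities: counts plus pseudocounts, normalised
  post <- (pfm + prior[rownames(pfm)]) / (nseq + sum(prior))
  min_score <- sum(apply(post, 2, min))    # lowest possible total score
  max_score <- sum(apply(post, 2, max))    # highest possible total score
  # shift each column by an equal share of min_score, then rescale
  (post - min_score / ncol(post)) / (max_score - min_score)
}

# The PFM returned by consensusMatrix(sset)[DNA_BASES, ] above,
# filled column by column (A, C, G, T within each position).
pfm <- matrix(c(4, 1, 0, 0,
                2, 1, 1, 1,
                1, 1, 1, 2,
                1, 1, 2, 1),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

pwm <- pwm_prob_sketch(pfm)
round(pwm, 8)
```

Under this reading, the negative weights come from the shift: the same amount is subtracted from every column, so an entry that is smaller than the average of the column minima (here G and T at position 1) ends up below zero. The shift is also why the result is no longer the PFM divided by a constant.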
Cheers,
H.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319