[BioC] matching transcription factor binding sites
Hans-Ulrich Klein
h.klein at uni-muenster.de
Sat Apr 12 20:30:12 CEST 2008
Hi Herve,
Herve Pages wrote:
> Hans-Ulrich Klein wrote:
>> I want to locate transcription factor binding sites (tfbs) within a
>> given sequence. The tfbs are derived from databases like transfac or
>> jaspar and are described by matrices. Are there algorithms for
>> locating tfbs matches (e.g. "matinspector") implemented in
>> bioconductor? I could not find one.
>
> I assume that your matrices are Position Weight Matrices?
Yes, I meant position weight matrices.
> There is no facility in the Biostrings package for matching a PWM to
> a DNA sequence but that would be easy to add. In fact, I've
> already fully described how to implement such a facility
> in a separate package and on top of Biostrings basic containers (i.e.
> DNAString objects) during the lab I gave for the "Advanced R
> for Bioinformatics" course back in February this year:
>
> http://bioconductor.org/workshops/2008
>
> Follow "Advanced R for Bioinformatics" -> "Interfaces to C (Lab)"
>
> The simpleMatchPWM_0.99.0.tar.gz package contains the matchPWM()
> function for finding all matches of a PWM in a given sequence.
> Unfortunately, the package depended on a devel version
> of Biostrings that has changed since then, and
> those changes broke simpleMatchPWM 0.99.0. Let me know if this is what
> you are looking for and I'll fix the package (this should
> be straightforward).
It is quite close to what I am looking for. I have access to the
transfac database including a web based tool for finding PWM matches. I
am looking for an alternative to the web tool in R for two reasons:
1. I did the preceding analysis in R and will do the follow-up analysis in
R. It would be nice to avoid the effort of exporting and importing data.
2. I have not found a detailed description of the algorithm used by the
web tool.
So simpleMatchPWM is at least a good starting point, as it does all the
basic score computations. Why not integrate the matchPWM function into the
Biostrings package? I would appreciate it.
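To make "basic score computations" concrete, here is a minimal sketch of
sliding-window PWM scoring. It is written in Python purely for illustration
and is not the simpleMatchPWM or Biostrings code. The function name
match_pwm, the column-wise matrix layout, and the toy weights are all
assumptions made up for this example.

```python
# Hypothetical helper for illustration; not the simpleMatchPWM/Biostrings API.
BASES = "ACGT"


def match_pwm(pwm, sequence, min_score):
    """Return (start, score) for every window scoring at least min_score.

    pwm is a list of columns, each mapping a base to its weight.
    Start positions are reported 1-based, as is usual in R.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing letters other than A, C, G, T (e.g. N).
        if any(base not in BASES for base in window):
            continue
        # The window score is the sum of the weights of the observed bases.
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= min_score:
            hits.append((start + 1, score))
    return hits


# Toy 3-column matrix with made-up weights (assumption, not a real PWM).
toy_pwm = [
    {"A": 5, "C": 0, "G": 0, "T": 1},
    {"A": 0, "C": 4, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 6, "T": 0},
]
hits = match_pwm(toy_pwm, "TTACGACGNACG", min_score=15)
```

With these toy weights, only exact "ACG" windows reach the cut-off of 15.
A real implementation would probably also scan the reverse complement
strand, which this sketch leaves out.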
However, most algorithms (like MatInspector or the transfac-tool)
implement some heuristics to improve results. E.g., they suggest
individual cut-off values depending on the length of the PWMs. I am not
sure whether I have enough time and knowledge to add such functionality.
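One simple way to get a cut-off that adapts to the matrix is to express it
as a percentage of the score range the matrix can produce. The sketch below
shows that idea only. It is an assumption for illustration, not the
heuristic used by MatInspector or the TRANSFAC tool, and the names
score_range and relative_cutoff are hypothetical.

```python
# Illustrative sketch; NOT the MatInspector or TRANSFAC cut-off heuristic.


def score_range(pwm):
    """Return the lowest and highest score any window can reach."""
    lowest = sum(min(column.values()) for column in pwm)
    highest = sum(max(column.values()) for column in pwm)
    return lowest, highest


def relative_cutoff(pwm, percent):
    """Translate a percentage of the score range into an absolute cut-off."""
    lowest, highest = score_range(pwm)
    return lowest + percent * (highest - lowest) / 100


# Toy matrices with made-up weights (assumption): 3 and 6 columns.
short_pwm = [
    {"A": 5, "C": 0, "G": 0, "T": 1},
    {"A": 0, "C": 4, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 6, "T": 0},
]
long_pwm = short_pwm + short_pwm

short_cutoff = relative_cutoff(short_pwm, 80)
long_cutoff = relative_cutoff(long_pwm, 80)
```

The same 80% setting yields a higher absolute cut-off for the longer matrix,
so a single relative parameter can be shared across PWMs of different
lengths.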
Best wishes,
Hans-Ulrich
PS: Does anyone have experience with the BioPerl package "TFBS"?