[BioC] Trans sites associated with a gene (Transfac database?)

Lapointe, David David.Lapointe at umassmed.edu
Thu Oct 8 15:59:34 CEST 2009

The free Transfac is very old. BioBase distributes current versions via
subscription. Have you looked at Jaspar?
http://jaspar.cgb.ki.se/. I use that with Emboss. 


-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Tim Smith
Sent: Thursday, October 08, 2009 9:00 AM
To: Michael Lawrence
Cc: bioc
Subject: Re: [BioC] Trans sites associated with a gene (Transfac

Thanks Michael. I hadn't realized that Transfac is not free. Do you know
of  any free databases that might work?   -- thanks!

From: Michael Lawrence <mflawren at fhcrc.org>

Cc: bioc <bioconductor at stat.math.ethz.ch>
Sent: Wed, October 7, 2009 6:33:57 PM
Subject: Re: [BioC] Trans sites associated with a gene (Transfac

>Hi all,
>>I wanted to get the trans factor sites that affect a set of genes. Is
there any package in bioconductor that will enable me to do this?

I don't know of any package that does this directly, but here are some

If you have access to the (not free) transfac database, these functions
will read in the database and profile (PRF) files:

readTransFac <- function(con) {
  getField <- function(name) {
    name <- paste("^", name, sep = "")
    sub(paste(name, "(.*)$"), "\\1", lines[grep(name, lines)])
  lines <- readLines(con)
  nms <- getField("ID")
  npos <- getField("MATR_LENGTH")
  mats <- sub("^[0-9]*", "", gsub("[ACGT]:", "", lines[grep("^[1-9]",
  f <- file()
  writeLines(mats, f)
  mattab <- as.matrix(read.table(f, col.names = c("A", "C", "G", "T")))
  matlist <- split.data.frame(mattab, rep(seq_along(npos),
  matlist <- lapply(matlist, t) ## OUCH -- slow step
  names(matlist) <- nms
  attr(matlist, "labels") <- getField("NA")
  attr(matlist, "threshold") <- getField("THRESHOLD")

readPRF <- function(con) {
  read.table(con, skip = 4, comment.char = "/",
             col.names = c("A", "B", "cutoff", "AC", "ID"))

You can use these like this:

> transfac <- readTransFac("transfac/matrixTFP92.lib")
> muscle <- readPRF("transfac/muscle_specific.prf")
> pwm <- transfac[as.character(muscle$ID)]

Then 'pwm' is a list of matrices. You can then find the hits to a genome
using Biostrings:

> hits <- matchPWM(pwm[[1]], Hsapiens[[1]], "90%")

Now 'hits' represents the hits of the first PWM against Human chromosome
1, at 90% cutoff.

You can convert that to an IRanges object:

> ir <- as(hits, "IRanges")

And then use that with the overlap() function in IRanges, along with
some gene annotations, like those from the GenomicFeatures package (an
experimental data package) to find associations with genes. 

> library(GenomicFeatures)
> data(geneHuman)
> trans <- transcripts(geneHuman)
> hitsInPromoters <- ir[trans[1]$promoter]

To find the promoter (+/- 500bp from TSS) hits on chr1.

Most of this code is not tested, but it should serve as a nice outline.


>>        [[alternative HTML version deleted]]
>>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>>Search the archives:

	[[alternative HTML version deleted]]

Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
Search the archives:

More information about the Bioconductor mailing list