[BioC] Querying/manipulating JASPAR data

Thomas Girke thomas.girke at ucr.edu
Fri Jun 25 21:20:17 CEST 2010


I am not sure if this helps:

Below is a parser function that I used in the past to import their PWMs from: 
http://jaspar.genereg.net/html/DOWNLOAD/all_data/matrix_only/matrix_only.txt
It stores the PWMs in a named list, from which they can be passed on to the
matchPWM function in Biostrings.
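
As a rough illustration of what the parser expects, here is a minimal sketch that parses a single record held in memory. The record layout (a ">" header line followed by four bracketed rows labelled A, C, G and T) is inferred from the import function in this post; the ID, name and counts are made up.

```r
# Toy record in the layout the parser assumes (all values are hypothetical)
txt <- c(">MA0000.0 toyTF",
         "A  [ 17  1  1  1 ]",
         "C  [  1 17  1  1 ]",
         "G  [  1  1 17  1 ]",
         "T  [  1  1  1 17 ]")

id <- sub("^>", "", txt[1])                               # header without the ">"
rows <- strsplit(gsub("\\[|\\]", "", txt[-1]), " +")      # strip brackets, split on spaces
counts <- t(sapply(rows, function(y) as.numeric(y[-1])))  # drop the base letter, keep the counts
rownames(counts) <- sapply(rows, `[`, 1)                  # A, C, G, T taken from the first field
```

The import function applies the same kind of steps to every record in the file and collects the resulting matrices in a list.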

## Import function
## Returns a named list with one 4-row (A, C, G, T) count matrix per JASPAR record
importJaspar <- function(file) {
        vec <- readLines(file)
        vec <- gsub("\\[|\\]", "", vec)        # strip the brackets around the counts
        vec <- vec[grepl("[^ ]", vec)]         # drop empty lines
        start <- grep(">", vec)                # header lines open each record
        end <- c(start[-1] - 1, length(vec))   # a record ends just before the next header
        ## lapply (not sapply) so that the result is always a list
        pwm <- lapply(seq(along=start), function(x) {
                rows <- strsplit(vec[(start[x]+1):end[x]], " {1,}")
                counts <- t(sapply(rows, function(y) as.numeric(y[-1])))  # drop the base letter
                matrix(counts, nrow=4, dimnames=list(c("A", "C", "G", "T"), NULL))
        })
        names(pwm) <- gsub(">", "", vec[start])
        return(pwm)
}
pwm <- importJaspar(file="http://jaspar.genereg.net/html/DOWNLOAD/all_data/matrix_only/matrix_only.txt")
## Convert the counts to per-position frequencies (each column sums to 1)
pwmnorm <- lapply(pwm, function(m) apply(m, 2, function(y) y/sum(y)))
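
To sketch how the resulting matrices might be used, the following toy example normalises a count matrix column by column and scans a short sequence with matchPWM. The matrix and the sequence are invented for illustration, the 90% cutoff is an arbitrary choice, and the Biostrings part only runs if that package is installed.

```r
# Hypothetical 4-position count matrix with consensus ACGT (filled column by column)
counts <- matrix(c(17,  1,  1,  1,
                    1, 17,  1,  1,
                    1,  1, 17,  1,
                    1,  1,  1, 17),
                 nrow=4, dimnames=list(c("A", "C", "G", "T"), NULL))

# Same normalisation as in the post: divide each column by its total
freq <- apply(counts, 2, function(y) y/sum(y))

# Scan a made-up sequence; skipped when Biostrings is not available
if (requireNamespace("Biostrings", quietly=TRUE)) {
        library(Biostrings)
        subject <- DNAString("TTACGTAACGTT")
        hits <- matchPWM(freq, subject, min.score="90%")
        print(hits)
}
```

With the imported data, an element of the list such as pwm[[1]] would take the place of the toy matrix.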


Best,

Thomas

On Fri, Jun 25, 2010 at 01:47:30PM -0400, Steve Lianoglou wrote:
> Howdy,
> 
> I was curious if there are any packages or other means (some web
> api(?)) to retrieve and parse JASPAR PWM's.
> 
> I have a need to get some PWMs for transcription factors and am
> slicing/dicing the files I've downloaded from JASPAR.
> 
> Since I'm in the middle of dealing with that, I was wondering if it
> was worth being a bit more careful with my code and perhaps whipping
> up a jaspaR package of sorts that makes this data available via some
> bioc-friendly code.
> 
> Cheers,
> -steve
> 
> -- 
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
> 


