[BioC] Querying/manipulating JASPAR data
Thomas Girke
thomas.girke at ucr.edu
Fri Jun 25 21:20:17 CEST 2010
I am not sure if this helps:
Below is a parser function that I used in the past to import their PWMs from:
http://jaspar.genereg.net/html/DOWNLOAD/all_data/matrix_only/matrix_only.txt
It stores the PWMs in a list, from which they can be passed on to Biostrings'
matchPWM function.
## Import function
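importJaspar <- function(file=myloc) {
    vec <- readLines(file)
    vec <- gsub("\\[|\\]", "", vec)  # drop the square brackets around the counts
    ## Each record runs from a ">" header line to the line before the next header
    start <- grep(">", vec); end <- grep(">", vec) - 1
    pos <- data.frame(start=start, end=c(end[-1], length(vec)))
    ## Collect the lines of each record, then split every line on runs of spaces.
    ## With one header plus four rows per record, sapply() simplifies the result
    ## so that pwm ends up with one element per input line.
    pwm <- sapply(seq(along=pos[,1]), function(x) vec[pos[x,1]:pos[x,2]])
    pwm <- sapply(seq(along=pwm), function(x) strsplit(pwm[[x]], " {1,}"))
    ## Per record: drop the row label (first field), convert the counts to
    ## numeric, and store them as a 4-row matrix (A, C, G, T)
    pwm <- sapply(seq(along=start), function(x)
        matrix(as.numeric(t(as.data.frame(pwm[(pos[x,1]+1):pos[x,2]]))[,-1]),
               nrow=4, dimnames=list(c("A", "C", "G", "T"), NULL)))
    names(pwm) <- gsub(">", "", vec[start])  # name each matrix after its header line
    return(pwm)
}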
## Import the matrices, then rescale each column so that it sums to 1
pwm <- importJaspar(file="http://jaspar.genereg.net/html/DOWNLOAD/all_data/matrix_only/matrix_only.txt")
pwmnorm <- sapply(names(pwm), function(x) apply(pwm[[x]], 2, function(y) y/sum(y)))
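To illustrate the hand-off to matchPWM, here is a minimal sketch. It assumes the Biostrings package is installed. The count matrix, the subject sequence and the "90%" cutoff are made up for illustration and are not taken from JASPAR. The toy matrix has the same layout as one element of the pwm list (rows A, C, G, T, one column per motif position) and is normalized in the same way as above.

```r
library(Biostrings)

## Toy count matrix (hypothetical values), filled column by column:
## position 1 favours A, position 2 C, position 3 G, position 4 mostly T
counts <- matrix(c(8, 0, 0, 0,
                   0, 8, 0, 0,
                   0, 0, 8, 0,
                   1, 1, 1, 5),
                 nrow=4, dimnames=list(c("A", "C", "G", "T"), NULL))

## Same column-wise normalization as used for pwmnorm
freqs <- apply(counts, 2, function(y) y/sum(y))

## Made-up subject sequence to scan (forward strand only)
subject <- DNAString("TTACGTGGACGTAA")

## Keep windows scoring at least 90% of the best possible score
hits <- matchPWM(freqs, subject, min.score="90%")
hits
```

With real data, one element of the normalized list (for example pwmnorm[[1]]) would take the place of freqs, and the cutoff would need to be tuned to the motif.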
Best,
Thomas
On Fri, Jun 25, 2010 at 01:47:30PM -0400, Steve Lianoglou wrote:
> Howdy,
>
> I was curious if there are any packages or other means (some web
> api(?)) to retrieve and parse JASPAR PWM's.
>
> I have a need to get some PWMs for transcription factors and am
> slicing/dicing the files I've downloaded from JASPAR.
>
> Since I'm in the middle of dealing with that, I was wondering if it
> was worth being a bit more careful with my code and perhaps whipping
> up a jaspaR package of sorts that makes this data available via some
> bioc-friendly code.
>
> Cheers,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact