[BioC] Querying/manipulating JASPAR data

Raphael Gottardo bioclist at me.com
Fri Jun 25 21:35:24 CEST 2010


You should check out MotIV. It contains JASPAR 2010 and you can use it to mach your list of PWMs to the list of motifs in there. 
We are also open to suggestions that might improve the package.

Raphael


On 2010-06-25, at 3:20 PM, Thomas Girke wrote:

> I am not sure if this helps:
> 
> Below is a parser function that I used in the past to import their PWMs from: 
> http://jaspar.genereg.net/html/DOWNLOAD/all_data/matrix_only/matrix_only.txt
> It stores the PWMs in a list from where they can be passed on to the Biostrings'
> matchPWM function...
> 
> ## Import function
> importJaspar <- function(file=myloc) {
>        vec <- readLines(file)
>        vec <- gsub("\\[|\\]", "", vec)
>        start <- grep(">", vec); end <- grep(">", vec) - 1
>        pos <- data.frame(start=start, end=c(end[-1], length(vec)))
>        pwm <- sapply(seq(along=pos[,1]), function(x) vec[pos[x,1]:pos[x,2]])
>        pwm <- sapply(seq(along=pwm), function(x) strsplit(pwm[[x]], " {1,}"))
>        pwm <- sapply(seq(along=start), function(x) matrix(as.numeric(t(as.data.frame(pwm[(pos[x,1]+1):pos[x,2]]))[,-1]), nrow=4, dimnames=list(c("A", "C", "G", "T"), NULL)))
>        names(pwm) <- gsub(">", "", vec[start])
>        return(pwm)
> }
> pwm <- importJaspar(file="http://jaspar.genereg.net/html/DOWNLOAD/all_data/matrix_only/matrix_only.txt")
> pwmnorm <- sapply(names(pwm), function(x) apply(pwm[[x]], 2, function(y) y/sum(y))) 
> 
> 
> Best,
> 
> Thomas
> 
> On Fri, Jun 25, 2010 at 01:47:30PM -0400, Steve Lianoglou wrote:
>> Howdy,
>> 
>> I was curious if there are any packages or other means (some web
>> api(?)) to retrieve and parse JASPAR PWM's.
>> 
>> I have a need to get some PWMs for transcription factors and am
>> slicing/dicing the files I've downloaded from JASPAR.
>> 
>> Since I'm in the middle of dealing with that, I was wondering if it
>> was worth being a bit more careful with my code and perhaps whipping
>> up a jaspaR package of sorts that makes this data available via some
>> bioc-friendly code.
>> 
>> Cheers,
>> -steve
>> 
>> -- 
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>> | Memorial Sloan-Kettering Cancer Center
>> | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list