[BioC] RMAPPER and whole genome TFBS information
Vincent Carey
stvjc at channing.harvard.edu
Sun Apr 17 05:29:18 CEST 2011
I am listed as the author of this package, and indeed some years ago I
wrote the R code that interfaces to the XML-RPC service of the MAPPER
database. I don't know exactly why you are seeing this error; as far as
I can tell, your inputs meet the requirements of the server-generated
documentation returned by rmapperHelp().
I registered to use the database manually and created a query that it
processed as
Gene: Trp53rk (transformation related protein 53 regulating kinase)
Gene ID: 76367 mRNA accession: NM_023815
Organism: Mus musculus
Scanned region: chr2:166617267-166626993 (click to download)
Models: JASPAR matrices, TRANSFAC matrices, M00789
This yielded over 2400 hits, for example:
Gene     GeneID  Transcript  Factor  Name(s)  Strand  Chrom  Start        End          Region    Score  E-value
Trp53rk  76367   NM_023815   M00791  HNF3     +       chr2   166,617,268  166,617,279  Promoter  4.6    14
Trp53rk  76367   NM_023815   MA0041  Foxd3    -       chr2   166,617,269  166,617,279  Promoter  2.9    11
Trp53rk  76367   NM_023815   MA0047  Foxa2    -       chr2   166,617,269  166,617,280  Promoter  3.9    4.3
with further details on the first hit:
Trp53rk  76367   NM_023815   M00791  HNF3     +       chr2   166,617,268  166,617,279  Promoter  4.6    14
Gene:            Trp53rk
Gene ID:         76367
mRNA:            NM_023815
ENSEMBL:         ENSMUSG00000042
Gene region:     Promoter
Factor:          HNF3
Model:           M00791
Score:           4.6
E-value:         14
Strand:          +
Conserved:       -
Position (abs):  chr2:166,617,268-166,617,279
Position (tx):   -1999 to -1988
Position (cds):  -2045 to -2034
Alignment:
  *->taaacaaAca.a<-*
     t+ acaaA+a +
     TGTACAAATAtT
In principle RMAPPER will return all such information. However, when I
pass the corresponding query information to the readMAPPER function, I
get a success code but just a header back -- no hit data is returned.
Specifically:
> readMAPPER(gene="Trp53rk", models="M00789",org = "Mm", pbases = 2000)
Error in seq.default(1, nh * 4, 4) : wrong sign in 'by' argument
Enter a frame number, or 0 to exit
1: readMAPPER(gene = "Trp53rk", models = "M00789", org = "Mm", pbases = 2000)
2: new("mapperHits", query = sett, hits = reshapeMapper(tmp))
3: initialize(value, ...)
4: initialize(value, ...)
5: reshapeMapper(tmp)
6: df[seq(1, nh * 4, 4), ]
7: `[.data.frame`(df, seq(1, nh * 4, 4), )
8: seq(1, nh * 4, 4)
9: seq.default(1, nh * 4, 4)
So I suggest you contact the maintainers. I will copy them on this note.
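
One possible reading of the traceback above, offered as a sketch and not as a diagnosis of the package: if the server returns only a header, the hit count used in reshapeMapper may be zero, and seq(1, 0, 4) fails in base R with exactly this message. The snippet below reproduces that behaviour in isolation. The value nh = 0, the stand-in data frame df, and the helper first_rows are assumptions made for illustration; they are not taken from the RMAPPER source.

```r
# Assumption: nh is the number of hits and is 0 when only a header comes back.
nh <- 0

# Stand-in for the parsed server reply (hypothetical; zero rows).
df <- data.frame(field = character(0), stringsAsFactors = FALSE)

# The indexing expression from the traceback: with nh == 0 this asks seq()
# to step from 1 up to 0 in increments of 4, which is an error.
res <- tryCatch(seq(1, nh * 4, 4), error = function(e) conditionMessage(e))
print(res)

# A zero-safe way to build the same row index (1, 5, 9, ...).
# first_rows is a hypothetical helper, not part of RMAPPER.
first_rows <- function(nh) 4L * seq_len(nh) - 3L

print(first_rows(3))
print(first_rows(0))

# Indexing with an empty index returns a zero-row data frame, not an error.
empty <- df[first_rows(nh), , drop = FALSE]
print(nrow(empty))
```

With an index built this way, a query that matches nothing would yield an empty result. Whether an empty result or an explicit "no hits" message is the right behaviour is a design choice for the package maintainers.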
R version 2.13.0 Patched (2011-04-14 r55443)
Platform: x86_64-apple-darwin10.6.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices datasets tools utils methods
[8] base
other attached packages:
[1] org.Mm.eg.db_2.5.0 RSQLite_0.9-4 DBI_0.2-5
[4] AnnotationDbi_1.13.21 Biobase_2.11.10 biomaRt_2.8.0
[7] RMAPPER_1.3.0 weaver_1.17.0 codetools_0.2-8
[10] digest_0.4.2
loaded via a namespace (and not attached):
[1] RCurl_1.5-0 XML_3.2-0
On Sat, Apr 16, 2011 at 10:41 PM, Ravi Karra <ravi.karra at gmail.com> wrote:
> Hello,
>
> I am trying to identify all putative GATA binding sites in the mouse genome. Ideally, I want to get genomic coordinates for each "binding site" to enter into a GenomicRanges object (I know there will be a lot of hits) and to overlay this information with the results of a ChIP-Seq experiment. Seems that there are multiple packages to try and do this with, but only RMAPPER allows an interface with the TRANSFAC and Jaspar TF binding site models.
> I have been getting multiple errors that I am not sure how to resolve. Is this package the best way to get the information I want? Is there a better alternative? Is there an upper limit to the MAPPER query?
>
> Thanks for your help,
> Ravi
>
> #load the necessary libraries
> library (RMAPPER)
> library (biomaRt)
>
> #Compute the mouse genome
> #get identifiers to be input into MAPPER
> mm = useMart (biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
> mmGenes = getBM (attributes = c ("ensembl_gene_id", "external_gene_id", "entrezgene", "external_transcript_id"), mart = mm)
> #get list of all entrez gene id's
> egids = unique (mmGenes$entrezgene); egids = egids [2:length (egids)] #first id is NA
>
> #make a list of all geneids
> eglist = paste (egids [500:550], collapse = ",")
>
> #get the factor models
> gata = "M00789, T02689, T00311, T00306, T00305, T00267, T00305, T00267, T00306, T00311, M00632, M00462, MA0037"
>
> #Run MAPPER with 50 genes
> gatah = readMAPPER (gene = eglist, models = gata, org = "Mm", pbases = 5000)
>
>>Error in file(con, "r") : cannot open the connection
> In addition: Warning message:
> In file(con, "r") : cannot open: HTTP status was '0 (null)'
>
> #Run MAPPER with 10 genes
> eglist = paste (egids [500:510], collapse = ",")
> gatah = readMAPPER (gene = eglist, models = gata, org = "Mm", pbases = 5000)
>
>> Error in seq.default(1, nh * 4, 4) : wrong sign in 'by' argument
>
>
>> traceback ()
> 10: stop("wrong sign in 'by' argument")
> 9: seq.default(1, nh * 4, 4)
> 8: seq(1, nh * 4, 4)
> 7: `[.data.frame`(df, seq(1, nh * 4, 4), )
> 6: df[seq(1, nh * 4, 4), ]
> 5: reshapeMapper(tmp)
> 4: initialize(value, ...)
> 3: initialize(value, ...)
> 2: new("mapperHits", query = sett, hits = reshapeMapper(tmp))
> 1: readMAPPER(gene = eglist, models = gata, org = "Mm", pbases = 5000)
>
>> sessionInfo ()
> R version 2.13.0 (2011-04-13)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] biomaRt_2.8.0 RMAPPER_1.2.0
>
> loaded via a namespace (and not attached):
> [1] RCurl_1.5-0 tools_2.13.0 XML_3.2-0
>