[BioC] How do I use biomaRt to get upstream flanking genomic sequence for many genomes?
Noah Dowell
noahd at ucla.edu
Mon Dec 20 19:54:57 CET 2010
Hello All,
Problem:
I would like to obtain the genomic sequence upstream (~500 bp) of a specific bacterial gene, for every bacterial genome that has the gene. EcoCyc shows that many (> 100) bacteria have it, but I do not know how to retrieve all of these sequences in a high-throughput manner, so I planned to use biomaRt to fetch them and then pass them to alignment programs. I have read through the vignette and tried to get this working with a non-Ensembl mart, to no avail. I also got an error (see below) that asks me to report it to the mailing list.

It also looks like I will have to query each of the 249 bacterial genomes in the "bacterial_mart_7" mart individually (with getLDS or getBM), which does not seem high-throughput at all. Are there any other suggestions that would let me take advantage of the large amount of bacterial genomic data for homology studies?
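If the start, end and strand of the gene are already known for each genome (from a mart query or a downloaded annotation), the ~500 bp upstream region can also be cut out locally instead of being requested as a sequence attribute. This is a minimal base-R sketch under stated assumptions: one chromosome held in a single character string, 1-based inclusive coordinates, and a hypothetical helper name `upstream_flank_seq`. The point it illustrates is strand handling: for a reverse-strand gene the upstream region lies after `end` and has to be reverse-complemented.

```r
# Hypothetical helper (not part of biomaRt): return up to `flank` bp
# immediately upstream of a gene, read 5' -> 3' relative to the gene.
# Assumes `genome` is one chromosome as a single character string and
# `start`/`end` are 1-based, inclusive, forward-strand coordinates.
# Regions are truncated at the sequence ends (no wrap-around for
# circular chromosomes).
upstream_flank_seq <- function(genome, start, end, strand, flank = 500) {
  forward <- as.character(strand) %in% c("+", "1")
  if (forward) {
    # Upstream of a forward-strand gene lies to the left of `start`.
    from <- max(1, start - flank)
    to <- start - 1
    if (to < from) return("")
    return(substr(genome, from, to))
  }
  # Upstream of a reverse-strand gene lies to the right of `end`.
  from <- end + 1
  to <- min(nchar(genome), end + flank)
  if (to < from) return("")
  region <- substr(genome, from, to)
  # Reverse-complement so the result reads in the gene's orientation.
  comp <- chartr("ACGTacgt", "TGCAtgca", region)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}
```

For example, `upstream_flank_seq("TTTTACGG", start = 1, end = 4, strand = "-", flank = 3)` returns "CGT", the reverse complement of the three bases after the gene.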
Thank you for your help.
Noah
Attempted Solution (for a single genome):
> bacGenome = useMart("bacterial_mart_7", dataset = "esc_20_gene")
Checking attributes ... ok
Checking filters ... ok
>
> filters = c("external_gene_id")
>
> attributes = c("external_gene_id","upstream_flank")
>
> values = list(external_gene_id = c("fis"), 500)
> seq = getBM(attributes=attributes, filters = filters, values = values, mart= bacGenome,
+ checkFilters= FALSE)
V1
1 fis
Error in getBM(attributes = attributes, filters = filters, values = values, :
The query to the BioMart webservice returned an invalid result: the number of columns in the result table does not equal the number of attributes in the query. Please report this to the mailing list.
> sessionInfo()
R version 2.11.0 (2010-04-22)
i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rtracklayer_1.8.1 RCurl_1.3-1 bitops_1.0-4.1 biomaRt_2.4.0
loaded via a namespace (and not attached):
[1] Biobase_2.8.0 Biostrings_2.16.0 BSgenome_1.16.0 GenomicRanges_1.0.1 IRanges_1.6.0
[6] tools_2.11.0 XML_2.8-1
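The error above is consistent with "upstream_flank" being passed as an attribute. In biomaRt's own getSequence(), upstream_flank is sent as a filter (its value paired with the gene IDs in `values`), and a sequence attribute such as "gene_flank" is requested instead. The sketch below shows that query shape and a loop over every dataset in the mart. It assumes the bacterial datasets expose the same filter and attribute names as Ensembl marts, which should be checked with listFilters() and listAttributes(); the helper names are hypothetical, and the mart name is copied from the post and may no longer be served.

```r
# Hypothetical helpers; nothing is queried until they are called.
# Assumption: the dataset offers an "upstream_flank" *filter* and a
# "gene_flank" sequence *attribute*, as Ensembl marts do (the pattern
# biomaRt's getSequence() uses). Verify with biomaRt::listFilters(mart)
# and biomaRt::listAttributes(mart).
fetch_upstream_flank <- function(mart, gene_ids, flank = 500,
                                 id_type = "external_gene_id") {
  # upstream_flank goes in `filters`, matched by position to its value
  # in `values`; it is not listed in `attributes`.
  biomaRt::getBM(attributes = c("gene_flank", id_type),
                 filters = c(id_type, "upstream_flank"),
                 values = list(gene_ids, flank),
                 mart = mart,
                 checkFilters = FALSE)
}

# Run the same query against every dataset in the mart. Datasets whose
# query errors or returns no rows are dropped; each kept row is tagged
# with its dataset name.
fetch_upstream_all <- function(gene_ids, biomart = "bacterial_mart_7",
                               flank = 500) {
  datasets <- biomaRt::listDatasets(biomaRt::useMart(biomart))$dataset
  results <- lapply(datasets, function(ds) {
    res <- tryCatch({
      mart <- biomaRt::useMart(biomart, dataset = ds)
      fetch_upstream_flank(mart, gene_ids, flank)
    }, error = function(e) NULL)
    if (is.null(res) || nrow(res) == 0) return(NULL)
    res$dataset <- ds
    res
  })
  do.call(rbind, results)
}
```

A call such as `fetch_upstream_all("fis")` still issues one request per genome, but it runs unattended and returns a single table that can be written out as FASTA for alignment.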