[BioC] errors usage of local biomart database with biomaRt package
Faheem Mitha
faheem at email.unc.edu
Tue Aug 19 18:59:16 CEST 2008
Hi,
A few days ago I posted a message to the maintainers of the biomaRt
package regarding some problems I was having using the biomaRt R
package with a local installation of biomart. One of them replied
earlier today, but unfortunately I managed to delete his message
somehow. If whoever it was could bounce a copy of his reply to me for
my records, I would be grateful.
I read the message before it disappeared, and the rough sense of it
was as follows.
1) The message I sent was useless. (This was fair, since I forgot to
include a reproducible error.)
2) I should write to the bioconductor mailing list with more useful
information, which I'm doing now.
Ok, so I'm trying again with a hopefully more informative message,
including a code traceback. See below. If this attempt still sucks
from a reproducibility standpoint, I welcome constructive criticism.
I can read archives via gmane, but please cc me anyway.
Thanks, Faheem.
*************************************************************************
I'm having some difficulty with the usage of a local database
installation with biomaRt.
The biomaRt code I'm using gives an error message with my (partial)
installation of the local database.
I
a) Attach a transcript of the local database installation.
b) Include the biomaRt code and the error message it gives below.
The local database is mysql, using data from ensembl.org, specifically
ftp://ftp.ensembl.org/pub/release-50/mysql/ensembl_mart_50/ and
ftp://ftp.ensembl.org/pub/release-50/mysql/snp_mart_50/.
If you can tell me what I'm doing wrong, I would appreciate it. I hope
it is just that I don't have all the necessary tables
installed. Obviously it is not practical to install the entire
database, and I was unable to find documentation about what each of
the tables are or the dependencies among them.
Also, I got a hang in the mysql interpreter when I tried to import
#mysql> LOAD DATA INFILE
'/home/faheem/ensembl/hsapiens_snp__variation__main.txt'
INTO TABLE hsapiens_snp__variation__main;
so I'm commenting that one out for now in the transcript. Not much
point trying again, since at the moment I don't even know if it is
relevant.
Thanks, Faheem.
****************************************************************************
biomart.R
****************************************************************************
library("biomaRt")
SNPanno=function(rsid,dbmart,enmart, verbose=FALSE)
{
attrib=c("refsnp_id","chr_name","snp_chrom_start","snp_chrom_strand","ensembl_gene_stable_id","snp_allele","ensembl_type")
info=getBM(attributes=attrib,filters="refsnp",values=rsid, mart=dbmart,
verbose=verbose)
if(!is.null(info))
{
genename=getBM(attributes="external_gene_id",filters="ensembl_gene_id",values=as.character(info[5]),mart=enmart,
verbose=verbose)
if(!is.null(genename))
{
info=data.frame(info,genename)
}
else
{
info=data.frame(info,genename=NA)
}
}
else
{
info=paste("no annotation info found for", rsid)
}
return(info)
}
rsid = "rs12726453"
#mart=useMart("snp",dataset="hsapiens_snp")
#en=useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
mart=useMart(biomart="snp_mart_50", mysql=TRUE, host="localhost",
user="faheem", password="e=mc^2", local=TRUE, dataset="hsapiens_snp")
en=useMart(biomart="ensembl_mart_50", mysql=TRUE, host="localhost",
user="faheem", password="e=mc^2", local=TRUE, dataset="hsapiens_gene_ensembl")
snp = SNPanno(rsid,mart,en, verbose=TRUE)
****************************************************************************
> source("biomart.R")
Loading required package: RMySQL
Loading required package: DBI
Reading database configuration of: hsapiens_snp
Checking attributes and filters ... ok
Checking main tables ... ok
Reading database configuration of: hsapiens_gene_ensembl
Checking attributes and filters ... ok
Checking main tables ... ok
[1] "SELECT DISTINCT hsapiens_snp__variation__main.name_2025,
hsapiens_snp__variation_feature__main.name_1059,
hsapiens_snp__variation_feature__main.seq_region_start_2026,
hsapiens_snp__variation_feature__main.seq_region_strand_2026,
hsapiens_snp__transcript_variation__dm.stable_id_1023,
hsapiens_snp__variation_feature__main.allele_string_2026,
hsapiens_snp__transcript_variation__dm.biotype_1064 FROM
hsapiens_snp__variation__main INNER JOIN (
hsapiens_snp__variation_feature__main,hsapiens_snp__transcript_variation__dm )
ON ( hsapiens_snp__variation_feature__main.variation_id_2025_key =
hsapiens_snp__variation__main.variation_id_2025_key AND
hsapiens_snp__transcript_variation__dm.variation_id_2025_key =
hsapiens_snp__variation__main.variation_id_2025_key) WHERE
hsapiens_snp__variation__main.name_2025 IN ('rs12726453')"
Error in mysqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not run statement: Unknown column
'hsapiens_snp__transcript_variation__dm.variation_id_2025_key' in 'on clause')
*****************************************************************************
-------------- next part --------------
# Instructions for installing ensembl database on Debian. Comments start with #. Current version of * is 50.
STEP 0:
$ sudo apt-get install mysql-server
# Add faheem as user.
$ mysql --user=root mysql
mysql> GRANT ALL PRIVILEGES ON *.* TO 'faheem'@'localhost' IDENTIFIED BY 'passwd' WITH GRANT OPTION;
mysql> GRANT ALL PRIVILEGES ON *.* TO 'faheem'@'%' IDENTIFIED BY 'passwd' WITH GRANT OPTION;
# Log in as faheem.
STEP 1: Create ensembl_mart and snp_mart databases.
$ mysql -A --user=faheem --password='passwd'
mysql> CREATE DATABASE ensembl_mart_*;
mysql> CREATE DATABASE snp_mart_*;
mysql> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| information_schema |
| ensembl_mart_50 |
| mysql |
| snp_mart_50 |
+--------------------+
STEP 2: Download and unzip sql files for creating and populating database tables/
$ wget -c ftp://ftp.ensembl.org/pub/release-*/mysql/ensembl_mart_*/ensembl_mart_*.sql.gz
$ wget -c ftp://ftp.ensembl.org/pub/release-*/mysql/ensembl_mart_*/meta_conf__dataset__main.txt.gz --output-document=ensembl_meta_conf__dataset__main.txt.gz
$ wget -c ftp://ftp.ensembl.org/pub/release-*/mysql/ensembl_mart_*/meta_conf__xml__dm.txt.gz --output-document=ensembl_meta_conf__xml__dm.txt.gz
$ wget -c ftp://ftp.ensembl.org/pub/release-*/mysql/ensembl_mart_*/hsapiens_gene_ensembl__gene__main.txt.gz
$ wget -c ftp://ftp.ensembl.org/pub/release-*/mysql/snp_mart_*/snp_mart_*.sql.gz
$ wget -c ftp://ftp.ensembl.org/pub/release-*/mysql/snp_mart_*/meta_conf__dataset__main.txt.gz --output-document=snp_meta_conf__dataset__main.txt.gz
$ wget -c ftp://ftp.ensembl.org/pub/release-*/mysql/snp_mart_*/meta_conf__xml__dm.txt.gz --output-document=snp_meta_conf__xml__dm.txt.gz
#$ wget -c ftp://ftp.ensembl.org/pub/release-*/mysql/snp_mart_*/hsapiens_snp__variation__main.txt.gz
$ wget -c ftp://ftp.ensembl.org/pub/release-*/mysql/snp_mart_*/hsapiens_snp__variation_synonym_Affy6__dm.txt.gz
$ wget -c ftp://ftp.ensembl.org/pub/release-*/mysql/snp_mart_*/hsapiens_snp__variation_synonym_dbSNP__dm.txt.gz
$ wget -c ftp://ftp.ensembl.org/pub/release-*/mysql/snp_mart_*/hsapiens_snp__transcript_variation__dm.001.txt.gz
$ wget -c ftp://ftp.ensembl.org/pub/release-*/mysql/snp_mart_*/hsapiens_snp__transcript_variation__dm.002.txt.gz
$ wget -c ftp://ftp.ensembl.org/pub/release-*/mysql/snp_mart_*/hsapiens_snp__transcript_variation__dm.003.txt.gz
$ wget -c ftp://ftp.ensembl.org/pub/release-*/mysql/snp_mart_*/hsapiens_snp__transcript_variation__dm.004.txt.gz
$ gunzip *.gz
STEP 3: Create and populate ensembl_mart tables.
# Connect to ensembl_mart_*.
mysql> \r ensembl_mart_*
# SQL script creates tables in ensembl_mart_*.
$ mysql -D ensembl_mart_* -u faheem -p < ensembl_mart_*.sql
mysql> SHOW TABLES;
[list of created tables...]
mysql> LOAD DATA INFILE '/home/faheem/ensembl/ensembl_meta_conf__dataset__main.txt' INTO TABLE meta_conf__dataset__main;
mysql> LOAD DATA INFILE '/home/faheem/ensembl/ensembl_meta_conf__xml__dm.txt' INTO TABLE meta_conf__xml__dm;
mysql> LOAD DATA INFILE '/home/faheem/ensembl/hsapiens_gene_ensembl__gene__main.txt' INTO TABLE hsapiens_gene_ensembl__gene__main;
STEP 4: Create and populate snp_mart tables.
# Connect to snp_mart_*.
mysql> \r snp_mart_*
# SQL script creates tables in snp_mart_*.
$ mysql -D snp_mart_* -u faheem -p < snp_mart_*.sql
mysql> SHOW TABLES;
[list of created tables...]
mysql> LOAD DATA INFILE '/home/faheem/ensembl/snp_meta_conf__dataset__main.txt' INTO TABLE meta_conf__dataset__main;
mysql> LOAD DATA INFILE '/home/faheem/ensembl/snp_meta_conf__xml__dm.txt' INTO TABLE meta_conf__xml__dm;
#mysql> LOAD DATA INFILE '/home/faheem/ensembl/hsapiens_snp__variation__main.txt' INTO TABLE hsapiens_snp__variation__main;
mysql> LOAD DATA INFILE '/home/faheem/ensembl/hsapiens_snp__variation_synonym_Affy6__dm.txt' INTO TABLE hsapiens_snp__variation_synonym_Affy6__dm;
mysql> LOAD DATA INFILE '/home/faheem/ensembl/hsapiens_snp__variation_synonym_dbSNP__dm.txt' INTO TABLE hsapiens_snp__variation_synonym_dbSNP__dm;
mysql> LOAD DATA INFILE '/home/faheem/ensembl/hsapiens_snp__transcript_variation__dm.001.txt' INTO TABLE hsapiens_snp__transcript_variation__dm;
mysql> LOAD DATA INFILE '/home/faheem/ensembl/hsapiens_snp__transcript_variation__dm.002.txt' INTO TABLE hsapiens_snp__transcript_variation__dm;
mysql> LOAD DATA INFILE '/home/faheem/ensembl/hsapiens_snp__transcript_variation__dm.003.txt' INTO TABLE hsapiens_snp__transcript_variation__dm;
mysql> LOAD DATA INFILE '/home/faheem/ensembl/hsapiens_snp__transcript_variation__dm.004.txt' INTO TABLE hsapiens_snp__transcript_variation__dm;
More information about the Bioconductor
mailing list