[R] dbi, rodbc, rmysql, charset problem
janek0
janek0 at poczta.onet.pl
Sun Feb 18 14:34:18 CET 2007
Dear List
In my short life as a beginning R-user i've encountered a following
problem that i'm unable to solve myself:
I have a database in MySQL containing table and field names as well as
some data containing Polish accentuated characters (like ąśęć),
utf8-encoded. It works just fine with just any external query browser i
can find, jdbc, odbc, native, whatever. Also mysql is happy about my
charset. It also seems to be configured correctly:
mysql> status
--------------
mysql Ver 14.12 Distrib 5.0.24a, for pc-linux-gnu (i486) using readline
5.1
Connection id: 16
Current database:
Current user: root w localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.0.24a-Debian_9-log
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /var/run/mysqld/mysqld.sock
Uptime: 1 hour 37 min 17 sec
Threads: 2 Questions: 240 Slow queries: 0 Opens: 175 Flush tables: 1
Open tables: 64 Queries per second avg: 0.041
Yet if i use R's RODBC or RMySQL to connect to my database i can't see
these accentuated characters:
library(RODBC)
con <-odbcConnect("trybunal", uid="root", pwd="mypassword")
sqlTable(con)
the output (abridged) is like that:
TABLE_CAT TABLE_SCHEM TABLE_NAME TABLE_TYPE
REMARKS
14 trybunal Wyk?adnia innych przepis<f3>w TABLE MySQL
instead of <F3> i should see "ó" and instead of ? a "ł".
It is just the same if i use RMySQL instead of RODBC:
library(RMySQL)
con <-dbConnect(dbDriver("MySQL"), dbname="trybunal", username="root",
password="mypassword")
dbListTables(con)
the output (abridged) is like that:
[13] "Ustawa" "Wyk?adnia innych przepis<f3>w"
and if i use
dbReadTable(con, "Metryczka")
("Metryczka" being one table in the database) i get:
Error in make.names(as.character(names), allow_) :
invalid multibyte string 11
It works without error if i set LC_ALL to "C", but obviously without
Polish charset.
Strange thing is that Sys.getlocale() gives me
[1]"LC_CTYPE=pl_PL.UTF-8;LC_NUMERIC=C;LC_TIME=pl_PL.UTF-8;LC_COLLATE=pl_PL.UTF-8;
LC_MONETARY=pl_PL.UTF-8;LC_MESSAGES=pl_PL.UTF-8;
LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C"
So it is utf8 all over the place. Also R works just fine with
read.table() if the table contains utf8-encoded chars. Thus the problem
is just with R-mysql connection. It seems therefore that dbi package
does not support non-ascii charsets.
Questions:
1. Is above conclusion correct or am i doing something wrong ?
2. If it is correct, is there any way to use table and field names as
they are now (with non-ascii chars) in my SQL queries (e.g. SELECT
`Wykładnia przedmiotu kontroli`.*) ?
I can live with Polish characters missing in the output if i have to but
i must address the database fields/tables. I can't change their names of
as this would mean rebuilding database frontend. I can't import data to
R by exporting the database and then importing it via read.table because
i want a "live" application and must keep things simple.
Any help will be greatly appreciated.
--
janek0 <janek0 w poczta.onet.pl>
More information about the R-help
mailing list