[R] dbi, rodbc, rmysql, charset problem

janek0 janek0 at poczta.onet.pl
Sun Feb 18 14:34:18 CET 2007


Dear List

In my short life as a beginning R-user i've encountered a following
problem that i'm unable to solve myself:

I have a database in MySQL containing table and field names as well as
some data containing Polish accentuated characters (like ąśęć),
utf8-encoded. It works just fine with just any external query browser i
can find, jdbc, odbc, native, whatever. Also mysql is happy about my
charset. It also seems to be configured correctly: 

mysql> status
--------------
mysql  Ver 14.12 Distrib 5.0.24a, for pc-linux-gnu (i486) using readline
5.1

Connection id:          16
Current database:
Current user:           root w localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.0.24a-Debian_9-log
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /var/run/mysqld/mysqld.sock
Uptime:                 1 hour 37 min 17 sec

Threads: 2  Questions: 240  Slow queries: 0  Opens: 175  Flush tables: 1
Open tables: 64  Queries per second avg: 0.041

Yet if i use R's RODBC or RMySQL to connect to my database i can't see
these accentuated characters:

library(RODBC)
con <-odbcConnect("trybunal", uid="root", pwd="mypassword")
sqlTable(con)

the output (abridged) is like that:

TABLE_CAT TABLE_SCHEM                    TABLE_NAME TABLE_TYPE
REMARKS
14  trybunal             Wyk?adnia innych przepis<f3>w      TABLE MySQL 

instead of <F3> i should see "ó" and instead of ? a "ł".

It is just the same if i use RMySQL instead of RODBC:

library(RMySQL)
con <-dbConnect(dbDriver("MySQL"), dbname="trybunal", username="root",
password="mypassword")
dbListTables(con)

the output (abridged) is like that:
[13] "Ustawa"                        "Wyk?adnia innych przepis<f3>w"

and if i use
dbReadTable(con, "Metryczka")

("Metryczka" being one table in the database) i get:

Error in make.names(as.character(names), allow_) : 
        invalid multibyte string 11

It works without error if i set LC_ALL to "C", but obviously without
Polish charset.

Strange thing is that Sys.getlocale() gives me

[1]"LC_CTYPE=pl_PL.UTF-8;LC_NUMERIC=C;LC_TIME=pl_PL.UTF-8;LC_COLLATE=pl_PL.UTF-8;
LC_MONETARY=pl_PL.UTF-8;LC_MESSAGES=pl_PL.UTF-8;
LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C"

So it is utf8 all over the place. Also R works just fine with
read.table() if the table contains utf8-encoded chars. Thus the problem
is just with R-mysql connection. It seems therefore that dbi package
does not support non-ascii charsets.

Questions:
1. Is above conclusion correct or am i doing something wrong ?
2. If it is correct, is there any way to use table and field names as
they are now (with non-ascii chars) in my SQL queries (e.g. SELECT
`Wykładnia przedmiotu kontroli`.*) ? 

I can live with Polish characters missing in the output if i have to but
i must address the database fields/tables. I can't change their names of
as this would mean rebuilding database frontend. I can't import data to
R by exporting the database and then importing it via read.table because
i want a "live" application and must keep things simple.

Any help will be greatly appreciated.
-- 
janek0 <janek0 w poczta.onet.pl>



More information about the R-help mailing list