[Bioc-devel] Invalid multibyte string
Seth Falcon
sfalcon at fhcrc.org
Sat Mar 4 17:54:44 CET 2006
Hi, I'm forwarding a msg from Florian.
Florian: I've added this address also to the list so that you should
be able to post with it as well.
+ seth
From: Florian Hahne <f.hahne at dkfz-heidelberg.de>
Subject: Re: [Bioc-devel] Invalid multibyte string
To: Sean Davis <sdavis2 at mail.nih.gov>
CC: bioc-devel at stat.math.ethz.ch
Date: Sat Mar 4 02:06:06 2006 -0800
Hi Sean,
I had a similar problem with invalid multibyte strings in a UTF-8
locale. The error occurs when you apply one of the string processing
functions (strsplit, for example) to a string containing bytes that
are not valid UTF-8. In my code I use iconv to convert the string from
the current locale's encoding to latin1 before applying strsplit (take
a look at the code below). With sub="byte", each byte that cannot be
converted is replaced by its hex code in angle brackets. Not sure if
this is helpful in your situation, but at least it doesn't force the
user into a specific locale.
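readFCStext <- function(con, offsets) {
    # Jump to the start of the TEXT segment and read it in one go
    seek(con, offsets["textstart"])
    txt <- readChar(con, offsets["textend"]-offsets["textstart"]+1)
    # Convert from the current locale ("") to latin1; bytes that cannot
    # be converted are replaced by their hex code, e.g. <e9>
    txt <- iconv(txt, "", "latin1", sub="byte")
    # The first character of the segment is the delimiter
    delimiter <- substr(txt, 1, 1)
    sp <- strsplit(substr(txt, 2, nchar(txt)), split=delimiter,
                   fixed=TRUE)[[1]]
    # Entries alternate keyword, value, keyword, value, ...
    rv <- sp[seq(2, length(sp), by=2)]
    names(rv) <- sp[seq(1, length(sp)-1, by=2)]
    return(rv)
}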
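
To see the two steps without a file, here is a minimal sketch on a
made-up string. The sample keywords and values are only for
illustration, and the sketch passes from="UTF-8" explicitly (the
function above uses "" for the current locale) so that it should
behave the same whatever locale the session runs in.

```r
# Made-up TEXT segment: "/" is the delimiter, followed by keyword/value
# pairs. The lone byte 0xB5 (the micro sign in latin1) is not valid UTF-8.
txt <- rawToChar(c(charToRaw("/$CYT/Flow"), as.raw(0xb5),
                   charToRaw("Cyt/$TOT/5000/")))

# The raw string is not valid UTF-8, which is what triggers the error
# in string functions under a UTF-8 locale.
is_valid <- validUTF8(txt)

# sub = "byte" replaces each unconvertible byte with its hex code
clean <- iconv(txt, from = "UTF-8", to = "latin1", sub = "byte")

# Same splitting logic as in readFCStext, applied to the cleaned string
delimiter <- substr(clean, 1, 1)
sp <- strsplit(substr(clean, 2, nchar(clean)), split = delimiter,
               fixed = TRUE)[[1]]
rv <- sp[seq(2, length(sp), by = 2)]
names(rv) <- sp[seq(1, length(sp) - 1, by = 2)]
print(rv)
```

The offending byte should show up as <b5> inside the value of $CYT, so
the text is still readable and strsplit no longer complains.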
Cheers,
Florian