[Bioc-devel] Invalid multibyte string
Sean Davis
sdavis2 at mail.nih.gov
Mon Mar 6 14:04:14 CET 2006
On 3/4/06 5:06 AM, "Florian Hahne" <f.hahne at dkfz-heidelberg.de> wrote:
> Hi Sean,
> I had a similar problem with invalid multibyte strings in a UTF-8
> locale. The error occurs when you apply a string-processing function
> such as strsplit to a string containing bytes that are not valid UTF-8.
> In my code I use iconv to convert the string to latin1 before applying
> strsplit (see the code below), with sub="byte" so that each illegal
> byte is replaced by its hex code. Not sure if this is helpful in your
> situation, but at least it doesn't force the user into a specific
> locale.
>
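> readFCStext <- function(con, offsets) {
>     # Read the TEXT segment between the given byte offsets
>     seek(con, offsets["textstart"])
>     txt <- readChar(con, offsets["textend"] - offsets["textstart"] + 1)
>     # Replace bytes that cannot be converted from the locale's encoding
>     # with their <xx> hex codes
>     txt <- iconv(txt, "", "latin1", sub = "byte")
>     # The first character is the delimiter; the rest is key/value pairs
>     delimiter <- substr(txt, 1, 1)
>     sp <- strsplit(substr(txt, 2, nchar(txt)), split = delimiter,
>                    fixed = TRUE)[[1]]
>     rv <- sp[seq(2, length(sp), by = 2)]
>     names(rv) <- sp[seq(1, length(sp) - 1, by = 2)]
>     return(rv)
> }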
Thanks, Florian. I'll give this a try.
Sean
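
For anyone finding this thread later, here is a minimal sketch of the
iconv(..., sub="byte") step on its own, without a file connection. The
sample string, its "|" delimiter and the NAME/SRC keywords are made up for
illustration. It also assumes from="UTF-8" rather than the from="" used
above, so that the result should not depend on the session locale, and it
uses validUTF8(), which needs R >= 3.3.0.

```r
# Hypothetical text segment: "|" is the delimiter, and the byte 0xE9
# (latin1 e-acute) makes the string invalid as UTF-8.
txt <- "|NAME|caf\xe9 au lait|SRC|tube1"

# FALSE here: the string contains a byte sequence that is not valid UTF-8
is_valid_before <- validUTF8(txt)

# Assumption: treat the input as UTF-8 (the quoted code uses from = "",
# i.e. the current locale). Bytes that cannot be converted are replaced
# by their hex code in angle brackets.
clean <- iconv(txt, from = "UTF-8", to = "latin1", sub = "byte")

# Same parsing as in readFCStext: the first character is the delimiter,
# the remainder alternates keyword, value, keyword, value, ...
delimiter <- substr(clean, 1, 1)
sp <- strsplit(substr(clean, 2, nchar(clean)), split = delimiter,
               fixed = TRUE)[[1]]
rv <- sp[seq(2, length(sp), by = 2)]
names(rv) <- sp[seq(1, length(sp) - 1, by = 2)]

print(rv)
```

The substitution is lossy: the offending byte survives only as a literal
"<e9>" in the value, so this suits keyword parsing better than cases where
the original characters have to be recovered.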