[Bioc-devel] Invalid multibyte string

Sean Davis sdavis2 at mail.nih.gov
Mon Mar 6 14:04:14 CET 2006




On 3/4/06 5:06 AM, "Florian Hahne" <f.hahne at dkfz-heidelberg.de> wrote:

> Hi Sean,
> I had a similar problem with invalid multibyte strings in a UTF-8
> locale. The error occurs when you apply any of the string processing
> functions to a string containing bytes that are not valid UTF-8. In my
> code I use iconv to convert the string to latin1 before applying
> strsplit (take a look at the code below). With sub="byte", iconv
> replaces each byte it cannot convert with its hex code in angle
> brackets. I'm not sure whether this helps in your situation, but at
> least it doesn't force the user into a specific locale.
> 
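> readFCStext <- function(con, offsets) {
>   # Jump to the start of the text segment and read it in one go
>   seek(con, offsets["textstart"])
>   txt <- readChar(con, offsets["textend"]-offsets["textstart"]+1)
>   # Convert from the current locale's encoding; bytes that cannot be
>   # converted are replaced by <xx>, the hex code of the byte
>   txt <- iconv(txt, "", "latin1", sub="byte")
>   # The first character of the segment is the field delimiter
>   delimiter <- substr(txt, 1, 1)
>   sp <- strsplit(substr(txt, 2, nchar(txt)), split=delimiter,
>                  fixed=TRUE)[[1]]
>   # Fields alternate: keyword, value, keyword, value, ...
>   rv <- sp[seq(2, length(sp), by=2)]
>   names(rv) <- sp[seq(1, length(sp)-1, by=2)]
>   return(rv)
> }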
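
To see the effect of sub="byte" in isolation, here is a minimal sketch. It
is not taken from Florian's package: the test string and the "|" delimiter
are made up for illustration, and from="UTF-8" is used in place of "" so
that the result should not depend on the locale of the session.

```r
# Hypothetical input: "key|caf", then the single byte 0xE9, then "|x".
# 0xE9 is "e acute" in latin1, but on its own it is not valid UTF-8.
bad <- rawToChar(as.raw(c(0x6b, 0x65, 0x79, 0x7c,
                          0x63, 0x61, 0x66, 0xe9,
                          0x7c, 0x78)))

# Treat the input as UTF-8 and convert to latin1. The byte that cannot
# be converted is replaced by its hex code in angle brackets.
cleaned <- iconv(bad, from = "UTF-8", to = "latin1", sub = "byte")

# The cleaned string is plain ASCII, so strsplit is safe on it.
fields <- strsplit(cleaned, split = "|", fixed = TRUE)[[1]]
print(fields)
```

The offending byte stays visible in the output as a hex code rather than
being dropped, so it should still be possible to see afterwards which
field contained it.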

Thanks, Florian.  I'll give this a try.

Sean


