[Bioc-devel] Invalid multibyte string

Seth Falcon sfalcon at fhcrc.org
Sat Mar 4 17:54:44 CET 2006


Hi, I'm forwarding a msg from Florian.

Florian: I've also added this address to the list, so you should be
able to post from it as well.

+ seth


From: Florian Hahne <f.hahne at dkfz-heidelberg.de>
Subject: Re: [Bioc-devel] Invalid multibyte string
To: Sean Davis <sdavis2 at mail.nih.gov>
CC: bioc-devel at stat.math.ethz.ch
Date: Sat Mar  4 02:06:06 2006 -0800

Hi Sean,
I had a similar problem with invalid multibyte strings in a UTF-8
locale. The error occurs when you apply string processing functions
such as strsplit to a string that contains bytes that are not valid
UTF-8. In my code I use the function iconv to convert the string to
Latin-1 encoding before applying strsplit (take a look at the code
below). With sub="byte", iconv replaces each byte it cannot convert
with the hex code of that byte. Not sure if this is helpful in your
situation, but at least it doesn't force the user into using a
specific locale.

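readFCStext <- function(con, offsets) {
  # Read the text segment between the given offsets
  seek(con, offsets["textstart"])
  txt <- readChar(con, offsets["textend"] - offsets["textstart"] + 1)
  # Convert from the current locale's encoding to Latin-1; bytes that
  # cannot be converted are replaced by their hex code
  txt <- iconv(txt, "", "latin1", sub = "byte")
  # The first character is the delimiter; the remainder alternates
  # between names and values
  delimiter <- substr(txt, 1, 1)
  sp <- strsplit(substr(txt, 2, nchar(txt)), split = delimiter,
                 fixed = TRUE)[[1]]
  rv <- sp[seq(2, length(sp), by = 2)]
  names(rv) <- sp[seq(1, length(sp) - 1, by = 2)]
  return(rv)
}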
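
To see what the iconv step does in isolation, here is a minimal sketch
on a made-up string. The input, the names raw_txt and clean, and the
keys NAME and TOT are hypothetical and only for illustration. Unlike the
function above, which converts from the current locale (from = ""), the
sketch declares the input as UTF-8 explicitly so that it should behave
the same regardless of the locale it is run in.

```r
# Hypothetical text segment: delimiter "/", then alternating names and
# values. The byte 0xE9 is "e acute" in Latin-1 but is not valid UTF-8
# on its own.
raw_txt <- rawToChar(c(charToRaw("/NAME/caf"), as.raw(0xe9),
                       charToRaw("/TOT/100")))

# The raw string is not valid UTF-8
validUTF8(raw_txt)

# Replace the offending byte with its hex code instead of failing later
clean <- iconv(raw_txt, from = "UTF-8", to = "latin1", sub = "byte")
clean  # "/NAME/caf<e9>/TOT/100"

# The same splitting logic as in readFCStext now works on the result
delimiter <- substr(clean, 1, 1)
sp <- strsplit(substr(clean, 2, nchar(clean)), split = delimiter,
               fixed = TRUE)[[1]]
rv <- sp[seq(2, length(sp), by = 2)]
names(rv) <- sp[seq(1, length(sp) - 1, by = 2)]
rv
```

Note that the substituted text is a marker for the original byte, not
the character itself, so values containing it would need further
handling if the original character matters.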

Cheers,
Florian


