[R] Problem with number characters

Waichler, Scott R Scott.Waichler at pnl.gov
Fri Oct 15 01:09:25 CEST 2004


Gabor wrote:
>Assuming that the problem is that your input file has 
>additional embedded characters added by the data base
>program you could try extracting just the text using
>the UNIX strings program:
>
>   strings myfile.csv > myfile.txt

Spencer wrote:
>"strsplit" can break character strings into single 
>characters, and "%in%" can be used to classify them.
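
A minimal sketch of that second suggestion, for illustration only. The
sample string and the set of allowed characters are assumptions, not
taken from the actual data:

```r
# Hypothetical input: a field with an embedded vertical tab (^K, "\v")
x <- "12\v34 56"

# Break the string into single characters
chars <- strsplit(x, "")[[1]]

# Characters expected in the field (an assumption for this sketch)
allowed <- c(0:9, ".", "-", " ")

# Keep only the characters that fall outside the allowed set
unexpected <- chars[!(chars %in% allowed)]

# Show the raw byte value of each offender so it can be identified
lapply(unexpected, charToRaw)
```

Printing the raw bytes makes otherwise invisible characters identifiable.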

The first suggestion helped me identify and remove
some of the embedded characters, namely "^K".  Many more remained
hidden.

The second suggestion gave me the idea of
splitting the string on whitespace first, and seeing if the
embedded character problem would go away along with the "blank"
spaces.  It did.  In the snippet below, x is the character variable
I am trying to process:

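      # Split x on runs of whitespace
      str.vec <- strsplit(x, "\\s+", perl=TRUE)[[1]]
      if(length(str.vec) > 0) {
        # Rejoin the pieces with single spaces
        x <- paste(str.vec, collapse=" ")
        # Trim any leading and trailing whitespace
        x <- gsub("^\\s+", "", x, perl=TRUE)
        x <- gsub("\\s+$", "", x, perl=TRUE)
      }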

There were no problems in processing x thereafter.
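
For reuse, the same steps could be wrapped in a small function. This is
only a sketch: the name squeeze_ws and the test string are hypothetical,
and it drops empty pieces instead of trimming with gsub afterwards:

```r
# Hypothetical helper: collapse runs of whitespace to single spaces
squeeze_ws <- function(x) {
  # Split on runs of whitespace
  parts <- strsplit(x, "\\s+", perl = TRUE)[[1]]
  # Leading whitespace yields an empty first piece; drop it
  parts <- parts[nzchar(parts)]
  # Rejoin with single spaces
  paste(parts, collapse = " ")
}

squeeze_ws("  12 \t 34\n56  ")   # "12 34 56"
```

Like the original snippet, it handles one string at a time, so a vector
of values would need sapply or a loop.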

Thank you, gentlemen.

Scott Waichler



