[R] Problem with number characters
Waichler, Scott R
Scott.Waichler at pnl.gov
Fri Oct 15 01:09:25 CEST 2004
Gabor wrote:
>Assuming that the problem is that your input file has
>additional embedded characters added by the data base
>program you could try extracting just the text using
>the UNIX strings program:
>
> strings myfile.csv > myfile.txt
Spencer wrote:
>"strsplit" can break character strings into single
>characters, and "%in%" can be used to classify them.
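A rough sketch of what Spencer describes might look like the following. The sample string and the set of "allowed" characters are assumptions made up for illustration, not taken from the actual data:

```r
# Hypothetical input with two embedded control characters:
# \013 is a vertical tab (^K), \001 is another non-printing character
x <- "abc\013def\001 ghi"

# Break the string into single characters
chars <- strsplit(x, "")[[1]]

# Characters considered acceptable (an assumption; adjust to the data)
allowed <- c(letters, LETTERS, 0:9, " ", ".", ",", "-")

# Keep only the characters that fall outside the allowed set
suspect <- chars[!(chars %in% allowed)]

# Convert to integer codes so that invisible characters can be seen
suspect.codes <- utf8ToInt(paste(suspect, collapse=""))
print(suspect.codes)   # 11 1
```

Printing the integer codes rather than the characters themselves makes it possible to see what is there even when the terminal displays nothing.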
The first suggestion helped me identify and remove
some of the embedded characters, namely "^K". Many more remained
hidden.
The second suggestion gave me the idea of
splitting the string on whitespace first, and seeing if the
embedded character problem would go away along with the "blank"
spaces. It did. In the snippet below, x is the character variable
I am trying to process:
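# Split x on runs of whitespace
str.vec <- strsplit(x, "\\s+", perl=TRUE)[[1]]
if(length(str.vec) > 0) {
  # Rejoin the pieces with single spaces
  x <- paste(str.vec, collapse=" ")
  # Trim leading and trailing whitespace
  x <- gsub("^\\s+", "", x, perl=TRUE)
  x <- gsub("\\s+$", "", x, perl=TRUE)
}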
There were no problems in processing x thereafter.
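For anyone who wants to reuse this, the same steps can be wrapped in a small function. This is only a sketch: the name clean.ws and the test string are hypothetical, and it handles one string at a time because it takes only the first element of the strsplit result:

```r
# Hypothetical helper: collapse whitespace runs and trim both ends
clean.ws <- function(x) {
  # Split on runs of whitespace; [[1]] means only x[1] is processed
  str.vec <- strsplit(x, "\\s+", perl=TRUE)[[1]]
  if (length(str.vec) > 0) {
    # Rejoin the pieces with single spaces
    x <- paste(str.vec, collapse=" ")
    # Leading whitespace leaves an empty first piece, so trim the ends
    x <- gsub("^\\s+", "", x, perl=TRUE)
    x <- gsub("\\s+$", "", x, perl=TRUE)
  }
  x
}

print(clean.ws("  12.5 \t\n mg/L  "))   # "12.5 mg/L"
```

To process a whole character vector, something like sapply(v, clean.ws, USE.NAMES=FALSE) should work.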
Thank you, gentlemen.
Scott Waichler