[R] Problem with number characters
Spencer Graves
spencer.graves at pdf.com
Thu Oct 14 22:41:24 CEST 2004
It looks like you have several non-printing characters.
"nchar" will give you the total number of characters in each character
string.
"strsplit" can break character strings into single characters, and
"%in%" can be used to classify them.
Consider the following:
> x <- "Draszt 0%/1ÂÂ?iso8859-15³"
> nx <- nchar(x)
> x. <- strsplit(x, "")
> length(x.[[1]])
[1] 29
>
> namechars <- c(letters, LETTERS,
+ as.character(0:9), ".")
> punctuation <- c(",", "!", "+", "*", "&", "|")
> legalchars <- c(namechars, punctuation)
>
> legalx <- lapply(x., function(y)(y %in% legalchars))
> x.[[1]][!legalx[[1]]]
[1] " " "" "%" "/" "Â" "" "Â" "?" "-" "" "Â" "³"
>
> sapply(legalx, sum)
[1] 17
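If the goal is simply to drop the offending characters, the same pieces can be wrapped in a small function. This is only a minimal sketch: keep_legal is a made-up helper name, the test string is a stand-in I built with an embedded control character (not the actual FileMaker output), and adding the space to the legal set is my assumption.

```r
# Hypothetical helper (not part of base R): keep only the characters of
# each string that appear in 'legal'.
keep_legal <- function(s, legal) {
  chars <- strsplit(s, "")   # list with one vector of single characters per string
  vapply(chars,
         function(y) paste(y[y %in% legal], collapse = ""),
         character(1))
}

# Same legal set as in the session above, plus a space (an assumption,
# so that "Draszt 03" keeps its blank).
legalchars <- c(letters, LETTERS, as.character(0:9), ".",
                ",", "!", "+", "*", "&", "|", " ")

# Stand-in for the problem string: a control character (octal 001)
# sits between the "0" and the "3".
bad <- paste0("Draszt 0", "\001", "3")

cleaned <- keep_legal(bad, legalchars)
cleaned
# [1] "Draszt 03"
```

Whether silently dropping characters is acceptable depends on the data; in this thread the "3" itself seems to arrive as a superscript character, which a filter like this would delete rather than convert.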
Will this give you ideas about how to do what you want?
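It may also help to look at the underlying bytes directly, to see which ones fall outside the printable ASCII range. A rough sketch, again using a made-up stand-in string rather than the real data, and treating bytes 32 to 126 as "printable" (my assumption; a Latin-1 or UTF-8 file would need a wider test):

```r
# Made-up stand-in for the problem string: control character (octal 001)
# between "0" and "3".
x <- paste0("Draszt 0", "\001", "3")

bytes <- charToRaw(x)        # the string as a raw (byte) vector
codes <- as.integer(bytes)   # same bytes as integers 0..255

# Flag bytes outside the printable ASCII range 32..126
suspect <- codes < 32 | codes > 126

which(suspect)     # positions of the suspicious bytes
# [1] 9
bytes[suspect]     # their hex values
# [1] 01
```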
hope this helps. spencer graves
Gabor Grothendieck wrote:
>Assuming that the problem is that your input file has
>additional embedded characters added by the data base
>program you could try extracting just the text using
>the UNIX strings program:
>
> strings myfile.csv > myfile.txt
>
>and see if myfile.txt works with R and if not check out
>what the differences are between it and the .csv file.
>
>Date: Thu, 14 Oct 2004 11:31:33 -0700
>From: Scott Waichler <scott.waichler at pnl.gov>
>To: <r-help at stat.math.ethz.ch>
>Subject: [R] Problem with number characters
>
>
>I am trying to process text fields scanned in from a csv file that is
>output from the Windows database program FileMakerPro. The characters
>onscreen look like regular text, but R does not like their underlying binary form.
>For example, one of the text fields contains a name and a number, but
>R recognizes the number as something other than what it appears
>to be in plain text. The character string "Draszt 03" after being
>read into R using scan and ="" becomes "Draszt 03" where the 3 is
>displayed in my R session as a superscript. Here is the result pasted
>into this email I'm composing in emacs: "Draszt 0%/1ÂÂ?iso8859-15³"
>Another clue for the knowledgeable: when I try to display the vector element
>causing trouble, I get
><CHARSXP: "Draszt 0%/1ÂÂ?iso8859-15³">
>where again the superscript part is just "3" in my R session. I'm working in
>Linux, R version 1.9.1, 2004-06-21. Your help will be much appreciated.
>
>Scott Waichler
>Pacific Northwest National Laboratory
>scott.waichler at pnl.gov
--
Spencer Graves, PhD, Senior Development Engineer
O: (408)938-4420; mobile: (408)655-4567