[R] Problem with number characters

Fri Oct 15 09:11:12 CEST 2004

>>>>> "Spencer" == Spencer Graves <spencer.graves at pdf.com>
>>>>>     on Thu, 14 Oct 2004 13:41:24 -0700 writes:

    Spencer>   It looks like you have several non-printing
    Spencer> characters.  "nchar" will give you the total number
    Spencer> of characters in each character string.

    Spencer> "strsplit" can break character strings into single
    Spencer> characters, and "%in%" can be used to classify
    Spencer> them.

and you give nice coding examples:

    Spencer> Consider the following:
    >> x <- "Draszt 0%/1ÂÂ?iso8859-15Â³"
    >> nx <- nchar(x)
    >> x. <- strsplit(x, "")
    >> length(x.[[1]])
    Spencer> [1] 29
    >> 
    >> namechars <- c(letters, LETTERS, as.character(0:9), ".")

just to be precise:  If 'namechars' is supposed to mean
``characters valid in R object names'', then you should have
added "_" as well:

namechars <- c(letters, LETTERS, as.character(0:9), ".", "_")

    >> punctuation <- c(",", "!", "+", "*", "&", "|")
    >> legalchars <- c(namechars, punctuation)

and 'legalchars' would have to contain quite a bit more I
presume, e.g. "$", "@", ....
(but that wouldn't have been a reason to write this e-mail..)

    >> legalx <- lapply(x., function(y)(y %in% legalchars))
    >> x.[[1]][!legalx[[1]]]
    Spencer> [1] " " "" "%" "/" "Â" "" "Â" "?" "-" "" "Â" "³"
    >> 
    >> sapply(legalx, sum)
    Spencer> [1] 17
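
A whitelist such as 'legalchars' has to be enumerated by hand.  As a
rough alternative sketch, one can look at code points instead and flag
everything outside printable ASCII (32 to 126).  The function name
'flag_nonascii' and the short test string are made up for illustration,
and unlike the whitelist above a plain space is not flagged here.

```r
# Hypothetical helper: report position and code point of every character
# outside the printable ASCII range 32..126.
flag_nonascii <- function(s) {
  codes <- utf8ToInt(enc2utf8(s))          # one integer per character
  bad <- codes < 32L | codes > 126L        # control chars and non-ASCII
  data.frame(pos = which(bad), code = codes[bad])
}

# Made-up stand-in for the problem string: "Draszt 0", a control
# character (code 2) and a superscript three (code 179).
x <- intToUtf8(c(utf8ToInt("Draszt 0"), 2L, 179L))
flag_nonascii(x)
```

The code points returned this way can then be looked up to see which
encoding or escape sequence the stray characters are likely to come from.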

    Spencer> Will this give you ideas about how to do what you want?
    Spencer> hope this helps. spencer graves

(and this too)

Martin Maechler, ETH Zurich

    Spencer> Gabor Grothendieck wrote:

    >> Assuming that the problem is that your input file has 
    >> additional embedded characters added by the data base
    >> program you could try extracting just the text using
    >> the UNIX strings program:
    >> 
    >> strings myfile.csv > myfile.txt
    >> 
    >> and see if myfile.txt works with R and if not check out
    >> what the differences are between it and the .csv file.
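
For anyone without the 'strings' program at hand, a crude approximation
can be sketched in R itself by reading the file as raw bytes and keeping
only newlines and printable ASCII.  This is not exactly what 'strings'
does (it prints runs of printable characters of some minimum length),
and the function name 'keep_printable' is made up for illustration.

```r
# Hypothetical helper: return the file contents with every byte dropped
# that is neither a newline (10) nor printable ASCII (32..126).
keep_printable <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  ok <- bytes == as.raw(10L) |
        (bytes >= as.raw(32L) & bytes <= as.raw(126L))
  rawToChar(bytes[ok])
}

# Small self-made example file: "D", a control byte, byte 0xB3, "3", newline.
tmp <- tempfile(fileext = ".csv")
writeBin(as.raw(c(68L, 2L, 179L, 51L, 10L)), tmp)
cleaned <- keep_printable(tmp)
unlink(tmp)
```

Something like cat(keep_printable("myfile.csv"), file = "myfile.txt")
would then play the role of the shell redirection above.  Note that this
silently discards non-ASCII characters that may be legitimate data.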
    >> 
    >> Date:   Thu, 14 Oct 2004 11:31:33 -0700 
    >> From:   Scott Waichler <scott.waichler at pnl.gov>
    >> To:   <r-help at stat.math.ethz.ch> 
    >> Subject:   [R] Problem with number characters 
    >> 
    >> 
    >> I am trying to process text fields scanned in from a csv file that is
    >> output from the Windows database program FileMakerPro. The characters
    >> onscreen look like regular text, but R does not like their underlying binary form.
    >> For example, one of the text fields contains a name and a number, but
    >> R recognizes the number as something other than what it appears
    >> to be in plain text. The character string "Draszt 03" after being
    >> read into R using scan and ="" becomes "Draszt 0³" where the 3 is 
    >> displayed in my R session as a superscript. Here is the result pasted
    >> into this email I'm composing in emacs: "Draszt 0%/1ÂÂ?iso8859-15Â³"
    >> Another clue for the knowledgeable: when I try to display the vector element
    >> causing trouble, I get
    >> <CHARSXP: "Draszt 0%/1ÂÂ?iso8859-15Â³">
    >> where again the superscript part is just "3" in my R session. I'm working in
    >> Linux, R version 1.9.1, 2004-06-21. Your help will be much appreciated.
    >> 
    >> Scott Waichler
    >> Pacific Northwest National Laboratory
    >> scott.waichler at pnl.gov