[R] Problem with number characters
Martin Maechler
maechler at stat.math.ethz.ch
Fri Oct 15 09:11:12 CEST 2004
>>>>> "Spencer" == Spencer Graves <spencer.graves at pdf.com>
>>>>> on Thu, 14 Oct 2004 13:41:24 -0700 writes:
Spencer> It looks like you have several non-printing
Spencer> characters. "nchar" will give you the total number
Spencer> of characters in each character string.
Spencer> "strsplit" can break character strings into single
Spencer> characters, and "%in%" can be used to classify
Spencer> them.
and you give nice coding examples:
Spencer> Consider the following:
>> x <- "Draszt 0%/1ÃÃ?iso8859-15ó"
>> nx <- nchar(x)
>> x. <- strsplit(x, "")
>> length(x.[[1]])
Spencer> [1] 29
>>
>> namechars <- c(letters, LETTERS, as.character(0:9), ".")
just to be precise: If 'namechars' is supposed to mean
``characters valid in R object names'', then you should have
added "_" as well:
namechars <- c(letters, LETTERS, as.character(0:9), ".", "_")
>> punctuation <- c(",", "!", "+", "*", "&", "|")
>> legalchars <- c(namechars, punctuation)
and 'legalchars' would have to contain quite a bit more I
presume, e.g. "$", "@", ....
(but that wouldn't have been a reason to write this e-mail..)
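A minimal sketch of the point above, assuming one simply extends the two vectors from the quoted example: "_" is added to 'namechars', and "$" and "@" to 'punctuation'. The helper name 'illegal_chars' is hypothetical and not part of the original posts, and the resulting set is still not a complete list of characters R accepts.

```r
# Extended character sets, following the remarks above.
# Still illustrative, not an exhaustive list of legal characters.
namechars   <- c(letters, LETTERS, as.character(0:9), ".", "_")
punctuation <- c(",", "!", "+", "*", "&", "|", "$", "@")
legalchars  <- c(namechars, punctuation)

# Hypothetical helper: for each string in 'x', return the single
# characters that are NOT in 'legal'.
illegal_chars <- function(x, legal = legalchars) {
  lapply(strsplit(x, ""), function(ch) ch[!(ch %in% legal)])
}

# Example call on two plain ASCII strings
illegal_chars(c("Draszt_03", "a%b/c"))
```

Applied to a string read from the problem file, the same call would list the characters that fall outside the chosen set, in the same way as the 'legalx' step in the quoted example.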
>> legalx <- lapply(x., function(y)(y %in% legalchars))
>> x.[[1]][!legalx[[1]]]
Spencer> [1] " " "" "%" "/" "Ã" "" "Ã" "?" "-" "" "Ã" "³"
>>
>> sapply(legalx, sum)
Spencer> [1] 17
Spencer> Will this give you ideas about how to do what you want?
Spencer> hope this helps. spencer graves
(and this too)
Martin Maechler, ETH Zurich
Spencer> Gabor Grothendieck wrote:
>> Assuming that the problem is that your input file has
>> additional embedded characters added by the data base
>> program you could try extracting just the text using
>> the UNIX strings program:
>>
>> strings myfile.csv > myfile.txt
>>
>> and see if myfile.txt works with R and if not check out
>> what the differences are between it and the .csv file.
>>
>> Date: Thu, 14 Oct 2004 11:31:33 -0700
>> From: Scott Waichler <scott.waichler at pnl.gov>
>> To: <r-help at stat.math.ethz.ch>
>> Subject: [R] Problem with number characters
>>
>>
>> I am trying to process text fields scanned in from a csv file that is
>> output from the Windows database program FileMakerPro. The characters
>> onscreen look like regular text, but R does not like their underlying binary form.
>> For example, one of the text fields contains a name and a number, but
>> R recognizes the number as something other than what it appears
>> to be in plain text. The character string "Draszt 03" after being
>> read into R using scan and ="" becomes "Draszt 03" where the 3 is
>> displayed in my R session as a superscript. Here is the result pasted
>> into this email I'm composing in emacs: "Draszt 0%/1ÃÃ?iso8859-15ó"
>> Another clue for the knowledgeable: when I try to display the vector element
>> causing trouble, I get
>> <CHARSXP: "Draszt 0%/1ÃÃ?iso8859-15ó">
>> where again the superscript part is just "3" in my R session. I'm working in
>> Linux, R version 1.9.1, 2004-06-21. Your help will be much appreciated.
>>
>> Scott Waichler
>> Pacific Northwest National Laboratory
>> scott.waichler at pnl.gov