[R] Identifying special characters in a text file
jholtman at gmail.com
Fri Feb 12 03:31:06 CET 2010
Set up a regular expression that keeps only what you want. This example
keeps letters, digits, whitespace, commas and periods:
> x <- readLines(textConnection('I discovered that the following works:
+ any(is.na(strsplit(readLines(FILE), "")))
+ I am wondering whether anyone has a better approach to this problem.
+ Dennis bullet ©©©ƒƒƒƒƒƒŽŽŽŽŽŽŸŸŸ
+ Dennis Fisher MD
+ P < (The "P Less Than" Company)
+ Phone: 1-866-PLessThan (1-866-753-7784)
+ Fax: 1-866-PLessThan (1-866-753-7784)'))
> # replace characters not matching alphanum, space, period, comma
> gsub("[^[:alnum:][:space:],.]", "", x) # drop every character outside the allowed set
 "I discovered that the following works"
 " anyis.nastrsplitreadLinesFILE, "
 "I am wondering whether anyone has a better approach to this problem."
 "Dennis bullet "
 "Dennis Fisher MD"
 "P The P Less Than Company"
 "Phone 1866PLessThan 18667537784"
 "Fax 1866PLessThan 18667537784"
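
Since the original question was about identifying the characters and warning, rather than deleting them, the same character class can be used with gregexpr() and regmatches() to report where they occur. This is only a minimal sketch: find_special() and its 'allowed' argument are hypothetical names made up for illustration, and the sample vector stands in for the result of readLines(FILE). Note that [:alnum:] depends on the locale, so accented letters may count as allowed.

```r
# Hypothetical helper (not from the original post or any package):
# report lines containing characters outside an allowed set.
find_special <- function(x, allowed = "[:alnum:][:space:],.") {
  pattern <- paste0("[^", allowed, "]")        # negated bracket expression
  hits <- regmatches(x, gregexpr(pattern, x))  # offending characters per line
  bad <- lengths(hits) > 0
  data.frame(
    line  = which(bad),
    chars = vapply(hits[bad],
                   function(h) paste(unique(h), collapse = ""),
                   character(1)),
    stringsAsFactors = FALSE
  )
}

# Made-up sample input; in practice x <- readLines(FILE)
x <- c("Plain text, nothing odd.",
       "Dennis bullet \u2022\u2022",
       "Dennis Fisher MD")

res <- find_special(x)
if (nrow(res) > 0) {
  warning("special characters found on line(s): ",
          paste(res$line, collapse = ", "))
}
```

The 'allowed' string can be widened to cover routine punctuation; the data frame it returns gives the line numbers and the distinct offending characters for use in the warning message.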
On Thu, Feb 11, 2010 at 8:46 PM, Dennis Fisher <fisher at plessthan.com> wrote:
> R 2.10.1 on a Mac
> I read in text files using readLines, then I process those files, then I use R to execute another program. Occasionally those files contain characters other than letters / numbers / routine punctuation marks. For example, a bullet (option-8 on a Mac) triggers the problem.
> Although R can read and process those characters, the other program cannot, so I would like to identify these characters and exit gracefully with a warning.
> I discovered that the following works:
> any(is.na(strsplit(readLines(FILE), "")))
> I am wondering whether anyone has a better approach to this problem.
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-866-PLessThan (1-866-753-7784)
What is the problem that you are trying to solve?