[R] Identifying special characters in a text file

jim holtman jholtman at gmail.com
Fri Feb 12 03:31:06 CET 2010


Set up a regular expression that keeps only what you want.  This example
keeps letters, digits, whitespace, commas and periods:

> x <- readLines(textConnection('I discovered that the following works:
+        any(is.na(strsplit(readLines(FILE), "")))
+
+ I am wondering whether anyone has a better approach to this problem.
+
+ Dennis  bullet ©©©ƒƒƒƒƒƒŽŽŽŽŽŽŸŸŸ
+
+ Dennis Fisher MD
+ P < (The "P Less Than" Company)
+ Phone: 1-866-PLessThan (1-866-753-7784)
+ Fax: 1-866-PLessThan (1-866-753-7784)
+ www.PLessThan.com'))
> closeAllConnections()
> # remove every character that is not alphanumeric, whitespace, a comma or a period
> gsub("[^[:alnum:][:space:],.]", "", x)  # edit the bracket expression to change what is kept
 [1] "I discovered that the following works"
 [2] "       anyis.nastrsplitreadLinesFILE, "
 [3] ""
 [4] "I am wondering whether anyone has a better approach to this problem."
 [5] ""
 [6] "Dennis  bullet "
 [7] ""
 [8] "Dennis Fisher MD"
 [9] "P  The P Less Than Company"
[10] "Phone 1866PLessThan 18667537784"
[11] "Fax 1866PLessThan 18667537784"
[12] "www.PLessThan.com"
>
>
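Since the original question asks how to flag such characters rather than silently drop them, the same idea can be turned into a test that reports which lines are affected and what the offending characters are. This is a minimal sketch for current R that swaps in a "printable ASCII or tab" rule in place of the `[:alnum:]` whitelist above. The reason is that `[:alnum:]` is locale-dependent and may match accented letters in a UTF-8 locale. With `perl = TRUE`, `\x20-\x7E` is a plain code-point range. The sample lines and the variable names `bad`, `has_bad` and `found` are illustrative.

```r
# Hypothetical sample lines; \u escapes keep the example independent of the
# encoding this script is saved in (\u2022 is a bullet)
x <- c("Dennis Fisher MD",
       "Dennis  bullet \u2022\u2022",
       "www.PLessThan.com")

# Anything that is not printable ASCII (space through tilde) or a tab.
# perl = TRUE makes \x20-\x7E a code-point range, not a locale-dependent one.
bad <- "[^\\x20-\\x7E\\t]"

has_bad <- grepl(bad, x, perl = TRUE)
which(has_bad)                                   # 2: the line to report

# Extract the offending characters from the flagged lines only
found <- regmatches(x, gregexpr(bad, x, perl = TRUE))[has_bad]
found
```

To allow a wider set of characters, add them to the bracket expression, for example `[^\\x20-\\x7E\\t\u00e9]` to also accept e-acute.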


On Thu, Feb 11, 2010 at 8:46 PM, Dennis Fisher <fisher at plessthan.com> wrote:
> Colleagues
>
> R 2.10.1 on a Mac
>
> I read in text files using readLines, process those files, and then use R to execute another program.  Occasionally those files contain characters other than letters, numbers, and routine punctuation marks.  For example, a bullet (option-8 on a Mac) triggers the problem.
>
> Although R can read and process those characters, the other program cannot, so I would like to identify these characters and exit gracefully with a warning.
>
> I discovered that the following works:
>        any(is.na(strsplit(readLines(FILE), "")))
>
> I am wondering whether anyone has a better approach to this problem.
>
> Dennis
>
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-866-PLessThan (1-866-753-7784)
> www.PLessThan.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
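The goal stated in the question is to detect the characters and stop cleanly before the external program runs. That can be wrapped in a small helper. Below is a minimal sketch that assumes "acceptable" means 7-bit ASCII, which excludes the bullet. `ascii_only()` is a hypothetical name, not an existing R function. It inspects raw bytes with `charToRaw()`, so it does not depend on `strsplit()` producing `NA`s and gives the same answer in any locale.

```r
# Hypothetical helper (not from the thread): returns TRUE when every byte of
# `file` is 7-bit ASCII; otherwise warns with the offending line numbers and
# returns FALSE so the caller can skip running the external program.
ascii_only <- function(file) {
  lines <- readLines(file, warn = FALSE)
  # Look at raw bytes, so the check does not depend on the locale or on
  # whether a line is valid in the native encoding
  non_ascii <- vapply(lines,
                      function(s) any(as.integer(charToRaw(s)) > 127L),
                      logical(1), USE.NAMES = FALSE)
  bad <- which(non_ascii)
  if (length(bad) > 0L) {
    warning(sprintf("%s: non-ASCII characters on line(s) %s",
                    basename(file), paste(bad, collapse = ", ")),
            call. = FALSE)
    return(FALSE)
  }
  TRUE
}

# Usage: a small test file whose second line holds a bullet (U+2022);
# useBytes = TRUE writes its UTF-8 bytes unchanged in any locale
f <- tempfile(fileext = ".txt")
writeLines(c("Dennis Fisher MD", "Dennis  bullet \u2022"), f, useBytes = TRUE)
ok <- suppressWarnings(ascii_only(f))
ok
```

Inside the function that launches the other program, a guard such as `if (!ascii_only(FILE)) return(invisible(NULL))` skips the call after the warning. Replacing `warning()` with `stop()` aborts the script outright instead.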



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


