[R] eliminating control characters from formatted data files

jim holtman jholtman at gmail.com
Thu Feb 5 14:52:45 CET 2009


Here is one way of doing it.  You can read the file in as "raw",
delete (or replace) the control character, and then write the file
back out:

> # read in as 'raw' and delete the control-Z from the string
> # (n is an upper bound on the bytes read; it must be at least the file size)
> x <- readBin('/tempyy.txt', 'raw', n=100000)
> x
 [1] 54 68 69 73 20 69 73 20 61 20 74 65 73 74 20 1a 2e 0d 0a 4d 4f 52 45 20 4f 46 20 54 48 45 20 44 41 54 45 4d
[37] 1a 1a 0d 0a 74 68 69 73 20 69 73 20 73 6f 6d 65 20 64 61 74 61 0d 0a 6c 61 73 74 20 6c 69 6e 65 0d 0a
> rawToChar(x)
[1] "This is a test \032.\r\nMORE OF THE DATEM\032\032\r\nthis is some data\r\nlast line\r\n"
> # delete ^Z
> x <- x[x != as.raw(26)]
> x
 [1] 54 68 69 73 20 69 73 20 61 20 74 65 73 74 20 2e 0d 0a 4d 4f 52 45 20 4f 46 20 54 48 45 20 44 41 54 45 4d 0d
[37] 0a 74 68 69 73 20 69 73 20 73 6f 6d 65 20 64 61 74 61 0d 0a 6c 61 73 74 20 6c 69 6e 65 0d 0a
> rawToChar(x)
[1] "This is a test .\r\nMORE OF THE DATEM\r\nthis is some data\r\nlast line\r\n"
> # 'x' can now be written back out, e.g. with writeBin()
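
The write-out step is not shown in the session above, so here is a
minimal sketch of the whole round trip. The helper name strip_ctrl_z
and its arguments infile/outfile are made up for illustration, and
using file.info() to size the read (instead of a fixed n=100000) is
my assumption, not something from the session above.

```r
# Hypothetical helper: remove every control-Z (0x1a) byte from 'infile'
# and write the remaining bytes, unchanged, to 'outfile'.
strip_ctrl_z <- function(infile, outfile) {
  # ask for the file size so the whole file is read, whatever its length
  x <- readBin(infile, "raw", n = file.info(infile)$size)
  found <- x == as.raw(26)        # TRUE wherever a control-Z byte occurs
  writeBin(x[!found], outfile)    # write back everything else
  invisible(sum(found))           # number of bytes removed
}

# Small self-contained demonstration on temporary files
tmp_in  <- tempfile()
tmp_out <- tempfile()
writeBin(c(charToRaw("last line\r\n"), as.raw(26)), tmp_in)
n_removed <- strip_ctrl_z(tmp_in, tmp_out)
```

The returned count gives a simple check of whether a file contained
any control-Z at all, and the function could be applied to each of a
few hundred files in a loop over the names returned by list.files().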


On Thu, Feb 5, 2009 at 4:01 AM, David Epstein
<David.Epstein at warwick.ac.uk> wrote:
>
> I have a few hundred files of formatted data. Unfortunately most of them end
> with a spurious CONTROL-Z. I want to rewrite the files without the spurious
> character. Here's what I've come up with so far, but my code is unsafe
> because it assumes without justification that the last row of df contains a
> control character (and some NAs to fill up the record).
>
> options(warn=-1) # turn off irritating warning from read.table()
> df <- read.table(file=filename)
> df.new <- df[-nrow(df), ] # drop the last row
> write.table(df.new, file=filename.new, quote=FALSE)
>
> Before defining df.new, I want to check that the last line really does
> contain a control character. I've tried various methods, but none of them
> work.
>
> I have been wondering if I should use a function (scan?) that reads in the
> file line by line and checks each line for control characters, but I don't
> know how to do this either.
>
> Thanks for any help
> David



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



