[R] reading a text file with a stray carriage return
Jim Lemon
jim at bitwrit.com.au
Thu Mar 8 11:03:41 CET 2007
jim holtman wrote:
> How do you define a carriage return in the middle of a line if a carriage
> return is also used to delimit a line? One of the things you can do is to
> use 'count.fields' to determine the number of fields in each line. For
> those lines that are not the right length, you could combine them together
> with a 'paste' command when you write them out.
>
> On 3/7/07, Walter R. Paczkowski <dataanalytics at earthlink.net> wrote:
>
>>
>> Hi,
>> I'm hoping someone has a suggestion for handling a simple problem. A
>> client gave me a comma separated value file (call it x.csv) that has
>> an id and name and address for about 25,000 people (25,000 records).
>> I used read.table to read it, but then discovered that there are stray
>> carriage returns on several records. This plays havoc with read.table
>> since it starts a new input line when it sees the carriage return. In
>> short, the read is all wrong.
>> I thought I could write a simple function to parse a line and write it
>> back out, character by character. If a carriage return is found, it
>> would simply be ignored on the writing back out part. But how do I
>> identify a carriage return? What is the code or symbol? Is there any
>> easier way to rid the file of carriage returns in the middle of the
>> input lines?
>> Any help is appreciated.
>> Walt Paczkowski
>>
You are probably using Windows, where a newline is a CR/LF pair. Lines
can then contain embedded carriage returns (Ctrl-M, ASCII 13) or line
feeds (Ctrl-J, ASCII 10). You could write a small function in C or
something similar that reads the file character by character, keeps
each CR/LF pair as the real end of a line, and throws out any CR or LF
that is not part of such a pair, along with any other troublesome byte.
(I once had to trace null characters that were embedded in files. They
didn't show up on the display, but they clobbered the file reads.)
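
A rough sketch of such a function in C follows. It assumes that every
genuine end of record is a CR/LF pair and that any CR or LF standing
alone is a stray; that is a guess about the file, not something known
from the original post. The name strip_stray_breaks and its buffer-based
interface are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy n bytes from src to dst, keeping CR/LF pairs (assumed to be the
 * real line endings) and dropping lone CR, lone LF and NUL bytes.
 * dst must have room for at least n bytes.
 * Returns the number of bytes written to dst.
 */
size_t strip_stray_breaks(const char *src, size_t n, char *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < n; i++) {
        char c = src[i];

        if (c == '\r') {
            if (i + 1 < n && src[i + 1] == '\n') {
                /* CR followed by LF: a real line ending, keep both */
                dst[out++] = '\r';
                dst[out++] = '\n';
                i++;            /* skip the LF we just copied */
            }
            /* otherwise a lone CR: drop it */
        } else if (c == '\n' || c == '\0') {
            /* lone LF or embedded NUL: drop it */
        } else {
            dst[out++] = c;     /* ordinary byte: copy unchanged */
        }
    }
    return out;
}
```

To use it on x.csv, the file would have to be opened in binary mode
("rb" with fopen), since text mode on Windows translates CR/LF before
the function ever sees it. The cleaned bytes can then be written to a
new file and read with read.table as usual. If the real line endings in
the file turn out to be bare LF, the test would need to be reversed.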
Jim