[R] reading a text file with a stray carriage return

Jim Lemon jim at bitwrit.com.au
Thu Mar 8 11:03:41 CET 2007


jim holtman wrote:
> How do you define a carriage return in the middle of a line if a carriage
> return is also used to delimit a line?  One of the things you can do is to
> use 'count.fields' to determine the number of fields in each line.  For
> those lines that are not the right length, you could combine them together
> with a 'paste' command when you write them out.
> 
> On 3/7/07, Walter R. Paczkowski <dataanalytics at earthlink.net> wrote:
> 
>>
>> Hi,
>> I'm hoping someone has a suggestion for handling a simple problem. A
>> client gave me a comma separated value file (call it x.csv) that has
>> an id and name and address for about 25,000 people (25,000 records).
>> I used read.table to read it, but then discovered that there are stray
>> carriage returns on several records. This plays havoc with read.table
>> since it starts a new input line when it sees the carriage return. In
>> short, the read is all wrong.
>> I thought I could write a simple function to parse a line and write it
>> back out, character by character. If a carriage return is found, it
>> would simply be ignored on the writing back out part. But how do I
>> identify a carriage return? What is the code or symbol? Is there any
>> easier way to rid the file of carriage returns in the middle of the
>> input lines?
>> Any help is appreciated.
>> Walt Paczkowski
>>
You are probably on Windows, where each line ends with a CR/LF pair.
Carriage returns (Ctrl-M, ASCII 13) and line feeds (Ctrl-J, ASCII 10)
can both end up embedded in lines. You can probably just write a
function in C or something similar that reads the file one character
at a time, keeps each CR/LF pair (the real line ends) and throws out
any CR or LF that is not part of such a pair, along with any other
troublesome byte. (I once had to track down null characters that were
embedded in files: they didn't show up on the display, but clobbered
the file reads.)
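A minimal sketch of such a filter in C might look like the following. It assumes that the genuine record ends are CR/LF pairs (a Windows file) and that every lone CR, lone LF or NUL byte is a stray to be discarded; the function names `strip_stray_breaks` and `clean_file` are made up for illustration. A stray break that is itself a full CR/LF pair will not be caught this way; counting fields per line, as suggested above, would be needed for that case.

```c
#include <stdio.h>

/* Hypothetical helper: copy the n bytes in `in` to `out`, keeping CR/LF
   pairs (assumed to be the real record ends) and dropping any lone CR
   (ASCII 13), lone LF (ASCII 10) or NUL (ASCII 0) byte. `out` needs room
   for n bytes; returns the number of bytes written. */
size_t strip_stray_breaks(const char *in, size_t n, char *out)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        char c = in[i];
        if (c == '\r' && i + 1 < n && in[i + 1] == '\n') {
            out[w++] = '\r';          /* genuine end of record: keep the pair */
            out[w++] = '\n';
            i++;                      /* skip the LF just copied */
        } else if (c == '\r' || c == '\n' || c == '\0') {
            continue;                 /* stray break or NUL: throw it out */
        } else {
            out[w++] = c;
        }
    }
    return w;
}

/* Hypothetical streaming version of the same rule for a whole file.
   Binary mode stops the C runtime from translating CR/LF itself.
   Returns 0 on success, -1 on error. */
int clean_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL) return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) { fclose(in); return -1; }

    int c, pending_cr = 0;
    while ((c = fgetc(in)) != EOF) {
        if (pending_cr) {
            pending_cr = 0;
            if (c == '\n') { fputs("\r\n", out); continue; }
            /* the held-back CR was not followed by LF, so it is dropped */
        }
        if (c == '\r')
            pending_cr = 1;           /* can't judge a CR until the next byte */
        else if (c != '\n' && c != '\0')
            fputc(c, out);
    }
    /* a CR as the very last byte of the file is dropped as well */

    int failed = ferror(in);
    fclose(in);
    if (fclose(out) != 0) failed = 1;
    return failed ? -1 : 0;
}
```

For example, `clean_file("x.csv", "x_clean.csv")` would write a cleaned copy that can then be given to read.table or read.csv. Because a CR can only be classified once the following byte is known, the streaming version holds it back for one byte instead of deciding immediately.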

Jim


