[R] how to identify record with broken format

Duncan Murdoch
Wed Jun 5 12:23:26 CEST 2019


On 05/06/2019 6:12 a.m., Luigi Marongiu wrote:
> Dear all,
> I have a large dataframe where one of the records in a column must
> have been wrongly formatted; in particular, I think it is missing a
> closing ".
> When I try to show only that column's values I get a [1] with plenty of
> empty space, the final record [45], and then the system freezes. Also, when
> I try to plot, I get a table's printout instead of a real plot.
> 
> Is there a way to identify the record with the broken format? On a
> spreadsheet or text editor, all records seem OK, and there are too
> many records to visually inspect them all.
> 

Without seeing the data it is hard to be specific, but the 
count.fields() function should normally return the same number of fields 
for every line.  You may need to specify some of its optional arguments, 
e.g. sep="," for a CSV file, etc.

For example, with this file:

1,2,3
1,2,"4"
1,2,"
1,2,5
1,2,"6"

I see

 > count.fields("~/temp/test.txt",sep=",")
[1]  3  3 NA NA NA  3

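indicating that there are problems on lines 3-5: count.fields() reports 
NA for a line when a quoted string is still open at the end of that line, 
and here the string is opened by the unmatched quote on line 3.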
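
To go from that vector to the offending record, something along these 
lines may help. It is only a sketch: it recreates the five-line example 
in a temporary file (tempfile() stands in for ~/temp/test.txt), and the 
variable names are made up for illustration. The first NA gives the line 
where the trouble starts; counting the quote characters on each raw line 
is a rough cross-check, which assumes quotes are never escaped or doubled 
inside a field.

```r
# Recreate the five-line example file in a temporary location
# (an assumption for illustration; use the path of the real file instead).
path <- tempfile(fileext = ".txt")
writeLines(c('1,2,3', '1,2,"4"', '1,2,"', '1,2,5', '1,2,"6"'), path)

# Field counts per line; NA marks lines that end inside an open quote.
n <- count.fields(path, sep = ",")
first_bad <- which(is.na(n))[1]

# Cross-check: count the double-quote characters on each raw line.
# A line with an odd number of quotes opens a string it does not close.
lines <- readLines(path)
n_quotes <- lengths(regmatches(lines, gregexpr('"', lines, fixed = TRUE)))
odd_quotes <- which(n_quotes %% 2 == 1)

print(first_bad)          # line number where the NA run begins
print(lines[odd_quotes])  # raw text of the line(s) with unbalanced quotes
```

On a large file, first_bad narrows the search to one line, and 
lines[first_bad] can then be inspected directly.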

Duncan Murdoch


