[R] how to identify record with broken format

Boris Steipe
Wed Jun 5 12:32:35 CEST 2019


I've seen that behaviour with a C" atom in a chemical structure.

Here is code to identify lines with an odd number of quotation marks. To use it on your data, first read your file with readLines().

myTxt    <- '"This" "is" "fine"'
myTxt[2] <- '"This" "is "not"'
myTxt[3] <- 'This is ok'
 
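# Count the " characters in each element of myTxt (1)
x <- lengths(regmatches(myTxt, gregexpr('\\"', myTxt)))
# Indices of the elements with an odd (unbalanced) count
which(x %% 2 == 1)
[1] 2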
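
Applied to a file, the same idea might look like the sketch below. The temporary file and its contents are made up for illustration and stand in for the real data; replace tmp with the path to your own file.

```r
# Hypothetical demo file; line 3 is missing its closing quote.
tmp <- tempfile(fileext = ".csv")
writeLines(c('id,name', '1,"alpha"', '2,"beta', '3,"gamma"'), tmp)

# Read the raw lines, so that no quote parsing happens yet.
lines <- readLines(tmp)

# Count the double quotes on each line; lines without any give 0.
nQuote <- lengths(regmatches(lines, gregexpr('"', lines, fixed = TRUE)))

# Line numbers (in the file) with an odd number of quotes.
bad <- which(nQuote %% 2 == 1)
bad          # 3
lines[bad]   # the offending line: 2,"beta
```

Note that a quoted field that legitimately spans several lines would also be flagged, since each of its end lines has an odd count when looked at alone.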


Cheers,
Boris


(1) credit to https://stackoverflow.com/questions/12427385/how-to-calculate-the-number-of-occurrence-of-a-given-character-in-each-row-of-a




> On 2019-06-05, at 06:12, Luigi Marongiu <marongiu.luigi using gmail.com> wrote:
> 
> Dear all,
> I have a large data frame in which one of the records in a column must
> have been wrongly formatted; in particular, I think it is missing a
> closing ".
> When I try to show only that column's values I get a [1] with plenty of
> empty space, then the final record [45], and the system freezes. Also, when
> I try to plot I get a table's printout instead of a real plot.
> 
> Is there a way to identify the record with the broken format? In a
> spreadsheet or text editor, all records seem OK, and there are too
> many records to visually inspect them all.
> 
> -- 
> Best regards,
> Luigi


