[R] Errors in data frames from read.table
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Jul 16 19:23:55 CEST 2007
On Mon, 16 Jul 2007, Pat Carroll wrote:
> Hello, all.
>
> I am working on a project with a large (~350Mb, about 5800 rows)
> insurance claims dataset. It was supplied in a tilde(~)-delimited
> format. I imported it into a data frame in R by setting memory.limit to
> maximum (4Gb) for my computer and using read.table.
>
> The resulting data frame had 10 bad rows. The errors appear due to
> read.table missing delimiter characters, with multiple data being
> imported into the same cell, then the remainder of the row and the next
> run together and garbled due to the reading frame shift (example: a
> single cell might contain: <datum>~ ~ <datum> ~<datum>, after which all
> the cells of the row and the next are wrong).
>
> To replicate, I tried the same import procedure on a smaller
> demographics data set from the same supplier- only about 1Mb, and got
> the same kinds of errors (5 bad rows in about 3500). I also imported as
> much of the file as Excel would hold and cross-checked, Excel did not
> produce the same errors but can't handle the entire file. I have used
> read.table on a number of other formats (mainly csv and tab-delimited)
> without such problems; so far it appears there's something different
> about these files that produces the errors but I can't see what it would
> be.
The usual cause is that the user forgot about quotes and comment
characters. Try quote="", comment.char=""
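
For illustration, a minimal sketch of that suggestion. The sample rows and column names are invented, not taken from the poster's file. By default read.table uses quote = "\"'" and comment.char = "#", so an apostrophe or a # inside a field can swallow delimiters and line breaks until the next matching character.

```r
# Hypothetical tilde-delimited sample with an apostrophe and a '#'
# inside fields (invented data, for illustration only).
lines <- c(
  "id~name~note",
  "1~O'Brien~claim #12",
  "2~Smith~paid",
  "3~D'Arcy~open"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# With the defaults the apostrophes are treated as quote characters and
# '#' as a comment character. The read may fail or return a garbled
# frame, so the outcome is captured rather than assumed.
bad <- tryCatch(
  suppressWarnings(
    read.table(path, sep = "~", header = TRUE, stringsAsFactors = FALSE)
  ),
  error = function(e) conditionMessage(e)
)

# With quoting and comment handling switched off, every '~' is a
# delimiter and every line is one row.
good <- read.table(path, sep = "~", header = TRUE,
                   quote = "", comment.char = "",
                   stringsAsFactors = FALSE)
print(good)
```

Comparing str(bad) with str(good) on a few known-bad rows of the real file should show whether quoting is the culprit.
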
If that does not work, please follow the request in the footer of every
message on this list.
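
One possible way to locate the offending rows, and so build a small reproducible example, is count.fields. The sketch below uses invented sample data, and it assumes the header line has the correct number of fields.

```r
# Hypothetical tilde-delimited sample; the last line is deliberately
# short one field (invented data, for illustration only).
lines <- c(
  "id~name~note",
  "1~O'Brien~claim #12",
  "2~Smith~paid",
  "3~D'Arcy~open",
  "4~Jones"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Count fields per line with quoting and comments disabled, so that
# only '~' and line breaks are treated as structure.
n_fields <- count.fields(path, sep = "~", quote = "", comment.char = "")

# Assumption: the first (header) line defines the expected field count.
expected <- n_fields[1]
suspect <- which(n_fields != expected)

# Show the raw text of any line whose field count differs.
print(readLines(path)[suspect])
```

If suspect is empty when quote = "" and comment.char = "" are set, but rows still go wrong with the defaults, that points at quote or comment characters in the data rather than missing delimiters.
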
> Does anyone have any thoughts about what is going wrong? And is there a
> way, short of manual correction, for fixing it?
>
> Thanks for all help,
> ~Pat.
>
>
> Pat Carroll.
> what matters most is how well you walk through the fire.
> bukowski.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595