[R] Diagnostic and helper functions for defective & hard-to-import files

David Winsemius dwinsemius at comcast.net
Wed Jan 29 05:56:41 CET 2014


On Jan 28, 2014, at 8:43 PM, andrewH wrote:

> Hi Folks!
> I have been writing a small set of utilities for dealing with files that are
> hard to open correctly for one reason or another, especially because they
> are too big for memory, non-rectangular, or contain odd characters or
> unexpected codings, or all of these things together. Today it suddenly hit
> me that this has probably been done, done better, and upgraded to package
> form a dozen times already. There were pointers to a couple functions useful
> in this regard in the Core Import/Export document.  But my effort to come up
> with search terms that were productive of such packages was unsuccessful. 

I don't know of a package to do that. You know the quote from that Russian author whose name I am forgetting (in "Anna Karenina" perhaps) about happy families being all the same but unhappy families being impossible to classify. I think it applies to datasets as well. There are too many different dataset pathologies to allow a neat packaging approach. 

My approach has been to study the options in read.table very carefully and, if that is insufficient, to look at either readLines or scan as options. It is very useful to be able to use `count.fields` with different settings of the "quote" and "comment.char" parameters. Wrapping it in table() can deliver a very compact, useful result.
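
A minimal sketch of that diagnostic, assuming a small comma-separated file. The file contents and the variable names (tmp, n_default, n_strict, bad) are made up for illustration. Stray apostrophes and a "#" inside a field are two common reasons the default settings miscount, so it is worth tabulating the field counts under more than one setting and then pulling the odd lines out with readLines:

```r
# Hypothetical sample data written to a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "id,name,comment",
  "1,Alice,fine",
  "2,Bob,\"likes a, b\"",
  "3,O'Brien,ok",
  "4,Carol,ok # trailing note, with comma",
  "5,D'Arcy"
), tmp)

# Defaults: both ' and " are quote characters and # starts a comment.
# The apostrophes can make several lines run together.
n_default <- count.fields(tmp, sep = ",")
print(table(n_default, useNA = "ifany"))

# Only double quotes count as quotes, and comments are switched off.
n_strict <- count.fields(tmp, sep = ",", quote = "\"", comment.char = "")
print(table(n_strict, useNA = "ifany"))

# Lines whose field count differs from the header's, shown as raw text.
bad <- which(n_strict != n_strict[1])
print(readLines(tmp)[bad])
```

Comparing the two tables usually shows quickly whether quoting or comment handling is the culprit, before you commit to any read.table arguments.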

And don't forget to search the Archives if you have a regular but non-rectangular arrangement.


> 
> I would be grateful if someone would point me toward such a package or
> packages if they exist.

-- 

David Winsemius
Alameda, CA, USA



