[R] read.table

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Fri Feb 25 21:54:43 CET 2005


On 25-Feb-05 Sean Davis wrote:
> I have a commonly recurring problem and wondered if folks
> would share tips.  I routinely get tab-delimited text files
> that I need to read in.
>   In very many cases, I get:
> 
>  > a <- read.table('junk.txt.txt',header=T,skip=10,sep="\t")
> Error in scan(file = file, what = what, sep = sep, quote = quote,
> dec = dec,  :
>       line 67 did not have 88 elements
> 
> I am typically able to go through the file and find a single
> quote or something like that causing the problem, but with a
> recent set of files, I haven't been able to find such an issue.
> What can I do to get around this problem?  I can use perl, also....

Hi Sean,

This is only a shot in the dark, but your description has reminded
me of similar messes in files which have been exported from Excel.

What I have often done in such cases, to check (e.g.) the numbers
of fields in records (using 'awk' on Linux) is on the following
lines:

  cat filename | awk 'BEGIN{FS="\t"} {print NF}' | sort | uniq

In that case, if the records have varying numbers of fields,
two or more different numbers will be printed instead of the
single value there should be. (The sort is needed because uniq
only collapses adjacent duplicate lines.)

If you know how many fields to expect (e.g. 88), then you can
find the line numbers of offending records by something like

  cat filename | awk 'BEGIN{FS="\t"} {if(NF!=88){print NR}}'

In data files with a lot of fields per record, doing it in
this kind of way is vastly superior to trying to spot the
problem by eye -- it's extremely difficult to count 88
tab-separated fields on screen!
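As a minimal, self-contained illustration of both checks (the
file name junk.txt and the 3-field layout are just placeholders
standing in for your real file and its expected field count):

  # Build a tiny tab-delimited file where line 3 is missing a field.
  printf 'a\tb\tc\n1\t2\t3\n4\t5\n' > junk.txt

  # Check 1: how many distinct field counts occur?
  awk 'BEGIN{FS="\t"} {print NF}' junk.txt | sort -n | uniq
  # prints 2 and 3 -- two different counts, so the file is ragged

  # Check 2: which line numbers lack the expected 3 fields?
  awk 'BEGIN{FS="\t"} {if (NF != 3) print NR}' junk.txt
  # prints 3 -- the offending line number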

Hoping this helps! If not, supply further details and we'll
see what we can think up.

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 25-Feb-05                                       Time: 20:54:43
------------------------------ XFMail ------------------------------



