[Rd] read.fwf doesn't work with header = TRUE (PR#8226)

Sun Oct 23 18:59:11 CEST 2005

On Fri, 21 Oct 2005, Emmanuel Paradis wrote:

> Prof Brian Ripley wrote:
>> On Thu, 20 Oct 2005 Emmanuel.Paradis at mpl.ird.fr wrote:
>> 
>>> Full_Name: Emmanuel Paradis
>>> Version: 2.1.1
>>> OS: Linux
>>> Submission from: (NULL) (193.49.41.105)
>>> 
>>> 
>>> read.fwf(..., header = TRUE) does not work properly since:
>>> 
>>> 1/ the original header is printed on the console and not in FILE;
>>> 2/ the different 'parts' of the header should be separated with tabs
>>>   to work with the call to read.table.
>>> 
>>> Here is a suggested fix for src/library/utils/R/read.fwf.R:
>>> 
>>> 38c38,40
>>> <         cat(FILE, headerline, "\n")
>>> ---
>>> >         headerline <- unlist(strsplit(headerline, " {1,}"))
>>> >         headerline <- paste(headerline, collapse = "\t")
>>> >         cat(file = FILE, headerline, "\n")
>> 
>> 
>> Thanks, but I don't think that is right.  It assumes the header line is 
>> space-delimited (or at least that spaces get converted to tabs).  We have 
>> not specified the format of the header line, and it cannot usefully be 
>> fixed format.  So I think we need to specify it is delimited by 'sep'
>> (not tab).
>
> I see, but suppose we selectively read some columns of a file, e.g. with
> widths=c(1, -4, 2): how can we know how many variables have been skipped,
> and so select the appropriate names in the header line?

You do not: as the help file says

      Negative-width fields are used to indicate columns to be skipped,
      eg '-5' to skip 5 columns.  These fields are not seen by
      'read.table' and so should not be included in a 'col.names' or
      'colClasses' argument.
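
As a minimal sketch of the help text quoted above (the file contents and the
names 'id' and 'value' are made up for illustration), the skipped field gets
no entry in 'col.names':

```r
# Hypothetical fixed-width input: 1 character kept, 4 skipped, 2 kept.
ff <- tempfile()
cat("A....12", "B....34", "C....56", file = ff, sep = "\n")

# The -4 field is dropped before read.table sees the data, so col.names
# lists only the two fields that are actually read.
x <- read.fwf(ff, widths = c(1, -4, 2), col.names = c("id", "value"))
unlink(ff)

names(x)
ncol(x)
```

Supplying a third name for the skipped field would not match the two columns
that 'read.table' receives.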

> Here is another proposed fix, but this assumes the header line is in 
> fixed-width format (as specified by 'widths'):

What happens if there are multi-line records?  Your `fix' crashes.
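
A minimal sketch of the difficulty, assuming "multi-line records" means
'widths' given as a list of integer vectors (one vector per line of a
record), as described in ?read.fwf.  The widths below are made up for
illustration:

```r
# Hypothetical widths for a two-line record: two fields on the first line,
# one field on the second.
widths <- list(c(1, 2), 3)

# The arithmetic in the proposed fix starts with cumsum(widths), which needs
# a plain numeric vector; for a list like this it signals an error, which is
# caught here rather than allowed to stop the script.
res <- tryCatch(cumsum(widths), error = function(e) e)
inherits(res, "error")
```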

> 38c38,41
> <         cat(FILE, headerline, "\n")
> ---
> >         head.last <- cumsum(widths)
> >         head.first <- head.last - widths + 1
> >         headerline <- substring(headerline, head.first, head.last)[drop]
> >         cat(file = FILE, headerline, "\n", sep = sep)
>
> ?read.fwf says clearly that sep is used internally.

Not so: please check the current version.
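
For the header question discussed earlier in this message, a minimal sketch,
assuming the behaviour documented in recent versions of ?read.fwf, where the
names in the header line must be delimited by 'sep' (tab by default) while
the data lines stay fixed width.  The file contents and names are made up
for illustration:

```r
# Hypothetical input: a tab-delimited header line followed by fixed-width data.
ff <- tempfile()
cat("id\tvalue", "A....12", "B....34", file = ff, sep = "\n")

# The header names only the fields that are kept; the -4 field has no name.
x <- read.fwf(ff, widths = c(1, -4, 2), header = TRUE)
unlink(ff)

names(x)
```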

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595