[Rd] read.fwf doesn't work with header = TRUE (PR#8226)

Mon Oct 24 15:22:51 CEST 2005

Prof Brian Ripley wrote:
> On Fri, 21 Oct 2005, Emmanuel Paradis wrote:
> 
>> Prof Brian Ripley wrote:
>>
>>> On Thu, 20 Oct 2005 Emmanuel.Paradis at mpl.ird.fr wrote:
>>>
>>>> Full_Name: Emmanuel Paradis
>>>> Version: 2.1.1
>>>> OS: Linux
>>>> Submission from: (NULL) (193.49.41.105)
>>>>
>>>>
>>>> read.fwf(..., header = TRUE) does not work properly since:
>>>>
>>>> 1/ the original header is printed on the console and not in FILE;
>>>> 2/ the different 'parts' of the header should be separated with tabs
>>>>   to work with the call to read.table.
>>>>
>>>> Here is a suggested fix for src/library/utils/R/read.fwf.R:
>>>>
>>>> 38c38,40
>>>> <         cat(FILE, headerline, "\n")
>>>> ---
>>>>
>>>>>         headerline <- unlist(strsplit(headerline, " {1,}"))
>>>>>         headerline <- paste(headerline, collapse = "\t")
>>>>>         cat(file = FILE, headerline, "\n")
>>>
>>>
>>>
>>> Thanks, but I don't think that is right.  It assumes the header line 
>>> is space-delimited (or at least that spaces get converted to tabs).  
>>> We have not specified the format of the header line, and it cannot 
>>> usefully be fixed format.  So I think we need to specify it is 
>>> delimited by 'sep'
>>> (not tab).
>>
>>
>> I see, but suppose we selectively read some columns of a file, e.g. with 
>> widths=c(1, -4, 2): how can we know how many variables have been 
>> skipped and then select the appropriate names in the header line?
> 
> 
> You do not: as the help file says
> 
>      Negative-width fields are used to indicate columns to be skipped,
>      eg '-5' to skip 5 columns.  These fields are not seen by
>      'read.table' and so should not be included in a 'col.names' or
>      'colClasses' argument.

OK, but it seems strange to me that a header line would not name all the 
variables.
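
For illustration, here is a minimal sketch of the behaviour described in the 
help text quoted above: with widths=c(1, -4, 2), only the two kept fields 
receive names. The sample records and the column names "first" and "last" 
are made up for this example.

```r
# Two made-up fixed-width records: 1 char kept, 4 chars skipped, 2 chars kept.
tmp <- tempfile()
writeLines(c("AxxxxBC", "DyyyyEF"), tmp)

# Names are supplied for the two kept fields only; the skipped field is
# never seen by read.table and so gets no entry in col.names.
d <- read.fwf(tmp, widths = c(1, -4, 2),
              col.names = c("first", "last"),
              colClasses = "character")
unlink(tmp)
print(d)
```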

>> Here is another proposed fix, but this assumes the header line is in 
>> fixed-width format (as specified by 'widths'):
> 
> 
> What happens if there are multi-line records?  Your `fix' crashes.

It crashes anyway because it should be [!drop] and not [drop] ;)

>> 38c38,41
>> <         cat(FILE, headerline, "\n")
>> ---
>>
>>>         head.last <- cumsum(widths)
>>>         head.first <- head.last - widths + 1
>>>         headerline <- substring(headerline, head.first, head.last)[drop]
>>>         cat(file = FILE, headerline, "\n", sep = sep)
>>
>>
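
The [drop] versus [!drop] point can be checked in isolation. This is only a 
sketch: it assumes, as I believe read.fwf does, that 'drop' is a logical 
vector marking the negative widths and that 'widths' has been made 
non-negative before the positions are computed. The header string is 
invented, and it assumes a fixed-width header.

```r
# Hypothetical header laid out with the same widths as the data.
headerline <- "IxxxxNM"
widths <- c(1, -4, 2)

drop   <- widths < 0        # TRUE for the fields to be skipped
widths <- abs(widths)       # assumption: positions use non-negative widths
last   <- cumsum(widths)
first  <- last - widths + 1

fields <- substring(headerline, first, last)
kept   <- fields[!drop]     # [!drop] keeps the fields we want;
                            # [drop] would keep only the skipped one
cat(kept, sep = "\t")
cat("\n")
```
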
>> ?read.fwf says clearly that sep is used internally.
> 
> 
> Not so: please check the current version.

Here is what I have in R 2.2.0:

      sep: character; the separator used internally; should be a
           character that does not occur in the file.

So, should the fix be simply:

38c38
<         cat(FILE, headerline, "\n")
---
 >         cat(file = FILE, headerline, "\n")

?
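
To show the difference between the two calls outside read.fwf, here is a 
small sketch. The names FILE and headerline mirror the patch; the header 
content is made up. Since the first formal argument of cat() is '...', an 
unnamed FILE is simply printed along with the header.

```r
FILE <- tempfile("Rfwf.")
headerline <- "a\tb"        # made-up header, already delimited by tab

# Without 'file =', FILE is just another value to print: the path and the
# header both go to the console (captured here) and no file is created.
printed <- capture.output(cat(FILE, headerline, "\n"))
existed <- file.exists(FILE)

# With 'file =', the header is written to FILE instead.
cat(file = FILE, headerline, "\n")
written <- readLines(FILE)
unlink(FILE)

print(existed)
print(written)
```

Note that cat() still puts its default separator, a space, between 
headerline and "\n", so the line written to FILE ends with a trailing space.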