[Rd] read.fwf doesn't work with header = TRUE (PR#8226)
Emmanuel.Paradis@mpl.ird.fr
Mon Oct 24 15:22:51 CEST 2005
Prof Brian Ripley wrote:
> On Fri, 21 Oct 2005, Emmanuel Paradis wrote:
>
>> Prof Brian Ripley wrote:
>>
>>> On Thu, 20 Oct 2005 Emmanuel.Paradis at mpl.ird.fr wrote:
>>>
>>>> Full_Name: Emmanuel Paradis
>>>> Version: 2.1.1
>>>> OS: Linux
>>>> Submission from: (NULL) (193.49.41.105)
>>>>
>>>>
>>>> read.fwf(..., header = TRUE) does not work properly because:
>>>>
>>>> 1/ the original header is printed on the console, not written to FILE;
>>>> 2/ the different 'parts' of the header should be separated with tabs
>>>> to work with the subsequent call to read.table.
>>>>
>>>> Here is a suggested fix for src/library/utils/R/read.fwf.R:
>>>>
>>>> 38c38,40
>>>> < cat(FILE, headerline, "\n")
>>>> ---
>>>> > headerline <- unlist(strsplit(headerline, " {1,}"))
>>>> > headerline <- paste(headerline, collapse = "\t")
>>>> > cat(file = FILE, headerline, "\n")
>>>
>>> Thanks, but I don't think that is right. It assumes the header line
>>> is space-delimited (or at least that spaces get converted to tabs).
>>> We have not specified the format of the header line, and it cannot
>>> usefully be fixed format. So I think we need to specify it is
>>> delimited by 'sep' (not tab).
>>
>>
>> I see, but suppose we read only some of the columns in a file, e.g. with
>> widths = c(1, -4, 2): how can we know how many variables have been
>> skipped, and then select the appropriate names in the header line?
>
>
> You do not: as the help file says
>
> Negative-width fields are used to indicate columns to be skipped,
> eg '-5' to skip 5 columns. These fields are not seen by
> 'read.table' and so should not be included in a 'col.names' or
> 'colClasses' argument.
OK, but it seems strange to me not to have all variables named in the
header line.
>> Here is another proposed fix, but this assumes the header line is in
>> fixed-width format (as specified by 'widths'):
>
>
> What happens if there are multi-line records? Your `fix' crashes.
It crashes anyway, because it should be [!drop], not [drop] ;)
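To illustrate the [!drop] correction, here is a minimal sketch of splitting a fixed-width header with the same 'widths'. It assumes single-line records (a plain numeric 'widths', not a list) and signed widths as the user passes them, with negative values marking skipped columns. split_fwf_header() is a hypothetical helper written for this example, not part of read.fwf:

```r
# Hypothetical helper (not the read.fwf source).
# Assumes single-line records and signed widths: negative = skip.
split_fwf_header <- function(headerline, widths) {
  drop  <- widths < 0            # TRUE for columns to be skipped
  w     <- abs(widths)           # field widths, including skipped ones
  last  <- cumsum(w)             # end position of each field
  first <- last - w + 1          # start position of each field
  # Cut every field, then keep only the retained ones: [!drop], not [drop]
  substring(headerline, first, last)[!drop]
}

split_fwf_header("aSKIPbb", c(1, -4, 2))
```

With widths = c(1, -4, 2) the header is cut into "a", "SKIP" and "bb", and [!drop] keeps "a" and "bb". Padding spaces inside a field are not stripped by this sketch.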
>> 38c38,41
>> < cat(FILE, headerline, "\n")
>> ---
>> > head.last <- cumsum(widths)
>> > head.first <- head.last - widths + 1
>> > headerline <- substring(headerline, head.first, head.last)[drop]
>> > cat(file = FILE, headerline, "\n", sep = sep)
>>
>> ?read.fwf says clearly that sep is used internally.
>
>
> Not so: please check the current version.
Here is what I have in R 2.2.0:
     sep: character; the separator used internally; should be a
          character that does not occur in the file.
So, should the fix simply be the following?

38c38
< cat(FILE, headerline, "\n")
---
> cat(file = FILE, headerline, "\n")
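A minimal sketch of what that one-line change does, run outside read.fwf itself. In cat(), every argument after '...' must be named, so in cat(FILE, headerline, "\n") the connection FILE is just one more value to print and the output goes to the console; naming file = routes it to the connection. The sketch assumes, as suggested earlier in the thread, that the header line is already delimited by 'sep'; the header names and the data row are hypothetical:

```r
# Sketch of the proposed fix (not the actual read.fwf source).
# Assumption: the header line is already delimited by 'sep'.
sep <- "\t"
headerline <- paste("id", "score", sep = sep)   # hypothetical header
tmp <- tempfile("Rfwf.")
FILE <- file(tmp, "w")

# Buggy form, kept for comparison: FILE is absorbed into '...',
# so the header is printed on the console, not written to tmp.
# cat(FILE, headerline, "\n")

# Fixed form: the named 'file' argument writes to the connection.
cat(file = FILE, headerline, "\n")

# One data record, assumed already cut by 'widths' and joined with 'sep'.
cat(file = FILE, paste("1", "23", sep = sep), "\n", sep = "")
close(FILE)

written <- readLines(tmp)                        # what read.table will see
df <- read.table(tmp, header = TRUE, sep = sep)
unlink(tmp)
df
```

Note that cat()'s default sep = " " leaves a trailing space before the newline in the written header line.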
More information about the R-devel mailing list