[Rd] read.fwf doesn't work with header = TRUE (PR#8226)
ripley@stats.ox.ac.uk
Sun Oct 23 18:59:11 CEST 2005
On Fri, 21 Oct 2005, Emmanuel Paradis wrote:
> Prof Brian Ripley wrote:
>> On Thu, 20 Oct 2005 Emmanuel.Paradis at mpl.ird.fr wrote:
>>
>>> Full_Name: Emmanuel Paradis
>>> Version: 2.1.1
>>> OS: Linux
>>> Submission from: (NULL) (193.49.41.105)
>>>
>>>
>>> read.fwf(..., header = TRUE) does not work properly since:
>>>
>>> 1/ the original header is printed on the console and not in FILE;
>>> 2/ the different 'parts' of the header should be separated with tabs
>>> to work with the call to read.table.
>>>
>>> Here is a suggested fix for src/library/utils/R/read.fwf.R:
>>>
>>> 38c38,40
>>> < cat(FILE, headerline, "\n")
>>> ---
>>> > headerline <- unlist(strsplit(headerline, " {1,}"))
>>> > headerline <- paste(headerline, collapse = "\t")
>>> > cat(file = FILE, headerline, "\n")
>>
>>
>> Thanks, but I don't think that is right. It assumes the header line is
>> space-delimited (or at least that spaces get converted to tabs). We have
>> not specified the format of the header line, and it cannot usefully be
>> fixed format. So I think we need to specify it is delimited by 'sep'
>> (not tab).
>
> I see, but suppose we read selectively some columns in a file, eg with
> widths=c(1, -4, 2), how can we know how many variables have been skipped and
> then select the appropriate names in the header line?

You do not: as the help file says

     Negative-width fields are used to indicate columns to be skipped,
     eg '-5' to skip 5 columns. These fields are not seen by
     'read.table' and so should not be included in a 'col.names' or
     'colClasses' argument.

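A minimal sketch of what that passage means in practice, using the widths from the question. The temporary file, its contents, and the column names "id" and "value" are made up for illustration; only the two retained fields get a name, and the skipped field (-4) gets none.

```r
# Hypothetical fixed-width data: 1 character kept, 4 skipped, 2 kept.
tmp <- tempfile()
writeLines(c("Axxxx12", "Byyyy34", "Czzzz56"), tmp)

# widths has three entries, but read.table only ever sees two fields,
# so col.names has exactly two entries (nothing for the -4 field).
d <- read.fwf(tmp, widths = c(1, -4, 2), col.names = c("id", "value"))
print(d)

unlink(tmp)
```

Supplying a third name for the skipped field would be an error, since that field never reaches 'read.table'.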
> Here is another proposed fix, but this assumes the header line is in
> fixed-width format (as specified by 'widths'):
What happens if there are multi-line records? Your `fix' crashes.
> 38c38,41
> < cat(FILE, headerline, "\n")
> ---
> > head.last <- cumsum(widths)
> > head.first <- head.last - widths + 1
> > headerline <- substring(headerline, head.first, head.last)[drop]
> > cat(file = FILE, headerline, "\n", sep = sep)
>
> ?read.fwf says clearly that sep is used internally.
Not so: please check the current version.
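
For reference, a sketch of the 'sep'-delimited header convention discussed above. It assumes a version of R in which ?read.fwf states that, with header = TRUE, the names on the first line must be delimited by 'sep'; it does not describe the behaviour of 2.1.1. The file contents and names are hypothetical.

```r
# Hypothetical file: the header is NOT fixed width; its names are
# separated by 'sep' (a tab here) and cover only the retained fields.
tmp2 <- tempfile()
writeLines(c("id\tvalue", "Axxxx12", "Byyyy34"), tmp2)

# Data lines are cut according to 'widths'; the header line is passed
# through and split on 'sep' when the result is handed to read.table.
d2 <- read.fwf(tmp2, widths = c(1, -4, 2), header = TRUE, sep = "\t")
print(d2)

unlink(tmp2)
```

Note that the header carries two names for two retained fields, so the question of how many variables were skipped does not arise.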
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595