[Rd] Wish: change behaviour of header in read.fwf (PR#9252)

gregor.gorjanc at bfro.uni-lj.si
Tue Sep 26 02:01:41 CEST 2006


Hello!

In my opinion, the way read.fwf() handles its header argument is not
very useful. Say I have the following data:

col1  col2  col3
 123   123   123
   a           b
1234    12  1234
      65.4   4.5

Now, if I want to read this data into R, I cannot use read.table()
because some fields are missing.

read.table(file="test.txt")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
na.strings,  :
	line 3 did not have 3 elements
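
The error can be traced with count.fields(), which reports how many
whitespace-separated fields each line has. This is only a small sketch
for illustration; the temporary file stands in for test.txt and the
variable names are made up for this example.

```r
# Recreate the example data in a temporary file (stand-in for test.txt).
path <- tempfile(fileext = ".txt")
writeLines(c("col1  col2  col3",
             " 123   123   123",
             "   a           b",
             "1234    12  1234",
             "      65.4   4.5"), path)

# Number of whitespace-separated fields per line.
# Lines 3 and 5 have a blank field, so only 2 fields are found there.
n_fields <- count.fields(path)
print(n_fields)
```

Using fill=TRUE in read.table() would avoid the error, but it pads short
lines at the end, so values such as "b" would land in the wrong column.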

However, read.fwf() can help me.

read.fwf(file="test.txt", widths=c(5, 6, 5))
     V1     V2    V3
1 col1   col2   col3
2  123    123    123
3    a             b
4 1234     12   1234
5        65.4    4.5

Oops, I need to specify header=TRUE, and the help page says that the
header fields must be separated by sep. The sep entry of the help page
says

     sep: character; the separator used internally; should be a
          character that does not occur in the file (except in the
          header).

This is quite limiting, because I never know in advance which characters
do not occur in a data file, and even if I do, I have to modify the
header in the file accordingly before import. Naive use of read.fwf()
returns an error

read.fwf(file="test.txt", widths=c(5, 6, 5), header=TRUE, sep=" ")
Error in read.table(file = FILE, header = header, sep = sep, as.is =
as.is,  :
	more columns than column names

read.fwf(file="test.txt", widths=c(5, 6, 5), header=TRUE, sep="  ")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
na.strings,  :
	invalid 'sep' value: must be one byte
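
For comparison, here is a minimal sketch of the usage the help page
describes: the data lines stay fixed-width, but the header line is
edited by hand so that its fields are separated by sep (a tab, the
default). The temporary file and the variable names are assumptions
made for this example only.

```r
# Same data as above, except that the header fields are separated by
# a tab, which is the default value of sep in read.fwf().
path <- tempfile(fileext = ".txt")
writeLines(c("col1\tcol2\tcol3",
             " 123   123   123",
             "   a           b",
             "1234    12  1234",
             "      65.4   4.5"), path)

# strip.white is passed on to read.table() and removes the padding.
dat <- read.fwf(path, widths = c(5, 6, 5), header = TRUE,
                strip.white = TRUE)
print(names(dat))
print(nrow(dat))
```

This works, but only because the header was changed in the file, which
is exactly the extra step the wish is about.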

I got lost reading the source of read.fwf(), but I think that the
following idea should be easy to implement, and it would also be similar
to the behaviour of read.table().

<ideaCode>

if(header) {
  ## sep is from read.fwf call
  header <- unlist(strsplit(readLines(con=file, n=1), split=sep))
}
...
## tweaks related to issues with length(header), row.names, ncol(), ...
read.table(..., col.names=header, ...)

</ideaCode>
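
Until something like this is available, a similar effect can probably be
had from user code. The sketch below is not the proposed patch: it cuts
the header line by the same widths as the data instead of splitting it
on sep, and the helper name read_fwf_header is hypothetical. It assumes
that file is a path (not a connection) and that widths is a plain
vector (not a list).

```r
# Recreate the example data in a temporary file (stand-in for test.txt).
path <- tempfile(fileext = ".txt")
writeLines(c("col1  col2  col3",
             " 123   123   123",
             "   a           b",
             "1234    12  1234",
             "      65.4   4.5"), path)

# Hypothetical helper, not part of utils: take the column names from
# the first line using the fixed widths, then let read.fwf() read the
# remaining lines.
read_fwf_header <- function(file, widths, ...) {
  first <- readLines(file, n = 1)
  last  <- cumsum(abs(widths))
  start <- last - abs(widths) + 1
  keep  <- widths > 0            # negative widths mark skipped columns
  nms   <- trimws(substring(first, start[keep], last[keep]))
  read.fwf(file, widths = widths, skip = 1, col.names = nms, ...)
}

dat <- read_fwf_header(path, widths = c(5, 6, 5), strip.white = TRUE)
print(names(dat))
print(nrow(dat))
```

Cutting the header by widths avoids having to pick a sep that does not
occur in the file, at the cost of requiring the header to be aligned
with the data columns.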

I know that FWF is not used much these days, but I would find the
proposed change really useful.

-- 
Lep pozdrav / With regards,
    Gregor Gorjanc
----------------------------------------------------------------------
University of Ljubljana     PhD student
Biotechnical Faculty
Zootechnical Department     URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3                   mail: gregor.gorjanc <at> bfro.uni-lj.si

SI-1230 Domzale             tel: +386 (0)1 72 17 861
Slovenia, Europe            fax: +386 (0)1 72 17 888

----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.



