[R] read.fwf and header

Mon Oct 30 21:15:29 CET 2006

On Mon, 2006-10-30 at 19:51 +0100, Gregor Gorjanc wrote:
> Hi!
> 
> I have data (also in attached file) in the following form:
> 
> num1 num2 num3 int1 fac1 fac2 cha1 cha2 Date POSIXt
>  1                1   f q   1900-01-01 1900-01-01 01:01:01
>  2 1.0 1316666.5  2 a g r z            1900-01-01 01:01:01
>  3 1.5 1188830.5  3 b h s y 1900-01-01 1900-01-01 01:01:01
>  4 2.0 1271846.3  4 c i t x 1900-01-01 1900-01-01 01:01:01
>  5 2.5  829737.4    d j u w 1900-01-01
>  6 3.0 1240967.3  5 e k v v 1900-01-01 1900-01-01 01:01:01
>  7 3.5  919684.4  6 f l w u 1900-01-01 1900-01-01 01:01:01
>  8 4.0  968214.6  7 g m x t 1900-01-01 1900-01-01 01:01:01
>  9 4.5 1232076.4  8 h n y s 1900-01-01 1900-01-01 01:01:01
> 10 5.0 1141273.4  9 i o z r 1900-01-01 1900-01-01 01:01:01
>    5.5  988481.4 10 j     q 1900-01-01 1900-01-01 01:01:01
> 
> This is a FWF (fixed width format) file. I can not use read.table here,
> because of missing values. I have tried with the following
> 
> > read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
> header=TRUE)
> 
> Error in read.table(file = FILE, header = header, sep = sep, as.is =
> as.is,  :
> 	more columns than column names
> 
> I could use:
> 
> > read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
> header=FALSE, skip=1)
>    V1  V2        V3 V4 V5 V6 V7 V8          V9                 V10
> 1   1  NA        NA  1    f  q     1900-01-01  1900-01-01 01:01:01
> 2   2 1.0 1316666.5  2 a  g  r  z              1900-01-01 01:01:01
> 3   3 1.5 1188830.5  3 b  h  s  y  1900-01-01  1900-01-01 01:01:01
> 4   4 2.0 1271846.3  4 c  i  t  x  1900-01-01  1900-01-01 01:01:01
> 5   5 2.5  829737.4 NA d  j  u  w  1900-01-01
> 6   6 3.0 1240967.3  5 e  k  v  v  1900-01-01  1900-01-01 01:01:01
> 7   7 3.5  919684.4  6 f  l  w  u  1900-01-01  1900-01-01 01:01:01
> 8   8 4.0  968214.6  7 g  m  x  t  1900-01-01  1900-01-01 01:01:01
> 9   9 4.5 1232076.4  8 h  n  y  s  1900-01-01  1900-01-01 01:01:01
> 10 10 5.0 1141273.4  9 i  o  z  r  1900-01-01  1900-01-01 01:01:01
> 11 NA 5.5  988481.4 10 j        q  1900-01-01  1900-01-01 01:01:01
> 
> Does anyone have a clue, how to get above result with header?
> 
> Thanks!

The attachment did not come through. Perhaps it was too large?

Not sure if this is the most efficient way, but how about this:

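# Take the column names from the whitespace-delimited header line,
# then read the fixed-width body, skipping that line.
DF <- read.fwf("test.txt", 
               widths = c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
               skip = 1, strip.white = TRUE,
               col.names = unlist(read.table("test.txt", 
                                             nrows = 1, as.is = TRUE)))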

> DF
   num1 num2      num3 int1 fac1 fac2 cha1 cha2       Date
1     1   NA        NA    1         f    q      1900-01-01
2     2  1.0 1316666.5    2    a    g    r    z           
3     3  1.5 1188830.5    3    b    h    s    y 1900-01-01
4     4  2.0 1271846.3    4    c    i    t    x 1900-01-01
5     5  2.5  829737.4   NA    d    j    u    w 1900-01-01
6     6  3.0 1240967.3    5    e    k    v    v 1900-01-01
7     7  3.5  919684.4    6    f    l    w    u 1900-01-01
8     8  4.0  968214.6    7    g    m    x    t 1900-01-01
9     9  4.5 1232076.4    8    h    n    y    s 1900-01-01
10   10  5.0 1141273.4    9    i    o    z    r 1900-01-01
11   NA  5.5  988481.4   10    j              q 1900-01-01
                POSIXt
1  1900-01-01 01:01:01
2  1900-01-01 01:01:01
3  1900-01-01 01:01:01
4  1900-01-01 01:01:01
5                 <NA>
6  1900-01-01 01:01:01
7  1900-01-01 01:01:01
8  1900-01-01 01:01:01
9  1900-01-01 01:01:01
10 1900-01-01 01:01:01
11 1900-01-01 01:01:01
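
For context, the read.fwf help page states that with header = TRUE the names in the first line must be delimited by sep, which defaults to a tab. A space-separated header is therefore read as a single name, which would explain the "more columns than column names" error. The following is a minimal, self-contained sketch of the same idea on a small made-up file (the data, the widths, and the names tmp and hdr are assumptions for illustration, not the original test.txt). It reads the header with scan() instead of read.table():

```r
# Write a small fixed-width sample file (hypothetical data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("id val grp",
             " 1 1.5 a",
             " 2     b",
             "   3.0 c"), tmp)

# Column widths for the sample: 2 + 4 + 2 characters per line.
widths <- c(2, 4, 2)

# The header line is whitespace-delimited, so read it on its own.
hdr <- scan(tmp, what = "", nlines = 1, quiet = TRUE)

# Read the body by fixed widths, skipping the header line
# and supplying the names read above.
DF <- read.fwf(tmp, widths = widths, skip = 1,
               col.names = hdr, strip.white = TRUE)
unlink(tmp)
```

Blank fixed-width fields in the numeric columns should come back as NA, as in the output above.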

Of course, with so few columns, you could also simply set

colnames(DF) <- c("num1", "num2", "num3", "int1", "fac1", 
                  "fac2", "cha1", "cha2", "Date", "POSIXt")

as a post-import step.
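
If typing the names by hand is undesirable, the post-import step could also take them from the file itself. This is only a sketch on made-up data (tmp, DF2 and hdr2 are hypothetical names), with a check that the number of names matches the number of columns before renaming:

```r
# Write a small fixed-width sample file (hypothetical data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("id val grp",
             " 1 1.5 a",
             " 2     b"), tmp)

# Import the body first; read.fwf assigns default names V1, V2, ...
DF2 <- read.fwf(tmp, widths = c(2, 4, 2), skip = 1, strip.white = TRUE)

# Split the header line on runs of whitespace.
hdr2 <- strsplit(trimws(readLines(tmp, n = 1)), "[[:space:]]+")[[1]]

# Guard against a header/width mismatch before renaming.
stopifnot(length(hdr2) == ncol(DF2))
colnames(DF2) <- hdr2
unlink(tmp)
```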

HTH,

Marc Schwartz