[R] read.table: fill=T for header?
Philipp Pagel
p.pagel at wzw.tum.de
Wed Apr 27 14:15:33 CEST 2011
Dear ExpeRts,
I am trying to read tab-delimited data produced by somewhat brain dead
software that seems to think it's a good idea to have an extra tab
character after the last column - except for the header line. As
explained in the help page, read.delim now assumes that the first
column contains the row.names (which is not even wrong) but now all
col.names get shifted by one column. Example:
infile <- 'sample\tx1\n1\tA\t\n2\tB\t\n3\tA\t'
read.delim(textConnection(infile))
  sample x1
1      A NA
2      B NA
3      A NA
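The mismatch between header and data lines can be checked directly with count.fields, which reports the number of fields on each line. A minimal check, assuming the same sample string as in the example (n_fields is just a name chosen here):

```r
infile <- 'sample\tx1\n1\tA\t\n2\tB\t\n3\tA\t'
con <- textConnection(infile)
# Fields per line; the trailing tab gives each data line one more
# field than the header line has
n_fields <- count.fields(con, sep = "\t")
close(con)
n_fields
```

A header that is exactly one field shorter than the data lines is what makes read.delim treat the first column as row.names.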
So I set row.names to NULL because the man page said "Using
‘row.names = NULL’ forces row numbering.". Now the row.names really
are numbered automatically but I get a "bonus column":
read.delim(textConnection(infile), row.names=NULL)
  row.names sample x1
1         1      A NA
2         2      B NA
3         3      A NA
Hm - not what I want. I am also a bit puzzled why the extra column is
introduced instead of just using the first col.name. At the moment I
deal with it by fixing the col.names and dumping the extra column:
dat <- read.delim(textConnection(infile), row.names=NULL)
colnames(dat) <- colnames(dat)[-1]
dat <- dat[-ncol(dat)]
dat
  sample x1
1      1  A
2      2  B
3      3  A
I worked my way through ?read.delim but could not find an option to
deal with these (flawed) files directly. As the opposite situation
(i.e. more col.names than data) can be fixed with fill=T I was hoping
something like fill.header=T or fill='header' may exist. Did I just
not find it or does it not exist? And if it doesn't - does anyone
else think it would be a nice item for the wishlist?
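To make the wish concrete, here is a rough sketch of what such an option might do, written as a wrapper around read.delim. The function name read_delim_fill_header and the "fill.N" placeholder column names are hypothetical and not part of R; the sketch also assumes the input is a file on disk rather than a connection.

```r
# Hypothetical helper (not part of R): pad a too-short header with
# placeholder names so it matches the widest data line.
read_delim_fill_header <- function(file, sep = "\t", ...) {
  # Number of fields on every line, header included
  n <- count.fields(file, sep = sep, quote = "\"", comment.char = "")
  # Names actually present in the header line
  hdr <- strsplit(readLines(file, n = 1), sep, fixed = TRUE)[[1]]
  # Add one placeholder name per missing header field
  n_extra <- max(n, na.rm = TRUE) - length(hdr)
  padded <- c(hdr, paste0("fill.", seq_len(n_extra)))
  # Skip the original header and supply the padded names instead
  read.delim(file, sep = sep, header = FALSE, skip = 1,
             col.names = padded, ...)
}

# Usage with the sample data written to a temporary file
tmp <- tempfile()
writeLines(c("sample\tx1", "1\tA\t", "2\tB\t", "3\tA\t"), tmp)
dat <- read_delim_fill_header(tmp)
dat
```

The padded columns are kept rather than dropped, so an all-NA column such as fill.1 can still be removed afterwards if it is not wanted.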
cu
Philipp
--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/