[R] Error with read.delim & read.csv
Peter Waltman
waltman at cs.nyu.edu
Thu Nov 15 18:11:56 CET 2007
Hi -
I'm reading in a tab-delimited file that is causing issues with
read.delim. Specifically, for certain lines, the last entry of the
line is misread and treated as the first entry of a new row (which is
then padded with NA's). For example:
tmp <- read.delim( "trouble.txt", header=F )
produces a data.frame, tmp, where calling tmp[,1] gives output like:
[76] F45H7.4#2 C47C12.5#2 F40H7.4#2 ZK353.2
0.59
[81] Y116A8C.34 0.23 Y116F11A.MM 0.04
F26D12.A
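One way to see where the parsing goes wrong is count.fields, which
reports how many fields the read.table machinery finds on each line;
something like the following (a rough, untested sketch) should flag
the problem lines:

# Count fields per line using read.delim's settings (tab-separated,
# double-quote quoting, no comment character); any line that doesn't
# report 509 fields is one being misread.
nf <- count.fields( "trouble.txt", sep="\t", quote="\"", comment.char="" )
which( nf != 509 )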
I initially assumed it was a formatting issue with the file. However,
I've looked at the file in an octal viewer, and the lines in question
seem fine. Additionally, using scan and then strsplit splits the
lines correctly (code below the sig).
Since I can't attach the file to a group posting, I can't give a
sample of the lines causing the issue; however, I can send a small
sample to anyone who's interested.
Note, I've tried this on several architectures and versions of R and
get the same behavior: specifically, v.2.5.1 on x86_64, as well as
v.2.6.0 on i686. I also get similar behavior when I convert the file
to a comma-separated file and use read.csv.
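If it's a quoting issue (just a guess on my part; a stray quote or
apostrophe in one of the fields would explain entries getting glued
together across lines), then disabling quoting should avoid it:

# Untested guess: turn quoting off entirely so that stray quote
# characters in the data can't swallow the tab separators.
tmp <- read.delim( "trouble.txt", header=F, quote="" )
dim( tmp )   # should be (number of lines) x 509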
As a quick workaround I can use scan and strsplit, but I thought
someone might want to take a look at this problem.
Thanks,
Peter Waltman
p.s. the combination of scan & strsplit I described above is as follows:
my.lines <- scan( "trouble.txt", sep="\n", what='character' )
split.lines <- strsplit( my.lines, "\t" )
num.entries <- sapply( split.lines, length )
after which num.entries will contain one entry per line of my.lines,
each equal to 509 (the number of elements per line).
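To get from the split lines back to a data.frame, something like this
should do it (assuming every line really does have 509 entries):

# Bind the split lines into a character matrix, then convert it to a
# data.frame; columns can be converted to numeric etc. afterwards.
mat <- do.call( rbind, split.lines )
tmp <- as.data.frame( mat, stringsAsFactors=FALSE )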