[Rd] read.table() errors with tab as separator (PR#9061)

John.Maindonald at anu.edu.au John.Maindonald at anu.edu.au
Wed Jul 5 11:35:01 CEST 2006


(1) read.table(), with sep="\t", identifies 13 out of 1400 records,
in a file with 1400 records of 3 fields each, as having only 2 fields.
This happens under version 2.3.1 for Windows as well as with
R 2.3.1 for Mac OS X, and with R-devel under Mac OS X.
[R version 2.4.0 Under development (unstable) (2006-07-03 r38478)]

(2) Using read.table() with sep="\t", only the first 1569 records
of an 1821-record file are read in.  The file has exactly two fields
in each record, and the minimum length of the second field is
1 character.  If, however, I extract lines 1561 to 1650 from the
file (the file "short.txt" below), all 90 lines are read in.

 > webtwo <- "http://www.maths.anu.edu.au/~johnm/testfiles/twotabs.txt"
 > xy <- read.table(url(webtwo), sep="\t")
Warning message:
number of items read is not a multiple of the number of columns
 > z <- count.fields(url(webtwo), sep="\t")
 > table(z)
z
    2    3
   13 1387
 > table(sapply(strsplit(readLines(url(webtwo)), split="\t"), length))

    3
1400
 > readLines(url(webtwo))[z==2][9:13]  # last 5 as a sample (shorter lines)
[1] "865\tlinear model (lm)! Cook's distance\t152"
[2] "1019\tlinear model (lm)! Cook's distance\t177"
[3] "1048\tlinear model (lm)! Cook's distance\t183"
[4] "1082\tlinear model (lm)! Cook's distance\t187"
[5] "1220\tlinear model (lm)! Cook's distance\t214"
 > weblong <- "http://www.maths.anu.edu.au/~johnm/testfiles/long.txt"
 > webshort <- "http://www.maths.anu.edu.au/~johnm/testfiles/short.txt"
 > xyLong <- read.table(url(weblong), sep="\t")
 > dim(xyLong)    # Should be 1821 x 2
[1] 1569    2
 > xyShort <- read.table(url(webshort), sep="\t")
 > dim(xyShort)   # Should be, and will be, 90 x 2
[1] 90  2
 > long <- readLines(url(weblong))
 > short <- readLines(url(webshort))
 > length(long)
[1] 1821
 > length(short)
[1] 90
 > all(long[1561:1650]==short)  # short is lines 1561:1650 of long
[1] TRUE
 > ## Moreover strsplit() can pick up the \t's correctly
 > lsplit <- strsplit(long, "\t")
 > table(sapply(lsplit, length))

    2
1821
 > # Try also table(sapply(lsplit, function(x)x[2]))
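A possible explanation, offered as an assumption and not confirmed in this report: read.table() and count.fields() both default to quote = "\"'", so the apostrophe in "Cook's distance" may be taken as the start of a quoted string that swallows tabs and newlines until the next quote character, whereas strsplit() ignores quotes entirely. The sketch below uses a small hypothetical sample (the middle line is invented for illustration) and shows that disabling quoting with quote = "" gives the field counts that strsplit() reports. The counts under the default quote setting are printed for comparison only, since they may differ between R versions.

```r
# Hypothetical three-line sample modelled on the records shown above;
# the middle line is invented for illustration.
sample_lines <- c(
  "865\tlinear model (lm)! Cook's distance\t152",
  "866\tlinear model (lm)! residuals\t153",
  "1019\tlinear model (lm)! Cook's distance\t177"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Default quote = "\"'": the two apostrophes may be paired up as quotes.
n_default <- count.fields(tmp, sep = "\t")

# Quoting disabled: every tab is a separator, every newline ends a record.
n_noquote <- count.fields(tmp, sep = "\t", quote = "")
xy <- read.table(tmp, sep = "\t", quote = "", stringsAsFactors = FALSE)

# Reference count that takes no account of quotes, as in the transcript.
n_split <- sapply(strsplit(sample_lines, "\t"), length)

print(n_default)   # shown for comparison; version dependent
print(n_noquote)
print(n_split)
print(dim(xy))
unlink(tmp)
```

If this assumption holds for twotabs.txt and long.txt, passing quote = "" to the read.table() and count.fields() calls above would be the first thing to try.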

--please do not edit the information below--

Version:
platform = powerpc-apple-darwin8.6.0
arch = powerpc
os = darwin8.6.0
system = powerpc, darwin8.6.0
status =
major = 2
minor = 3.1
year = 2006
month = 06
day = 01
svn rev = 38247
language = R
version.string = Version 2.3.1 (2006-06-01)

Locale:
C

Search Path:
.GlobalEnv, package:lattice, package:methods, package:stats,  
package:graphics, package:grDevices, package:utils, package:datasets,  
Autoloads, package:base


