[Rd] read.table() errors with tab as separator (PR#9061)
John.Maindonald at anu.edu.au
Wed Jul 5 11:35:01 CEST 2006
(1) read.table(), with sep="\t", identifies 13 out of 1400 records,
in a file with 1400 records of 3 fields each, as having only 2 fields.
This happens under R 2.3.1 for Windows, under R 2.3.1 for Mac OS X,
and under R-devel for Mac OS X.
[R version 2.4.0 Under development (unstable) (2006-07-03 r38478)]
(2) Using read.table() with sep="\t", only the first 1569 records
of an 1821-record file are read. The file has exactly two fields
in each record, and the second field is always at least
1 character long. If, however, I extract lines 1561 to 1650 from the
file (the file "short.txt" below), all 90 lines are read.
> webtwo <- "http://www.maths.anu.edu.au/~johnm/testfiles/twotabs.txt"
> xy <- read.table(url(webtwo), sep="\t")
Warning message:
number of items read is not a multiple of the number of columns
> z <- count.fields(url(webtwo), sep="\t")
> table(z)
z
   2    3
  13 1387
> table(sapply(strsplit(readLines(url(webtwo)), split="\t"), length))
3
1400
> readLines(url(webtwo))[z==2][9:13] # last 5 as a sample (shorter lines)
[1] "865\tlinear model (lm)! Cook's distance\t152"
[2] "1019\tlinear model (lm)! Cook's distance\t177"
[3] "1048\tlinear model (lm)! Cook's distance\t183"
[4] "1082\tlinear model (lm)! Cook's distance\t187"
[5] "1220\tlinear model (lm)! Cook's distance\t214"
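One possible explanation, which this report does not confirm: every sample line above contains an apostrophe, and both read.table() and count.fields() default to quote = "\"'", so an apostrophe can be read as the start of a quoted string. The sketch below uses hypothetical records modelled on those lines (the temporary file and the names recs, has_quote and z are illustrative only). It flags the records that contain a quote character, then counts fields with quoting switched off.

```r
# Hypothetical records modelled on the sample lines shown above.
recs <- c("865\tlinear model (lm)! Cook's distance\t152",
          "866\tlinear model (lm)! residuals\t153",
          "1019\tlinear model (lm)! Cook's distance\t177")
tmp <- tempfile(fileext = ".txt")
writeLines(recs, tmp)

# Which records contain a character from the default quote set?
has_quote <- grepl("[\"']", recs)
which(has_quote)

# Count fields with quoting disabled, so that ' is an ordinary character.
z <- count.fields(tmp, sep = "\t", quote = "")
table(z)
```

If the records flagged by has_quote are the same ones that count.fields() reports as short under the default settings, the quote argument is the likely culprit rather than the tab separator.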
> weblong <- "http://www.maths.anu.edu.au/~johnm/testfiles/long.txt"
> webshort <- "http://www.maths.anu.edu.au/~johnm/testfiles/short.txt"
> xyLong <- read.table(url(weblong), sep="\t")
> dim(xyLong) # Should be 1821 x 2
[1] 1569 2
> xyShort <- read.table(url(webshort), sep="\t")
> dim(xyShort) # Should be, and will be, 90 x 2
[1] 90 2
> long <- readLines(url(weblong))
> short <- readLines(url(webshort))
> length(long)
[1] 1821
> length(short)
[1] 90
> all(long[1561:1650]==short) # short is lines 1561:1650 of long
[1] TRUE
> ## Moreover strsplit() can pick up the \t's correctly
> lsplit <- strsplit(long, "\t")
> table(sapply(lsplit, length))
2
1821
> # Try also table(sapply(lsplit, function(x)x[2]))
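Since strsplit() sees the right number of fields, two things seem worth trying. This is a sketch under the assumption that quote or comment characters in the second field are involved, which has not been verified against long.txt. The records below are hypothetical, as are the names recs, xy and m. The first call disables quote and comment handling in read.table(); the second builds the table directly from the strsplit() result.

```r
# Hypothetical two-field records; the apostrophes and the # are assumptions
# about the kind of content that might appear in long.txt.
recs <- c("a\tCook's distance",
          "b\tresiduals",
          "c\tStudent's t",
          "d\tleverage # high")
tmp <- tempfile(fileext = ".txt")
writeLines(recs, tmp)

# Treat ' " and # as ordinary characters while reading.
xy <- read.table(tmp, sep = "\t", quote = "", comment.char = "",
                 stringsAsFactors = FALSE)
dim(xy)

# Fallback: assemble a character matrix from the strsplit() output.
m <- do.call(rbind, strsplit(recs, "\t", fixed = TRUE))
dim(m)
```

The number of rows in xy can then be compared with length(readLines(tmp)) as a check that no records were merged or dropped.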
--please do not edit the information below--
Version:
platform = powerpc-apple-darwin8.6.0
arch = powerpc
os = darwin8.6.0
system = powerpc, darwin8.6.0
status =
major = 2
minor = 3.1
year = 2006
month = 06
day = 01
svn rev = 38247
language = R
version.string = Version 2.3.1 (2006-06-01)
Locale:
C
Search Path:
.GlobalEnv, package:lattice, package:methods, package:stats,
package:graphics, package:grDevices, package:utils, package:datasets,
Autoloads, package:base