read.table bug (PR#1809)
Martin.Schlather@uni-bayreuth.de
Martin.Schlather@uni-bayreuth.de
Sat, 27 Jul 2002 15:19:54 +0200
This is a multi-part message in MIME format.
--------------A9A42E98BFD6B3F1F8A6F08A
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
I've met similar problems, independently.
The bug tracking list does not tell that is already done.
I've attached a patch file and the modified read.table.R.
Hope it helps. However, it is not intensively tested.
Cheers,
Martin
mmcstephen@cogstate.com wrote:
>
> Full_Name: Michael McStephen
> Version: 1.5.1
> OS: Win2000
> Submission from: (NULL) (203.25.148.63)
>
> When using read.table to read a data file section by section, I get an error
> that indicates R is attempting to read more "fields" than exist.
>
> The Data file looks like this:
>
> ---Start of file tmp.txt-----------
> Response Types = TPos TNeg FPos FNeg Anti Post MaxO
>
> Test ID,Username,Start Date,Mean,Std
> Dev,Accuracy,SRT1_1,SRT1_1_TYPE,SRT1_2,SRT1_2_TYPE,SRT1_3,SRT1_3_TYPE,SRT1_4,SRT1_4_TYPE,SRT1_5,SRT1_5_TYPE,SRT1_6,SRT1_6_TYPE,SRT1_7,SRT1_7_TYPE,SRT1_8,SRT1_8_TYPE,SRT1_9,SRT1_9_TYPE,SRT1_10,SRT1_10_TYPE,SRT1_11,SRT1_11_TYPE,SRT1_12,SRT1_12_TYPE,SRT1_13,SRT1_13_TYPE,SRT1_14,SRT1_14_TYPE,SRT1_15,SRT1_15_TYPE,SRT1_16,SRT1_16_TYPE,SRT1_17,SRT1_17_TYPE,SRT1_18,SRT1_18_TYPE,SRT1_19,SRT1_19_TYPE,SRT1_20,SRT1_20_TYPE,SRT1_21,SRT1_21_TYPE,SRT1_22,SRT1_22_TYPE,SRT1_23,SRT1_23_TYPE,SRT1_24,SRT1_24_TYPE,SRT1_25,SRT1_25_TYPE,
> 12247,ess40,4/18/02,2.480833271780652,0.152415692862802,100.0,299,1,371,1,339,1,250,1,308,1,227,1,287,1,237,1,281,1,259,1,384,1,272,1,361,1,1307,1,259,1,236,1,280,1,248,1,241,1,231,1,270,1,450,1,261,1,293,1,298,1,
>
> Test ID,Username,Start Date,Mean,Std
> Dev,Accuracy,SRT2_1,SRT2_1_TYPE,SRT2_2,SRT2_2_TYPE,SRT2_3,SRT2_3_TYPE,SRT2_4,SRT2_4_TYPE,SRT2_5,SRT2_5_TYPE,SRT2_6,SRT2_6_TYPE,SRT2_7,SRT2_7_TYPE,SRT2_8,SRT2_8_TYPE,SRT2_9,SRT2_9_TYPE,SRT2_10,SRT2_10_TYPE,SRT2_11,SRT2_11_TYPE,SRT2_12,SRT2_12_TYPE,SRT2_13,SRT2_13_TYPE,SRT2_14,SRT2_14_TYPE,SRT2_15,SRT2_15_TYPE,SRT2_16,SRT2_16_TYPE,SRT2_17,SRT2_17_TYPE,SRT2_18,SRT2_18_TYPE,SRT2_19,SRT2_19_TYPE,SRT2_20,SRT2_20_TYPE,SRT2_21,SRT2_21_TYPE,SRT2_22,SRT2_22_TYPE,SRT2_23,SRT2_23_TYPE,SRT2_24,SRT2_24_TYPE,SRT2_25,SRT2_25_TYPE,SRT2_26,SRT2_26_TYPE,SRT2_27,SRT2_27_TYPE,
> 12247,ess40,4/18/02,2.547430074367148,0.2068384905939905,92.0,470,1,253,1,368,1,325,1,249,1,272,1,275,1,275,1,355,1,485,1,306,1,1334,5,416,1,392,1,212,1,1531,5,621,1,331,1,314,1,277,1,336,1,393,1,232,1,243,1,252,1,245,1,271,1,
>
> Test ID,Username,Start Date,Mean,Std
> Dev,Accuracy,SRT3_1,SRT3_1_TYPE,SRT3_2,SRT3_2_TYPE,SRT3_3,SRT3_3_TYPE,SRT3_4,SRT3_4_TYPE,SRT3_5,SRT3_5_TYPE,SRT3_6,SRT3_6_TYPE,SRT3_7,SRT3_7_TYPE,SRT3_8,SRT3_8_TYPE,SRT3_9,SRT3_9_TYPE,SRT3_10,SRT3_10_TYPE,SRT3_11,SRT3_11_TYPE,SRT3_12,SRT3_12_TYPE,SRT3_13,SRT3_13_TYPE,SRT3_14,SRT3_14_TYPE,SRT3_15,SRT3_15_TYPE,SRT3_16,SRT3_16_TYPE,SRT3_17,SRT3_17_TYPE,SRT3_18,SRT3_18_TYPE,SRT3_19,SRT3_19_TYPE,SRT3_20,SRT3_20_TYPE,SRT3_21,SRT3_21_TYPE,SRT3_22,SRT3_22_TYPE,SRT3_23,SRT3_23_TYPE,SRT3_24,SRT3_24_TYPE,SRT3_25,SRT3_25_TYPE,SRT3_26,SRT3_26_TYPE,
> 12247,ess40,4/18/02,2.494412530422871,0.17876797593577065,96.0,355,1,291,1,331,1,276,1,279,1,278,1,454,1,275,1,272,1,249,1,249,1,221,1,235,1,299,1,240,1,261,1,241,1,249,1,321,1,329,1,234,1,250,1,278,1,442,1,1583,5,662,1,
>
> ...
>
> eof-----
>
> I have an array of skip values, and the number of lines to read after the skip
> values:
>
> skip.lines<-4 #number of lines between task headers
> skip.1<-2 #number of lines to the first task header
> n.tasks<-10 #number of tasks in the file
> n.lines<-1 #number of lines per task (excluding header)
>
> tmp<-seq(1,n.tasks)
> skip<- skip.1 + skip.lines * (tmp -1)
>
> # and the read.table is in a loop:
>
> for (i in 1:n.tasks) {
> i.dat<-read.table("tmp.txt", sep=",", strip.white=TRUE, fill=TRUE,
> skip=skip[i], nrows=n.lines, header=FALSE)
> i.name<-substr(names(i.dat)[7],1,4)
>
> more stuff here
> }
>
> When only the first block (first 5 rows) of the file are used, there is no
> problem. When the subsequent blocks are included, R appears to be looking for
> more "fields" in the rows 3 and 4 than actually exist. Changing to header=FALSE
> allows me to see all the data, but then I have to transform from character to
> numeric and I don't get the names in the data frame.
>
> This script works if the data file has more than one row of data below the
> header. (Just cut and paste a data row)
>
> Regards,
>
> michael.
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
--
Martin Schlather email: Martin.Schlather@uni-bayreuth.de
Abteilung Bodenphysik phone: +49 (0)921 55 2193
Univ. Bayreuth Fax : +49 (0)921 55 2246
D -- 95440 Bayreuth, Germany http://www.geo.uni-bayreuth.de/~martin/
--------------A9A42E98BFD6B3F1F8A6F08A
Content-Type: text/plain; charset=us-ascii;
name="read.table.R"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="read.table.R"
count.fields <- function(file, sep = "", quote = "\"'", skip = 0,
blank.lines.skip = TRUE, comment.char = "#")
{
if(is.character(file)) {
file <- file(file)
on.exit(close(file))
}
if(!inherits(file, "connection"))
stop("argument `file' must be a character string or connection")
.Internal(count.fields(file, sep, quote, skip, blank.lines.skip,
comment.char))
}
type.convert <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
.Internal(type.convert(x, na.strings, as.is, dec))
read.table <-
function (file, header = FALSE, sep = "", quote = "\"'", dec = ".",
row.names, col.names, as.is = FALSE,
na.strings = "NA", colClasses = NA,
nrows = -1, skip = 0,
check.names = TRUE, fill = !blank.lines.skip,
strip.white = FALSE, blank.lines.skip = TRUE,
comment.char = "#")
{
if(is.character(file)) {
file <- file(file, "r")
on.exit(close(file))
}
if(!inherits(file, "connection"))
stop("argument `file' must be a character string or connection")
if(!isOpen(file)) {
open(file, "r")
on.exit(close(file))
}
if(skip > 0) readLines(file, skip)
## read a few lines to determine header, no of cols.
# nlines <- 0
# lines <- NULL
# while(nlines < 5) {
# ## read up to five non-blank and non-comment lines.
# line <- readLines(file, 1, ok = TRUE)
# if(length(line) == 0) break
# if(blank.lines.skip && length(grep("^[ \\t]*$", line))) next
# if(length(comment.char) && nchar(comment.char)) {
# pattern <- paste("^[ \\t]*", substring(comment.char,1,1),
# sep ="")
# if(length(grep(pattern, line))) next
# }
# lines <- c(lines, line)
# }
nlines <-
if (nrows<0) 5
else min(5, (header + nrows))
lines <- .Internal(readTableHead(file, nlines, comment.char,
blank.lines.skip))
nlines <- length(lines)
if(!nlines) {
if(missing(col.names))
stop("no lines available in input")
else {
tmp <- vector("list", length(col.names))
names(tmp) <- col.names
class(tmp) <- "data.frame"
return(tmp)
}
}
if(all(nchar(lines) == 0)) stop("empty beginning of file")
pushBack(c(lines, lines), file)
first <- scan(file, what = "", sep = sep, quote = quote,
nlines = 1, quiet = TRUE, skip = 0,
strip.white = TRUE,
blank.lines.skip = blank.lines.skip,
comment.char = comment.char)
col1 <- if(missing(col.names)) length(first) else length(col.names)
col <- numeric(nlines - 1)
if (nlines>1)
for (i in seq(along=col))
col[i] <- length(scan(file, what = "", sep = sep,
quote = quote,
nlines = 1, quiet = TRUE, skip = 0,
strip.white = strip.white,
blank.lines.skip = blank.lines.skip,
comment.char = comment.char))
cols <- max(col1, col)
## basic column counting and header determination;
## rlabp (logical) := it looks like we have column names
rlabp <- (cols - col1) == 1
if(rlabp && missing(header))
header <- TRUE
if(!header) rlabp <- FALSE
if (header) {
readLines(file, 1) # skip over header
if(missing(col.names)) col.names <- first
else if(length(first) != length(col.names))
warning("header and `col.names' are of different lengths")
} else if (missing(col.names))
col.names <- paste("V", 1:cols, sep = "")
if(length(col.names) + rlabp < cols)
stop("more columns than column names")
if(fill && length(col.names) > cols)
cols <- length(col.names)
if(!fill && cols > 0 && length(col.names) > cols)
stop("more column names than columns")
if(cols == 0) stop("first five rows are empty: giving up")
if(check.names) col.names <- make.names(col.names)
if (rlabp) col.names <- c("row.names", col.names)
if(length(colClasses) < cols) colClasses <- rep(colClasses, len=cols)
## set up for the scan of the file.
## we read unknown values as character strings and convert later.
what <- rep(list(""), cols)
names(what) <- col.names
colClasses[colClasses %in% c("real", "double")] <- "numeric"
known <- colClasses %in%
c("logical", "integer", "numeric", "complex", "character")
what[known] <- sapply(colClasses[known], do.call, list(0))
data <- scan(file = file, what = what, sep = sep, quote = quote,
dec = dec, nmax = nrows, skip = 0,
na.strings = na.strings, quiet = TRUE, fill = fill,
strip.white = strip.white,
blank.lines.skip = blank.lines.skip, multi.line = FALSE,
comment.char = comment.char)
nlines <- length(data[[1]])
## now we have the data;
## convert to numeric or factor variables
## (depending on the specified value of "as.is").
## we do this here so that columns match up
if(cols != length(data)) { # this should never happen
warning(paste("cols =", cols," != length(data) =", length(data)))
cols <- length(data)
}
if(is.logical(as.is)) {
as.is <- rep(as.is, length=cols)
} else if(is.numeric(as.is)) {
if(any(as.is < 1 | as.is > cols))
stop("invalid numeric as.is expression")
i <- rep(FALSE, cols)
i[as.is] <- TRUE
as.is <- i
} else if (length(as.is) != cols)
stop(paste("as.is has the wrong length",
length(as.is),"!= cols =", cols))
for (i in 1:cols) {
# if(known[i] || as.is[i]) next
if(known[i]) next
data[[i]] <-
if (!is.na(colClasses[i])) as(data[[i]], colClasses[i])
else type.convert(data[[i]], as.is = as.is[i], dec = dec)
}
## now determine row names
if (missing(row.names)) {
if (rlabp) {
row.names <- data[[1]]
data <- data[-1]
}
else row.names <- as.character(seq(len=nlines))
} else if (is.null(row.names)) {
row.names <- as.character(seq(len=nlines))
} else if (is.character(row.names)) {
if (length(row.names) == 1) {
rowvar <- (1:cols)[match(col.names, row.names, 0) == 1]
row.names <- data[[rowvar]]
data <- data[-rowvar]
}
} else if (is.numeric(row.names) && length(row.names) == 1) {
rlabp <- row.names
row.names <- data[[rlabp]]
data <- data[-rlabp]
} else stop("invalid row.names specification")
## this is extremely underhanded
## we should use the constructor function ...
## don't try this at home kids
class(data) <- "data.frame"
row.names(data) <- row.names
data
}
read.csv <-
function (file, header = TRUE, sep = ",", quote="\"", dec=".",
fill = TRUE, ...)
read.table(file = file, header = header, sep = sep,
quote = quote, dec = dec, fill = fill, ...)
read.csv2 <-
function (file, header = TRUE, sep = ";", quote="\"", dec=",",
fill = TRUE, ...)
read.table(file = file, header = header, sep = sep,
quote = quote, dec = dec, fill = fill, ...)
read.delim <-
function (file, header = TRUE, sep = "\t", quote="\"", dec=".",
fill = TRUE, ...)
read.table(file = file, header = header, sep = sep,
quote = quote, dec = dec, fill = fill, ...)
read.delim2 <-
function (file, header = TRUE, sep = "\t", quote="\"", dec=",",
fill = TRUE, ...)
read.table(file = file, header = header, sep = sep,
quote = quote, dec = dec, fill = fill, ...)
--------------A9A42E98BFD6B3F1F8A6F08A
Content-Type: text/plain; charset=us-ascii;
name="rt.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="rt.patch"
55c55,59
< lines <- .Internal(readTableHead(file, 5, comment.char,
---
> nlines <-
> if (nrows<0) 5
> else min(5, (header + nrows))
>
> lines <- .Internal(readTableHead(file, nlines, comment.char,
77c81,82
< for (i in seq(along=col))
---
> if (nlines>1)
> for (i in seq(along=col))
--------------A9A42E98BFD6B3F1F8A6F08A--
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._