read.table bug (PR#1809)

Sat, 27 Jul 2002 15:19:54 +0200

This is a multi-part message in MIME format.
--------------A9A42E98BFD6B3F1F8A6F08A
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

I've met similar problems, independently.
The bug tracking list does not tell that is already done.

I've attached a patch file and the modified read.table.R.
Hope it helps. However, it is not intensively tested.

Cheers,
Martin

mmcstephen@cogstate.com wrote:
> 
> Full_Name: Michael McStephen
> Version: 1.5.1
> OS: Win2000
> Submission from: (NULL) (203.25.148.63)
> 
> When using read.table to read a data file section by section, I get an error
> that indicates R is attempting to read more "fields" than exist.
> 
> The Data file looks like this:
> 
> ---Start of file tmp.txt-----------
> Response Types = TPos TNeg FPos FNeg Anti Post MaxO
> 
> Test ID,Username,Start Date,Mean,Std
> Dev,Accuracy,SRT1_1,SRT1_1_TYPE,SRT1_2,SRT1_2_TYPE,SRT1_3,SRT1_3_TYPE,SRT1_4,SRT1_4_TYPE,SRT1_5,SRT1_5_TYPE,SRT1_6,SRT1_6_TYPE,SRT1_7,SRT1_7_TYPE,SRT1_8,SRT1_8_TYPE,SRT1_9,SRT1_9_TYPE,SRT1_10,SRT1_10_TYPE,SRT1_11,SRT1_11_TYPE,SRT1_12,SRT1_12_TYPE,SRT1_13,SRT1_13_TYPE,SRT1_14,SRT1_14_TYPE,SRT1_15,SRT1_15_TYPE,SRT1_16,SRT1_16_TYPE,SRT1_17,SRT1_17_TYPE,SRT1_18,SRT1_18_TYPE,SRT1_19,SRT1_19_TYPE,SRT1_20,SRT1_20_TYPE,SRT1_21,SRT1_21_TYPE,SRT1_22,SRT1_22_TYPE,SRT1_23,SRT1_23_TYPE,SRT1_24,SRT1_24_TYPE,SRT1_25,SRT1_25_TYPE,
> 12247,ess40,4/18/02,2.480833271780652,0.152415692862802,100.0,299,1,371,1,339,1,250,1,308,1,227,1,287,1,237,1,281,1,259,1,384,1,272,1,361,1,1307,1,259,1,236,1,280,1,248,1,241,1,231,1,270,1,450,1,261,1,293,1,298,1,
> 
> Test ID,Username,Start Date,Mean,Std
> Dev,Accuracy,SRT2_1,SRT2_1_TYPE,SRT2_2,SRT2_2_TYPE,SRT2_3,SRT2_3_TYPE,SRT2_4,SRT2_4_TYPE,SRT2_5,SRT2_5_TYPE,SRT2_6,SRT2_6_TYPE,SRT2_7,SRT2_7_TYPE,SRT2_8,SRT2_8_TYPE,SRT2_9,SRT2_9_TYPE,SRT2_10,SRT2_10_TYPE,SRT2_11,SRT2_11_TYPE,SRT2_12,SRT2_12_TYPE,SRT2_13,SRT2_13_TYPE,SRT2_14,SRT2_14_TYPE,SRT2_15,SRT2_15_TYPE,SRT2_16,SRT2_16_TYPE,SRT2_17,SRT2_17_TYPE,SRT2_18,SRT2_18_TYPE,SRT2_19,SRT2_19_TYPE,SRT2_20,SRT2_20_TYPE,SRT2_21,SRT2_21_TYPE,SRT2_22,SRT2_22_TYPE,SRT2_23,SRT2_23_TYPE,SRT2_24,SRT2_24_TYPE,SRT2_25,SRT2_25_TYPE,SRT2_26,SRT2_26_TYPE,SRT2_27,SRT2_27_TYPE,
> 12247,ess40,4/18/02,2.547430074367148,0.2068384905939905,92.0,470,1,253,1,368,1,325,1,249,1,272,1,275,1,275,1,355,1,485,1,306,1,1334,5,416,1,392,1,212,1,1531,5,621,1,331,1,314,1,277,1,336,1,393,1,232,1,243,1,252,1,245,1,271,1,
> 
> Test ID,Username,Start Date,Mean,Std
> Dev,Accuracy,SRT3_1,SRT3_1_TYPE,SRT3_2,SRT3_2_TYPE,SRT3_3,SRT3_3_TYPE,SRT3_4,SRT3_4_TYPE,SRT3_5,SRT3_5_TYPE,SRT3_6,SRT3_6_TYPE,SRT3_7,SRT3_7_TYPE,SRT3_8,SRT3_8_TYPE,SRT3_9,SRT3_9_TYPE,SRT3_10,SRT3_10_TYPE,SRT3_11,SRT3_11_TYPE,SRT3_12,SRT3_12_TYPE,SRT3_13,SRT3_13_TYPE,SRT3_14,SRT3_14_TYPE,SRT3_15,SRT3_15_TYPE,SRT3_16,SRT3_16_TYPE,SRT3_17,SRT3_17_TYPE,SRT3_18,SRT3_18_TYPE,SRT3_19,SRT3_19_TYPE,SRT3_20,SRT3_20_TYPE,SRT3_21,SRT3_21_TYPE,SRT3_22,SRT3_22_TYPE,SRT3_23,SRT3_23_TYPE,SRT3_24,SRT3_24_TYPE,SRT3_25,SRT3_25_TYPE,SRT3_26,SRT3_26_TYPE,
> 12247,ess40,4/18/02,2.494412530422871,0.17876797593577065,96.0,355,1,291,1,331,1,276,1,279,1,278,1,454,1,275,1,272,1,249,1,249,1,221,1,235,1,299,1,240,1,261,1,241,1,249,1,321,1,329,1,234,1,250,1,278,1,442,1,1583,5,662,1,
> 
> ...
> 
> eof-----
> 
> I have an array of skip values, and the number of lines to read after the skip
> values:
> 
> skip.lines<-4  #number of lines between task headers
> skip.1<-2      #number of lines to the first task header
> n.tasks<-10     #number of tasks in the file
> n.lines<-1     #number of lines per task (excluding header)
> 
> tmp<-seq(1,n.tasks)
> skip<- skip.1 + skip.lines * (tmp -1)
> 
> # and the read.table is in a loop:
> 
> for (i in 1:n.tasks) {
>     i.dat<-read.table("tmp.txt",  sep=",", strip.white=TRUE, fill=TRUE,
> skip=skip[i], nrows=n.lines, header=FALSE)
>     i.name<-substr(names(i.dat)[7],1,4)
> 
>     more stuff here
> }
> 
> When only the first block (first 5 rows) of the file are used, there is no
> problem.  When the subsequent blocks are included, R appears to be looking for
> more "fields" in the rows 3 and 4 than actually exist.  Changing to header=FALSE
> allows me to see all the data, but then I have to transform from character to
> numeric and I don't get the names in the data frame.
> 
> This script works if the data file has more than one row of data below the
> header. (Just cut and paste a data row)
> 
> Regards,
> 
> michael.
> 
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

-- 
Martin Schlather                 email: Martin.Schlather@uni-bayreuth.de
Abteilung Bodenphysik            phone: +49 (0)921 55 2193          
Univ. Bayreuth                   Fax  : +49 (0)921 55 2246
D -- 95440 Bayreuth, Germany     http://www.geo.uni-bayreuth.de/~martin/
--------------A9A42E98BFD6B3F1F8A6F08A
Content-Type: text/plain; charset=us-ascii;
 name="read.table.R"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="read.table.R"

count.fields <- function(file, sep = "", quote = "\"'", skip = 0,
                         blank.lines.skip = TRUE, comment.char = "#")
{
    if(is.character(file)) {
        file <- file(file)
        on.exit(close(file))
    }
    if(!inherits(file, "connection"))
        stop("argument `file' must be a character string or connection")
    .Internal(count.fields(file, sep, quote, skip, blank.lines.skip,
                           comment.char))
}

type.convert <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
    .Internal(type.convert(x, na.strings, as.is, dec))

read.table <-
    function (file, header = FALSE, sep = "", quote = "\"'", dec = ".",
              row.names, col.names, as.is = FALSE,
	      na.strings = "NA", colClasses = NA,
              nrows = -1, skip = 0,
              check.names = TRUE, fill = !blank.lines.skip,
              strip.white = FALSE, blank.lines.skip = TRUE,
              comment.char = "#")
{
    if(is.character(file)) {
        file <- file(file, "r")
        on.exit(close(file))
    }
    if(!inherits(file, "connection"))
        stop("argument `file' must be a character string or connection")
    if(!isOpen(file)) {
        open(file, "r")
        on.exit(close(file))
    }

    if(skip > 0) readLines(file, skip)
    ## read a few lines to determine header, no of cols.
#    nlines <- 0
#    lines <- NULL
#     while(nlines < 5) {
#         ## read up to five non-blank and non-comment lines.
#         line <- readLines(file, 1, ok = TRUE)
#         if(length(line) == 0) break
#         if(blank.lines.skip && length(grep("^[ \\t]*$", line))) next
#         if(length(comment.char) && nchar(comment.char)) {
#             pattern <- paste("^[ \\t]*", substring(comment.char,1,1),
#                              sep ="")
#             if(length(grep(pattern, line))) next
#         }
#         lines <- c(lines, line)
#     }
    nlines <- 
      if (nrows<0) 5
      else min(5, (header + nrows))

    lines <- .Internal(readTableHead(file, nlines, comment.char,
                                     blank.lines.skip))
    nlines <- length(lines)
    if(!nlines) {
        if(missing(col.names))
            stop("no lines available in input")
        else {
            tmp <- vector("list", length(col.names))
            names(tmp) <- col.names
            class(tmp) <- "data.frame"
            return(tmp)
        }
    }
    if(all(nchar(lines) == 0)) stop("empty beginning of file")
    pushBack(c(lines, lines), file)
    first <- scan(file, what = "", sep = sep, quote = quote,
                  nlines = 1, quiet = TRUE, skip = 0,
                  strip.white = TRUE,
                  blank.lines.skip = blank.lines.skip,
                  comment.char = comment.char)
    col1 <- if(missing(col.names)) length(first) else length(col.names)
    col <- numeric(nlines - 1)
    if (nlines>1)
      for (i in seq(along=col))
        col[i] <- length(scan(file, what = "", sep = sep,
                              quote = quote,
                              nlines = 1, quiet = TRUE, skip = 0,
                              strip.white = strip.white,
                              blank.lines.skip = blank.lines.skip,
                              comment.char = comment.char))
    cols <- max(col1, col)

    ##	basic column counting and header determination;
    ##	rlabp (logical) := it looks like we have column names

    rlabp <- (cols - col1) == 1
    if(rlabp && missing(header))
	header <- TRUE
    if(!header) rlabp <- FALSE

    if (header) {
        readLines(file, 1) # skip over header
        if(missing(col.names)) col.names <- first
        else if(length(first) != length(col.names))
            warning("header and `col.names' are of different lengths")

    } else if (missing(col.names))
	col.names <- paste("V", 1:cols, sep = "")
    if(length(col.names) + rlabp < cols)
        stop("more columns than column names")
    if(fill && length(col.names) > cols)
        cols <- length(col.names)
    if(!fill && cols > 0 && length(col.names) > cols)
        stop("more column names than columns")
    if(cols == 0) stop("first five rows are empty: giving up")

    if(check.names) col.names <- make.names(col.names)
    if (rlabp) col.names <- c("row.names", col.names)

    if(length(colClasses) < cols) colClasses <- rep(colClasses, len=cols)

    ##	set up for the scan of the file.
    ##	we read unknown values as character strings and convert later.

    what <- rep(list(""), cols)
    names(what) <- col.names

    colClasses[colClasses %in% c("real", "double")] <- "numeric"
    known <- colClasses %in%
                c("logical", "integer", "numeric", "complex", "character")
    what[known] <- sapply(colClasses[known], do.call, list(0))

    data <- scan(file = file, what = what, sep = sep, quote = quote,
                 dec = dec, nmax = nrows, skip = 0,
		 na.strings = na.strings, quiet = TRUE, fill = fill,
                 strip.white = strip.white,
                 blank.lines.skip = blank.lines.skip, multi.line = FALSE,
                 comment.char = comment.char)

    nlines <- length(data[[1]])

    ##	now we have the data;
    ##	convert to numeric or factor variables
    ##	(depending on the specified value of "as.is").
    ##	we do this here so that columns match up

    if(cols != length(data)) { # this should never happen
	warning(paste("cols =", cols," != length(data) =", length(data)))
	cols <- length(data)
    }

    if(is.logical(as.is)) {
	as.is <- rep(as.is, length=cols)
    } else if(is.numeric(as.is)) {
	if(any(as.is < 1 | as.is > cols))
	    stop("invalid numeric as.is expression")
	i <- rep(FALSE, cols)
	i[as.is] <- TRUE
	as.is <- i
    } else if (length(as.is) != cols)
	stop(paste("as.is has the wrong length",
		   length(as.is),"!= cols =", cols))
    for (i in 1:cols) {
#        if(known[i] || as.is[i]) next
        if(known[i]) next
        data[[i]] <-
            if (!is.na(colClasses[i])) as(data[[i]], colClasses[i])
            else type.convert(data[[i]], as.is = as.is[i], dec = dec)
    }

    ##	now determine row names

    if (missing(row.names)) {
	if (rlabp) {
	    row.names <- data[[1]]
	    data <- data[-1]
	}
	else row.names <- as.character(seq(len=nlines))
    } else if (is.null(row.names)) {
	row.names <- as.character(seq(len=nlines))
    } else if (is.character(row.names)) {
	if (length(row.names) == 1) {
	    rowvar <- (1:cols)[match(col.names, row.names, 0) == 1]
	    row.names <- data[[rowvar]]
	    data <- data[-rowvar]
	}
    } else if (is.numeric(row.names) && length(row.names) == 1) {
	rlabp <- row.names
	row.names <- data[[rlabp]]
	data <- data[-rlabp]
    } else stop("invalid row.names specification")

    ##	this is extremely underhanded
    ##	we should use the constructor function ...
    ##	don't try this at home kids

    class(data) <- "data.frame"
    row.names(data) <- row.names
    data
}

read.csv <-
    function (file, header = TRUE, sep = ",", quote="\"", dec=".",
              fill = TRUE, ...)
    read.table(file = file, header = header, sep = sep,
               quote = quote, dec = dec, fill = fill, ...)

read.csv2 <-
    function (file, header = TRUE, sep = ";", quote="\"", dec=",",
              fill = TRUE, ...)
    read.table(file = file, header = header, sep = sep,
               quote = quote, dec = dec, fill = fill, ...)

read.delim <-
    function (file, header = TRUE, sep = "\t", quote="\"", dec=".",
              fill = TRUE, ...)
    read.table(file = file, header = header, sep = sep,
               quote = quote, dec = dec, fill = fill, ...)

read.delim2 <-
    function (file, header = TRUE, sep = "\t", quote="\"", dec=",",
              fill = TRUE, ...)
    read.table(file = file, header = header, sep = sep,
               quote = quote, dec = dec, fill = fill, ...)

--------------A9A42E98BFD6B3F1F8A6F08A
Content-Type: text/plain; charset=us-ascii;
 name="rt.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="rt.patch"

55c55,59
<     lines <- .Internal(readTableHead(file, 5, comment.char,
---
>     nlines <- 
>       if (nrows<0) 5
>       else min(5, (header + nrows))
>     
>     lines <- .Internal(readTableHead(file, nlines, comment.char,
77c81,82
<     for (i in seq(along=col))
---
>     if (nlines>1)
>       for (i in seq(along=col))

--------------A9A42E98BFD6B3F1F8A6F08A--

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._