[R] Tab Separated File Reading Error

William Dunlap wdunlap at tibco.com
Fri Oct 4 18:22:47 CEST 2013


> > annoTranscripts <- read.table("matched.txt", sep = '\t', stringsAsFactors = FALSE)
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>   line 5933 did not have 12 elements
> 
> However, all lines do have 12 columns.
> 
> > lines <- readLines("matched.txt")
> ...[many omitted lines]...
> The line does not contain comment or quote characters. What can you suggest?

I suggest looking at the lines preceding the one where the error was found, with both
print and cat:
    print(lines[5933 - (10:0)])
    cat(lines[5933 - (10:0)], sep="\n")
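The reason for using both is that print() shows a string with its escapes and enclosing quotes, while cat() writes the characters as they are, so a stray tab, carriage return or quote character tends to stand out in one view or the other. A minimal sketch, using a made-up line (the value of x is hypothetical, not taken from matched.txt):

```r
# Hypothetical line with a tab, an apostrophe and a trailing carriage return.
x <- "id2\t3' UTR\r"

# print() displays escape sequences such as \t and \r literally.
print(x)

# cat() writes the raw characters; a newline is appended to end the output.
cat(x, "\n", sep = "")

# Capture the printed form so it can be inspected programmatically.
shown <- capture.output(print(x))
```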

If the problem is not obvious from that output, check whether read.table can read just those lines:
    read.table(text=lines[5933 - (10:0)], sep="\t", stringsAsFactors=FALSE)
If it can, try backing up more than 10 lines.
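One thing worth checking while backing up, offered only as a possibility since the cause is not established in this message: read.table's default quote set is "\"'", so an apostrophe on an earlier line can open a quoted string that swallows the lines after it. count.fields() reports NA for lines that fall inside such a multi-line string, which gives a quick way to look for them. The sketch below uses a small made-up three-column file (sample_lines and tmp are hypothetical names and data, not the contents of matched.txt):

```r
# Hypothetical tab-separated sample; rows 2 and 4 each contain an apostrophe.
sample_lines <- c("id1\tgene A\t10",
                  "id2\t3' UTR region\t20",
                  "id3\tgene C\t30",
                  "id4\t5' UTR region\t40")
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Field counts using read.table's default quote characters.
n_default <- count.fields(tmp, sep = "\t", quote = "\"'")

# Field counts with quoting switched off entirely.
n_noquote <- count.fields(tmp, sep = "\t", quote = "")

# Lines reported as NA start or continue a quoted string spanning lines.
suspects <- which(is.na(n_default))

# Reading with quoting and comment handling disabled keeps one row per line.
fixed <- read.table(tmp, sep = "\t", quote = "", comment.char = "",
                    stringsAsFactors = FALSE)
```

If suspects is non-empty for the real file, passing quote = "" (and possibly comment.char = "") to read.table is one thing to try. If it is empty, the cause lies elsewhere and the print/cat inspection above remains the better guide.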

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Dario Strbenac
> Sent: Friday, October 04, 2013 5:01 AM
> To: r-help at r-project.org
> Subject: [R] Tab Separated File Reading Error
> 
> Hello,
> 
> I have a seemingly simple problem: a tab-delimited file can't be read in.
> 
> > annoTranscripts <- read.table("matched.txt", sep = '\t', stringsAsFactors = FALSE)
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>   line 5933 did not have 12 elements
> 
> However, all lines do have 12 columns.
> 
> > lines <- readLines("matched.txt")
> > tabsPosns <- gregexpr("\t", lines)
> > table(sapply(tabsPosns, length))
> 
>     11
> 367274
> 
> > system("wc -l matched.txt")
> 367274 matched.txt
> 
> You can obtain the file from
> https://dl.dropboxusercontent.com/u/37992150/matched.txt
> 
> The line does not contain comment or quote characters. What can you suggest?
> 
> > sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
> 
> locale:
>  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
>  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods
> [7] base
> 
> loaded via a namespace (and not attached):
> [1] tools_3.0.1
> 
> --------------------------------------
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia


