[R] read.table: how to ignore errors?
William Dunlap
wdunlap at tibco.com
Tue Jan 24 22:47:39 CET 2012
> >> Oh, yeah, a reproducible example:
> >>
> >> read.csv from
> >> =====
> >> a,b
> >> 1,2
> >> 3,4
> >> 5,,6
> >> 7,8
> >> =====
> >> I want to be able to extract the data frame
> >> a b
> >> 1 1 1
The previous line should be '1 1 2', right?
> >> 2 3 4
> >> 3 7 8
Have you tried using count.fields to remove the lines
in the file with the wrong number of fields? E.g.,
> tf <- tempfile()
> cat(c("a,b", "1,2", "3,4", "5,,6", "7,8"), file=tf, sep="\n")
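> # following will fail: read.table sizes the table from the first five
> # lines, sees the 3-field line, and then rejects the 2-field data lines
> read.table(tf, sep=",", header=TRUE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 1 did not have 3 elements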
> # following reads lines with 2 fields only
> textLines <- readLines(tf)
> counts <- count.fields(textConnection(textLines), sep=",")
> read.table(text=textLines[counts == 2], header=TRUE, sep=",")
a b
1 1 2
2 3 4
3 7 8
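
The original request also asked for the rejected lines themselves. A minimal sketch of one way to return both, built on the same count.fields idea. The name read.table.skip.bad and the shape of its return value are invented for this illustration, and it assumes the first line has the correct number of fields and that the file has no blank or comment lines (count.fields skips those by default, which would misalign the counts):

```r
# Hypothetical helper (not part of base R): split a delimited file into
# a data frame of well-formed rows and a vector of rejected raw lines.
read.table.skip.bad <- function(file, sep = ",", header = TRUE, ...) {
  textLines <- readLines(file)
  con <- textConnection(textLines)
  on.exit(close(con))
  # Number of fields on each line.
  counts <- count.fields(con, sep = sep)
  # Guard: counts must line up one-to-one with the lines read.
  stopifnot(length(counts) == length(textLines))
  # Assumption: the first line defines the expected number of fields.
  ok <- !is.na(counts) & counts == counts[1]
  list(data = read.table(text = textLines[ok], sep = sep,
                         header = header, ...),
       bad  = textLines[!ok])
}

tf <- tempfile()
cat(c("a,b", "1,2", "3,4", "5,,6", "7,8"), file = tf, sep = "\n")
res <- read.table.skip.bad(tf)
res$data   # the three well-formed rows
res$bad    # the rejected line, as raw text
```

Keeping the rejected lines as raw strings makes it easy to inspect or log them later without holding up the analysis of the good rows.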
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of R. Michael Weylandt
> Sent: Tuesday, January 24, 2012 1:33 PM
> To: sds at gnu.org
> Cc: r-help at r-project.org
> Subject: Re: [R] read.table: how to ignore errors?
>
> Given your domain name, you might also get some use out of the
> system() and system2() functions, which pass strings to the OS
> command line (and thus let you use tools like grep/sed/awk from
> within R).
>
> E.g., an idiom I use pretty frequently for interactive data analysis:
> (not really related, but I think it makes a good example)
>
> FunctionToAnalyzeSomething <- function(...){
>   pdf("junk.pdf")
>
>   # plot stuff
>
>   dev.off()
>   # open the PDF in the default viewer ("open" is the macOS command)
>   system(paste("open", shQuote(file.path(getwd(), "junk.pdf"))))
>   if(readline("Keep? ") == "y") system("cp junk.pdf FileOutput.pdf")
>   unlink("junk.pdf") # or system("rm junk.pdf")
> }
>
> I would imagine you could use tryCatch + as.character() to get the bad
> line number, and then make a temp file without that line with Unix
> tools, and read that in. Some sort of determined.read.table() wrapper
> to read.table()...
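
A rough sketch of what such a wrapper might look like, done in R alone rather than with Unix tools. The function is hypothetical (only its name comes from the musing above), and the approach is fragile: it parses the English text of the scan() error, and it only works when the first five lines of the file are well formed, because read.table infers the column count from them. The count.fields approach at the top of this message does not have either limitation.

```r
# Hypothetical wrapper, not an existing function. Repeatedly tries
# read.table(), and each time scan() complains about a line, sets that
# line aside and tries again.
determined.read.table <- function(file, header = FALSE, ..., max.bad = 100) {
  lines <- readLines(file)
  bad <- character(0)
  repeat {
    res <- tryCatch(read.table(text = lines, header = header, ...),
                    error = function(e) e)
    if (!inherits(res, "error"))
      return(list(data = res, bad = bad))
    msg <- conditionMessage(res)
    hit <- regmatches(msg, regexec("line ([0-9]+) did not have", msg))[[1]]
    if (length(hit) < 2 || length(bad) >= max.bad)
      stop(res)  # a different error, or too many bad lines: give up
    # scan() counts from the first data line, so add 1 for a header line.
    i <- as.integer(hit[2]) + header
    bad <- c(bad, lines[i])
    lines <- lines[-i]
  }
}

# Stand-in data: the malformed line is line 7 of the file.
tf <- tempfile()
writeLines(c("a,b", "1,2", "3,4", "5,6", "7,8", "9,10", "11,,12", "13,14"), tf)
out <- determined.read.table(tf, header = TRUE, sep = ",")
out$data
out$bad
```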
>
> Musing out loud...
> Michael
>
> On Tue, Jan 24, 2012 at 4:00 PM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
> > On 24/01/2012 3:45 PM, Sam Steingold wrote:
> >>
> >> I get this error from read.table():
> >> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
> >>   line 234 did not have 8 elements
> >> The error is genuine (an extra field separator between 1st and 2nd
> >> element).
> >>
> >> 1. is there a way to see this bad line 234 from R without diving into the
> >> file?
> >
> >
> > You could use readLines. Skip 233 lines, read one.
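
A small sketch of that suggestion on a stand-in file. The helper name show.line is invented for the illustration. One caveat: the line number in the scan() message is counted from where the data starts, so with header=TRUE or skip= the offending line may sit one or more lines further down the file. Printing a few neighbours is a cheap way to check.

```r
# Hypothetical helper: return lines (n - context) to (n + context) of a
# file, named by line number, reading no more than n + context lines.
show.line <- function(file, n, context = 0) {
  first <- max(1, n - context)
  lines <- readLines(file, n = n + context)
  last <- min(length(lines), n + context)
  if (first > last) return(character(0))
  setNames(lines[first:last], first:last)
}

tf <- tempfile()
writeLines(c("a,b", "1,2", "3,4", "5,,6", "7,8"), tf)
show.line(tf, 4)               # the malformed line
show.line(tf, 4, context = 1)  # the same line with its neighbours
```

For the file in question, show.line(file, 234, context = 1) would show lines 233 to 235.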
> >
> >
> >> 2. is there a way to ignore the bad lines and get the data from the good
> >> lines only (I do want to see the bad lines, but I don't want to stop all
> >> work until an issue that affects 1% of the data is resolved).
> >
> >
> > I think you would have to read the first part up to line 233, then read the
> > part after line 234, then use rbind to join the two parts. The latter might
> > be tricky if you need a header line; it may be easiest to rewrite the file
> > to a tempfile().
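
Both variants can be sketched roughly as follows, on stand-in data with the bad line at a known position. The variable names are illustrative, and bad is assumed to be the line number within the file, counting the header as line 1.

```r
# Stand-in for the real file: line 7 of the file is malformed.
tf <- tempfile()
writeLines(c("a,b", "1,2", "3,4", "5,6", "7,8", "9,10", "11,,12", "13,14"), tf)
bad <- 7  # file line number of the bad line (header is line 1)

# Part 1: the header plus the data rows before the bad line.
part1 <- read.table(tf, sep = ",", header = TRUE, nrows = bad - 2)
# Part 2: everything after the bad line, reusing part 1's column names
# because there is no header line to read at that point.
part2 <- read.table(tf, sep = ",", header = FALSE, skip = bad,
                    col.names = names(part1))
good <- rbind(part1, part2)

# Alternative: rewrite the file without the bad line and read it in one go.
tmp <- tempfile()
writeLines(readLines(tf)[-bad], tmp)
good2 <- read.table(tmp, sep = ",", header = TRUE)
```

The tempfile variant extends more easily to several bad lines, since bad can then be a vector of line numbers.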
> >
> > Duncan Murdoch
> >
> >
> >> thanks.
> >>
> >> Oh, yeah, a reproducible example:
> >>
> >> read.csv from
> >> =====
> >> a,b
> >> 1,2
> >> 3,4
> >> 5,,6
> >> 7,8
> >> =====
> >> I want to be able to extract the data frame
> >> a b
> >> 1 1 1
> >> 2 3 4
> >> 3 7 8
> >>
> >> and a list of strings of length 1 containing "5,,6".
> >>
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>