[R] textConnection taking a long time to open a big string

Thomas W Blackwell tblackw at umich.edu
Wed Apr 30 21:23:08 CEST 2003


Two alternative ways to the same result:

x.1 <- scan(file=, what=rep(list(0),17), fill=T, multi.line=F)
incomplete.lines <- seq(length(x.1[[17]]))[ is.na(x.1[[17]]) ]

x.1 <- scan(file=, what='')
x.2 <- strsplit(x.1, "[\\t ]+")
incomplete.lines <- seq(length(x.1))[ unlist(lapply(x.2, length)) < 17 ]
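A third route worth noting (not in the original reply): count.fields() accepts a file name directly, so the slow textConnection() step can be skipped altogether. A minimal sketch using a throwaway file in place of the poster's mpstat output:

```r
# Build a tiny example file: two complete 17-field lines and one short line.
tmp <- tempfile()
writeLines(c(paste(1:17, collapse = " "),
             "1 2 3",
             paste(1:17, collapse = " ")), tmp)

# count.fields() reads straight from the file name; sep = "" means
# any run of whitespace separates fields, matching scan()'s default.
x.c <- count.fields(tmp, sep = "")

# Lines with fewer than 17 fields are the incomplete ones.
incomplete.lines <- which(x.c < 17)
incomplete.lines

unlink(tmp)
```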

Please read the help for these functions.

HTH  -  tom blackwell  -  u michigan medical school  -  ann arbor  -

On Wed, 30 Apr 2003 james.holtman at convergys.com wrote:

> I was using 'textConnection' to read in a file with about 11,000 lines so I
> could detect lines with incomplete data and delete them and then read them
> in with 'scan'.  I am using 1.7.0 on Windows.  Here is the output from the
> script and it was using 51 seconds just to do the textConnection.
>
> Is there a limit on how large a text object can be to be used with
> 'textConnection'?
>
> ########   script output    ################
> > x.1 <- scan("/mpstat.ssgdbsv4.030430.txt",what='',sep='\n')
> Read 11299 items
> > str(x.1)
>  chr [1:11299] "8.3155  32   71   4 1907   122    0 1130  105  167  216
> 0  3686   32  13  37  18" ...
> > unix.time(x.in <- textConnection(x.1))  # this takes a long time
> [1] 51.96  0.01 53.20    NA    NA
> > sum(nchar(x.1))  # total number of characters in the vector
> [1] 944525
> > unix.time(x.c <- count.fields(x.in))    # this goes pretty fast
> [1] 0.14 0.00 0.14   NA   NA
> > table(x.c)      # detect incomplete lines
> x.c
>     3     6    17
>     1     1 11297
