[R] textConnection appears to be slow

ripley at stats.ox.ac.uk
Fri Jun 21 13:50:34 CEST 2002


On Fri, 21 Jun 2002 james.holtman at convergys.com wrote:

> I was trying to read in a file and delete lines that did not have the
> correct number of fields on them.  I was reading the file as one
> character vector per line using 'scan' with sep='\n'.  I was then using
> 'count.fields' with 'textConnection' to the object I just read in.
>
> I thought at first the system was locked up, but further testing showed
> that the 'textConnection' was a very slow way to read in data to
> 'count.fields' as compared to 'count.fields' just reading the file.
>
> Is this a characteristic of using 'textConnection' on large objects?

Yes, input from `textConnection' like this will be slow.  It is read a
character at a time so that pushbacks can be supported.  For large
amounts of scratch data, use a scratch file (via file()) instead.
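
A minimal sketch of the scratch-file approach, assuming the lines are
already held in a character vector.  The names `lines', `scratch' and
`n.fields' and the three sample lines are made up for illustration; they
stand in for the vector read by scan() and its expected field count.

```r
# Hypothetical stand-in for the lines read by scan(..., sep = "\n")
lines <- c("a b c", "d e", "f g h")

# Write the lines to a scratch file through a file() connection
scratch <- tempfile()
con <- file(scratch, "w")
writeLines(lines, con)
close(con)

# Let count.fields() read the scratch file instead of a textConnection
n.fields <- count.fields(scratch)
unlink(scratch)

# Keep only the lines with the expected number of fields (3 in this sketch)
good <- lines[n.fields == 3]
```

With real data the expected field count would be 11, as in the transcript
quoted below, rather than 3.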

>
> ==============================================================
>
> > unix.time(x.1 <- scan('iostat.zigzag.020620', what='', sep='\n'))
> Read 117163 items
> [1] 4.00 0.07 4.08   NA   NA
> > str(x.1)
>  chr [1:117163] "000035 atf233       0.0    0.8    0.0    5.9  0.0  0.0
> 9.3   0   0 " ...
> #
> # count.fields just reading the file directly; this appears to work fine
> # (<4 seconds)
> #
> > unix.time(x.2 <- count.fields('iostat.zigzag.020620'))
> [1] 3.35 0.04 3.39   NA   NA
> > str(x.2)
>  int [1:117163] 11 11 11 11 11 11 11 11 11 11 ...
> > sum(x.2 != 11)    # determine number of 'bad' records
> [1] 3
> #
> # processing times get longer with larger objects
> #
> > unix.time(x.3 <- count.fields(textConnection(x.1[1:3000])))
> [1] 0.94 0.00 0.94   NA   NA
> > unix.time(x.3 <- count.fields(textConnection(x.1[1:7000])))
> [1] 13.61  0.02 13.64    NA    NA
> > unix.time(x.3 <- count.fields(textConnection(x.1[1:10000])))
> [1] 31.61  0.00 31.75    NA    NA
> >
>
>
> platform "i386-pc-mingw32"
> arch     "i386"
> os       "mingw32"
> system   "i386, mingw32"
> status   ""
> major    "1"
> minor    "5.1"
> year     "2002"
> month    "06"
> day      "17"
> language "R"
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



