[R] textConnections so slow!
Henrik Bengtsson
hb at maths.lth.se
Mon Nov 10 20:44:56 CET 2003
Hi. I haven't looked at the source code for textConnection(), but I am
confident that the authors have done a good job, which makes me believe
that you're running out of RAM and starting to swap.
From ?textConnection:
"An input text connection is opened and the character vector is
copied at time the connection object is created, and `close'
destroys the copy."
Thus, in your code
lines <- readLines("myBigFile.txt")
data <- scan(textConnection(lines), sep = "\t")
you use approx. 2*object.size(lines) bytes (ignoring object.size(data)):
one copy held by 'lines' and one held inside the connection.
Try
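lines <- readLines("myBigFile.txt")
lines <- textConnection(lines) # 'lines' now refers to the connection only
gc() # maybe it helps to call the garbage collector here?
data <- scan(lines, sep = "\t")
close(lines) # destroys the connection's copy of the text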
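which should use roughly the size of the original character vector,
because that vector is no longer referenced once 'lines' points to the
connection and can therefore be garbage collected. If you're still
swapping, then scan()-ing from a (temporary) file may do better.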
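
A minimal sketch of the temporary-file route, assuming the character
vector is already in memory and holds tab-separated numeric fields. The
small 'lines' vector below is made-up stand-in data, not the real file:

```r
# Stand-in data (hypothetical); in the real case this is the big vector.
lines <- c("1\t2\t3", "4\t5\t6")

tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)   # write the vector to disk once
rm(lines)                # optional: drop the in-memory copy before parsing
data <- scan(tmp, sep = "\t", quiet = TRUE)  # parse straight from the file
unlink(tmp)              # remove the temporary file
```

Whether this beats the textConnection() approach will depend on how much
memory is free; it trades one extra write to disk for not holding a
second copy of the text.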
More generally, when using scan() and read.table() you can help R save
memory by specifying the 'what' and 'colClasses' arguments, respectively.
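
As an illustration only, here is how those two arguments might be used
on a small tab-separated file. The file contents and the column names
'name' and 'value' are invented for the example:

```r
# Hypothetical two-column, tab-separated file: a name and a number per line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("a\t1.5", "b\t2.5"), tmp)

# scan(): 'what' declares the type of each field up front.
fields <- scan(tmp, what = list(name = character(), value = numeric()),
               sep = "\t", quiet = TRUE)

# read.table(): 'colClasses' declares the column types up front.
df <- read.table(tmp, sep = "\t",
                 colClasses = c("character", "numeric"),
                 col.names = c("name", "value"))
unlink(tmp)
```

With the types declared, each column is stored directly as the type you
asked for instead of being read generically and converted afterwards.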
Could this be it?
Henrik Bengtsson
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Mathieu Drapeau
> Sent: 11 November 2003 01:05
> To: r-help at stat.math.ethz.ch
> Subject: [R] textConnections so slow!
>
>
> Is it normal that it takes a very long time to generate a connection
> object on a big character vector?
>
> This takes a very long time to process:
> lines <- readLines ("myBigFile.txt")
> data <- scan(textConnection(lines), sep = "\t")
>
> against this that is pretty short to process:
> data <- scan("myBigFile.txt", sep = "\t")
>
> Anyone has any clues how to efficiently do that because I need to use a
> textConnection on a big vector?
>
> Thank you,
> Mathieu
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
>