[R] Faster Printing Alternatives to 'cat'

jim holtman jholtman at gmail.com
Thu Jan 8 15:11:04 CET 2009


Here is one way of doing it: write the data out in chunks rather than
one row at a time.  Writing out 1 million rows this way took about
21 seconds on my system.

> # create some data
> dataSize <- 1e6
> foo <- runif(dataSize)
> bar <- runif(dataSize)
> n <- 1000  # number of items to write out each time
> output <- file('/output.txt', 'w')
> # now split the indices into groups of 'n'
> index <- split(seq(length(foo)), cut(seq(length(foo)), length(foo) / n, labels=FALSE))
> # my.stats() is a personal timing/memory helper, not a base R function
> my.stats(reset=TRUE)
stats (1) - Rgui : <0.0 0.0> 73738.9 : 185.1MB
> for (i in index){
+     write.table(cbind(foo[i], bar[i]), file=output, sep='\t',
+                 col.names=FALSE, row.names=FALSE)
+ }
> close(output)
> my.stats('done')
done (1) - Rgui : <20.7 20.7> 73759.6 : 124.6MB
>
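
The same chunking idea can be wrapped in a small function. This is only a
sketch: the name write_in_chunks() and its arguments are made up for
illustration and are not part of the session above, and system.time() stands
in for my.stats(). It builds a data frame per chunk so that character and
numeric columns (such as the sequence strings and counts in the question) keep
their own types, and it uses quote=FALSE so strings are written without
quotes, much as 'cat' would print them.

```r
# Illustrative sketch; write_in_chunks() is a hypothetical helper name.
# 'cols' is a list of equal-length vectors, one per output column.
write_in_chunks <- function(cols, path, chunk_size = 1000) {
  n_rows <- length(cols[[1]])
  stopifnot(all(lengths(cols) == n_rows))

  con <- file(path, "w")
  on.exit(close(con))          # close the file even if an error occurs
  if (n_rows == 0L) return(invisible(0L))

  # first row index of each chunk: 1, chunk_size + 1, ...
  starts <- seq(1, n_rows, by = chunk_size)
  for (s in starts) {
    i <- s:min(s + chunk_size - 1, n_rows)
    # only 'chunk_size' rows are copied at a time, so memory use stays small
    chunk <- as.data.frame(lapply(cols, `[`, i), stringsAsFactors = FALSE)
    write.table(chunk, file = con, sep = "\t", quote = FALSE,
                col.names = FALSE, row.names = FALSE)
  }
  invisible(n_rows)
}

# Example use with simulated data, written to a temporary file
n <- 1e5
foo <- runif(n)
bar <- runif(n)
out <- tempfile(fileext = ".txt")
print(system.time(write_in_chunks(list(foo = foo, bar = bar), out)))
```

Larger values of chunk_size mean fewer calls to write.table() at the cost of a
bigger temporary data frame, so the best value probably depends on how much
free memory is available.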

On Thu, Jan 8, 2009 at 8:26 AM, Gundala Viswanath <gundalav at gmail.com> wrote:
> Dear Jim and Henrik,
>
>> What exactly is the problem you are trying to solve?
>> Is it going to be read by some other program?
>
> I simply want to print the data out. Surely, this data
> will be manipulated (with Excel or other
> programming languages) by other people to suit their purposes.
>
> Typically the printout from the loop looks like this:
>
> ATCGATCGATCGGGGGGGGGGGGGGGTTTGCGGG   10   11.992
> CCCCCCCCGGGCCATCGGTCAGGGAATTGACGGAA   2      0.222
> .....
> up to ~16 million lines.
>
>> How much physical memory do you have on your machine?
> 6GB
>
>> Is there paging occurring due to the size of the objects?
> I don't quite understand what you mean by that.
> Sorry for my lack of knowledge in R.
>
>> Have you considered creating a structure with 10,000 of the variables
>> each time through the loop and then writing them out?
>
> Never thought about that. Can you be specific about how this can be achieved?
>
> - Gundala Viswanath
> Jakarta - Indonesia
>
>
>
> On Thu, Jan 8, 2009 at 10:10 PM, jim holtman <jholtman at gmail.com> wrote:
>> What exactly is the problem you are trying to solve?  What is going to
>> be done with the data?  Is it going to be read by some other program?
>> How much physical memory do you have on your machine?  Is there paging
>> occurring due to the size of the objects?  Have you considered creating a
>> structure with 10,000 of the variables each time through the loop and
>> then writing them out?  A lot will depend on how much free memory you
>> have.  I will also ask one of my favorite questions: "tell me what you
>> want to do, not how you want to do it".
>>
>> On Thu, Jan 8, 2009 at 6:12 AM, Gundala Viswanath <gundalav at gmail.com> wrote:
>>> Dear all,
>>>
>>> I found that printing with 'cat' is very slow.
>>>
>>> For example, on my machine this snippet
>>>
>>> __BEGIN__
>>>
>>> # I need to resort to this type of loop,
>>> # because with write() I need to create a matrix, which
>>> # consumes so much memory. Note that the "foo, bar, qux" objects
>>> # are already very large (>2 GB)
>>>
>>> for ( s in 1:length(x) ) {
>>>    cat(as.character(foo[s]),"\t",bar[s],"\t", qux[s],"\n")
>>> }
>>> __END__
>>>
>>> for "x" of size ~1.5 million takes more than 10 hours to print,
>>> on my Linux machine with a 1994 MHz AMD processor.
>>>
>>> Are there any faster alternatives to "cat"?
>>>
>>>
>>> - Gundala Viswanath
>>> Jakarta - Indonesia
>>>
>>>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



