[R] write.table performance: an alternative?

Carlos Javier Gil Bellosta cjgb at wanadoo.es
Mon Sep 13 00:17:46 CEST 2004


Dear R users,

I have lately been using R to perform some statistical analyses and, 
based on them, simulations whose results are exported as flat text files 
to other programs. These text files are currently about 30 MB in size, 
but they could eventually reach 300 MB.

Writing these files with either write.table or write.matrix was 
desperately slow and was the bottleneck of the whole process. Besides, 
it took too much memory, and sometimes I experienced heavy paging. So I 
decided to find a better way to export my R tables. Since they contained 
floating-point numbers only, and in order to avoid the internal 
conversion into character values (both write.table and write.matrix seem 
to do that), compiling

//////////////////////////     Program Start  //////////////////////////

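#include <stdio.h>
#include <stdlib.h>

/* Called from R via .C, so every argument arrives as a pointer.
   Writes *n_columnas lines of *l_fila values each, taken in the
   order in which they are stored in vector_resultados. */
void salidaOptimizada(int* l_fila, int* n_columnas, double* vector_resultados){

    int i;
    int j;

    FILE* f = fopen("datosPorPeriodo.dat", "w");
    if(f == NULL) return;   /* could not open the output file */

    for(i = 0; i < *n_columnas; i++){
        for(j = 0; j < *l_fila; j++){
            /* " %3f": minimum field width 3, default 6 decimals */
            fprintf(f, " %3f", *vector_resultados);
            vector_resultados++;
        }
        fprintf(f, "\n");
    }

    fclose(f);

}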

////////////////////// Program End ////////////////////

as a shared library, linking it to my code and invoking it with the .C 
function did the trick for me. The performance gain with respect to 
write.table() was enormous.
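R stores a matrix column by column, so consecutive values in the vector 
handed over by .C run down a column; writing them in storage order, as 
salidaOptimizada() does, puts one matrix column per line when *l_fila 
equals the number of rows. Also, " %3f" always prints six decimals, so 
values smaller than 5e-7 in magnitude come out as 0.000000. A minimal 
sketch, assuming one wants each matrix row on its own line and values 
that can be read back exactly; the helper name write_rows_colmajor() is 
hypothetical and it takes an already opened FILE*:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of R): writes an nrow x ncol matrix stored
   in R's column-major order, where element (i, j) sits at x[i + j * nrow],
   so that each matrix row becomes one line of space-separated values.
   "%.17g" prints enough significant digits for any IEEE 754 double to be
   read back exactly. Returns 0 on success, -1 on a write error. */
int write_rows_colmajor(FILE *f, const double *x, int nrow, int ncol)
{
    for (int i = 0; i < nrow; i++) {
        for (int j = 0; j < ncol; j++) {
            /* a single space separates values; none before the first one */
            if (j > 0 && fputc(' ', f) == EOF)
                return -1;
            if (fprintf(f, "%.17g", x[i + (size_t)j * (size_t)nrow]) < 0)
                return -1;
        }
        if (fputc('\n', f) == EOF)
            return -1;
    }
    return 0;
}
```

A routine callable from .C, for example one taking (double *x, int *nrow, 
int *ncol) and opening the output file itself, could simply call this 
helper; that wrapper is likewise hypothetical. The price of "%.17g" is 
larger files than a fixed six-decimal format.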

So I decided to aim for greater generality and wrote a simple C 
function (enclosed at the end of this message) that accepts character, 
integer and floating-point values. It can be tested, for instance, by 
running both

/////////////// Program Start /////////////////

a1 <- rnorm(1000000)
a2 <- floor(a1)
a3 <- as.character(1:1000000)
a <- data.frame(a1, a2, a3)
Rprof()

write.table(a, "salidaNoOptimizada.dat")

Rprof(NULL)
summaryRprof()

////////////////////// Program End /////////////////

and

///////////////////// Program Start /////////////////

dyn.load("liboptio.so")
a1 <- rnorm(1000000)
a2 <- floor(a1)
a3 <- as.character(1:1000000)
a <- data.frame(a1, a2, a3)
Rprof()

borrar <- .C("escribir", as.integer(1000000), as.character("dic"), 
as.integer(3), as.double(a[,1]), as.integer(a[,2]), as.character(a[,3]))

Rprof(NULL)
rm(borrar)
summaryRprof()

//////////////////// Program End ///////////////////

to compare the performance (assuming that the program below has been 
compiled as a shared library named liboptio.so).

Now, my question:

Is this interesting or useful at all for anybody other than myself? Have 
I done something silly (I know too little about both C and R) and wasted 
an afternoon? Or would it be worth improving the generality of the code 
and wrapping it in some R code, so as to make the function call a bit 
more transparent and automatic?

Sincerely,

Carlos J. Gil Bellosta

///////////////////// Program Start /////////////////

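#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

/* escribir: writes *n_lin lines to the file "salidaPrueba".
   tipo[0] holds one type letter per column ('d' double, 'i' integer,
   'c' character), *n is the number of columns, and the column vectors
   follow as extra pointer arguments. */
void escribir(int* n_lin, char** tipo, int* n, ...){

    int i, j;
    va_list lista;

    FILE* f = fopen("salidaPrueba", "w");
    if(f == NULL) return;   /* could not open the output file */

    for(i = 0; i < *n_lin; i++){
        char *pAchar = *tipo;

        /* va_start takes the name of the last named parameter (n, not *n);
           the variable arguments are re-read from the start on every line */
        va_start(lista, n);
        for(j = 0; j < *n; j++){
            if(*pAchar == 'd'){
                fprintf(f, " %f", *(va_arg(lista, double*) + i));
            } else if(*pAchar == 'i'){
                fprintf(f, " %d", *(va_arg(lista, int*) + i));
            } else if(*pAchar == 'c'){
                fprintf(f, " %s", *(va_arg(lista, char**) + i));
            } else {
                /* unknown type letter: still consume the pointer argument
                   so that the following columns stay aligned */
                (void) va_arg(lista, void*);
                fprintf(f, " [unknown type %c]", *pAchar);
            }
            pAchar++;
        }
        va_end(lista);

        fprintf(f, "\n");
    }

    fclose(f);
}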

///////////////////// Program End /////////////////
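The escribir() function above is variadic, but as far as I understand .C 
calls the compiled routine through a function pointer with a fixed 
number of pointer parameters. In standard C, calling a function defined 
with "..." through a non-variadic function type is undefined behavior, 
so this only works where the platform happens to pass both kinds of 
arguments the same way. A minimal sketch of the same type-letter 
dispatch without varargs, assuming the columns are first collected into 
an array of tagged pointers; struct column and write_columns() are 
hypothetical names:

```c
#include <stdio.h>

/* Hypothetical column descriptor: a type tag plus a pointer to the data,
   mirroring the 'd' / 'i' / 'c' letters used by escribir(). */
struct column {
    char type;          /* 'd' = double, 'i' = int, 'c' = string */
    const void *data;   /* points to at least n_lin values of that type */
};

/* Writes n_lin lines; line i holds element i of every column, separated
   by single spaces. Returns 0 on success, -1 on an unknown type tag or a
   write error. */
int write_columns(FILE *f, const struct column *cols, int ncols, int n_lin)
{
    for (int i = 0; i < n_lin; i++) {
        for (int j = 0; j < ncols; j++) {
            const char *sep = (j > 0) ? " " : "";
            int rc;
            switch (cols[j].type) {
            case 'd':
                rc = fprintf(f, "%s%.17g", sep, ((const double *)cols[j].data)[i]);
                break;
            case 'i':
                rc = fprintf(f, "%s%d", sep, ((const int *)cols[j].data)[i]);
                break;
            case 'c':
                rc = fprintf(f, "%s%s", sep, ((const char *const *)cols[j].data)[i]);
                break;
            default:
                return -1;   /* unknown type letter: stop instead of guessing */
            }
            if (rc < 0)
                return -1;
        }
        if (fputc('\n', f) == EOF)
            return -1;
    }
    return 0;
}
```

.C passes character vectors as char**, integers as int* and doubles as 
double*, so a fixed-signature wrapper, for example one taking (int 
*n_lin, double *a1, int *a2, char **a3) for the data frame used in the 
tests above, could fill a struct column array and call write_columns(). 
Such a wrapper is hypothetical and would have to match each column layout.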



