[R] write.table performance: an alternative?
Carlos Javier Gil Bellosta
cjgb at wanadoo.es
Mon Sep 13 00:17:46 CEST 2004
Dear R's,
I have been using R lately to perform some statistical analyses and,
based on them, simulations whose results are exported as flat text files
to other programs. These text files are currently about 30MB in size,
but they could eventually reach 300MB.
Writing these files with either write.table or write.matrix was
desperately slow and the bottleneck of the whole process. Besides, it
took too much memory and sometimes I experienced heavy paging. So I
decided to find a better way to export my R tables. Since they contained
floating-point numbers only, in order to avoid the internal conversion
into character values (both write.table and write.matrix seem to do
it), compiling
////////////////////////// Program Start //////////////////////////
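#include <stdio.h>
#include <stdlib.h>

/* Called from R through .C(), so every argument arrives as a pointer. */
void salidaOptimizada(int* l_fila, int* n_columnas, double* vector_resultados){
    int i;
    int j;
    FILE* f = fopen("datosPorPeriodo.dat", "w");
    if(f == NULL)
        return;    /* could not open the output file */
    for(i=0; i < *n_columnas; i++){
        for(j=0; j < *l_fila; j++){
            /* values are written in the order they are stored */
            fprintf(f, " %3f", *vector_resultados);
            vector_resultados++;
        }
        fprintf(f, "\n");
    }
    fclose(f);
}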
////////////////////// Program End ////////////////////
as a shared library, linking it to my code, and invoking it with the
.C function did the trick for me. The performance gains with respect to
write.table() were enormous.
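The function above writes the values in the order they sit in memory and
always to the same file. R stores matrices column by column, so a caller
who wants each output line to hold one matrix row has to index
explicitly. A minimal sketch of such a variant follows; the names
write_matrix_rows, path and status are hypothetical and not part of the
code above, and the output format is only an assumption.

```c
#include <stdio.h>

/* Hypothetical .C-style entry point: every argument is a pointer.
 * x holds an nrow-by-ncol matrix in column-major order, as R stores it.
 * status is set to 0 on success and 1 on failure. */
void write_matrix_rows(char **path, int *nrow, int *ncol, double *x,
                       int *status)
{
    FILE *f = fopen(path[0], "w");
    if (f == NULL) {
        *status = 1;
        return;
    }
    for (int i = 0; i < *nrow; i++) {
        for (int j = 0; j < *ncol; j++) {
            /* element (i, j) sits at offset i + j * nrow */
            fprintf(f, " %f", x[i + (size_t)j * *nrow]);
        }
        fprintf(f, "\n");
    }
    *status = (fclose(f) == 0) ? 0 : 1;
}
```

From R this could presumably be called along the lines of
.C("write_matrix_rows", "out.dat", nrow(m), ncol(m), as.double(m),
status = integer(1)), with the file name passed as a character vector.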
So I decided to look for a greater degree of generality and wrote a
simple C function (enclosed at the end of the message) that accepts
character, integer and floating-point values. It can be tested, for
instance, by running both
/////////////// Program Start /////////////////
a1 <- rnorm(1000000)
a2 <- floor(a1)
a3 <- as.character(1:1000000)
a <- data.frame(a1, a2, a3)
Rprof()
write.table(a, "salidaNoOptimizada.dat")
Rprof(NULL)
summaryRprof()
////////////////////// Program End /////////////////
and
///////////////////// Program Start /////////////////
dyn.load("liboptio.so")
a1 <- rnorm(1000000)
a2 <- floor(a1)
a3 <- as.character(1:1000000)
a <- data.frame(a1, a2, a3)
Rprof()
borrar <- .C("escribir", as.integer(1000000), as.character("dic"),
as.integer(3), as.double(a[,1]), as.integer(a[,2]), as.character(a[,3]))
Rprof(NULL)
rm(borrar)
summaryRprof()
//////////////////// Program End ///////////////////
to compare the performance (provided that the program below is compiled
as a shared library named liboptio.so).
Now, my question:
Is this interesting/useful at all for anybody other than myself? Have I
done something silly (I know too little about both C and R) and wasted
an afternoon? Or would it be worth improving the code to make it more
general and wrapping it in some R code so as to make the function
invocation a bit more transparent and automatic?
Sincerely,
Carlos J. Gil Bellosta
///////////////////// Program Start /////////////////
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

/* tipo[0] holds one letter per column: 'd' double, 'i' int, 'c' string. */
void escribir(int* n_lin, char** tipo, int* n, ...){
    int i, j;
    va_list lista;
    FILE* f = fopen("salidaPrueba", "w");
    if(f == NULL)
        return;    /* could not open the output file */
    for(i = 0; i < *n_lin; i++){
        char *pAchar = *tipo;
        /* va_start takes the last named parameter itself, not *n */
        va_start(lista, n);
        for(j = 0; j < *n; j++){
            if(*pAchar == 'd'){
                fprintf(f, " %f", *(va_arg(lista, double*) + i));
            } else if(*pAchar == 'i'){
                fprintf(f, " %d", *(va_arg(lista, int*) + i));
            } else if(*pAchar == 'c'){
                fprintf(f, " %s", *(va_arg(lista, char**) + i));
            } else
                fprintf(f, "unknown type %c", *pAchar);
            pAchar++;
        }
        fprintf(f, "\n");
        va_end(lista);
    }
    fclose(f);
}
///////////////////// Program End /////////////////
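The variadic function above restarts the argument list for every output
line. One possible alternative, sketched here under assumptions, keeps
the same one-letter type codes but receives the columns as an array of
pointers, so no varargs are needed. The names write_cell and
write_columns are hypothetical, and how the R side would hand over such
an array is left open.

```c
#include <stdio.h>

/* Hypothetical helper: write element i of one column whose type is given
 * by a one-letter code ('d' double, 'i' int, 'c' string).
 * Returns 0 on success, -1 for an unknown code. */
static int write_cell(FILE *f, char type, const void *col, int i)
{
    switch (type) {
    case 'd': fprintf(f, " %f", ((const double *)col)[i]); return 0;
    case 'i': fprintf(f, " %d", ((const int *)col)[i]); return 0;
    case 'c': fprintf(f, " %s", ((char *const *)col)[i]); return 0;
    default:  return -1;
    }
}

/* Write n_lin lines; types[j] describes cols[j]. Returns 0 on success,
 * -1 if the file cannot be opened or a type code is unknown. */
int write_columns(const char *path, int n_lin, const char *types,
                  int n, const void *const *cols)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    for (int i = 0; i < n_lin; i++) {
        for (int j = 0; j < n; j++) {
            if (write_cell(f, types[j], cols[j], i) != 0) {
                fclose(f);
                return -1;
            }
        }
        fprintf(f, "\n");
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Unlike the original, an unknown type code stops the write and is reported
through the return value instead of being printed into the output file.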