[Rd] Discussion starter for package level Connection API

Mon Oct 9 23:40:14 CEST 2006

Thought I'd try and start a discussion. Feel free to jump in.

I guess R needs to strike the right balance between opening up the 
internals to package writers and not allowing them to do bad things. My 
first attempt at cracking this nut is to just memcpy() the Rconnection 
and not allow access to the private stuff:

/* Alternative to allowing C code access to connection API. */
Rconnection R_GetConnection(Rconnection ucon, int idx){
     Rconnection rcon;

     /* Valid connection? */
     if ((rcon = getConnection(idx)) == NULL)
         return NULL;

     memcpy(ucon,rcon,sizeof(struct Rconn));

     /* Don't reveal private data */
     ucon->private = NULL;

     return ucon;
}

This would take a user-allocated Rconnection and fill in all structure 
members but deny access to the private data. It also presumes that the 
full Rconnection structure is available in an R_ext/ header file. This 
has the advantage of giving access to all the function pointers, so that 
data can be pushed and pulled through the connection without any 
knowledge of what's in the private area.

The first problem is with the class and description members. What to do? 
As it is, the user could do bad things like rename the class or 
description. If the function copied the strings, then the user would 
have to deallocate them as well.
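
One way around the deallocation question might be to copy the strings into fixed-size buffers that belong to the caller's structure. The following is only a sketch of that idea: toy_conn_info, toy_copy_info() and TOY_NAME_MAX are hypothetical names, not part of R, and the 64-byte limit is an arbitrary assumption.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_NAME_MAX 64   /* arbitrary limit chosen for this sketch */

/* Hypothetical caller-owned snapshot of the two string members. */
typedef struct toy_conn_info {
    char conn_class[TOY_NAME_MAX];
    char description[TOY_NAME_MAX];
} toy_conn_info;

/* Copy both strings into the caller's buffers.
 * Returns 1 if both fit without truncation, 0 otherwise.
 * Nothing is allocated, so there is nothing for the caller to free,
 * and edits to the copies cannot reach the original strings. */
int toy_copy_info(toy_conn_info *out, const char *conn_class,
                  const char *description)
{
    int a, b;

    assert(out != NULL && conn_class != NULL && description != NULL);

    a = snprintf(out->conn_class, sizeof out->conn_class, "%s", conn_class);
    b = snprintf(out->description, sizeof out->description, "%s", description);

    return a >= 0 && a < TOY_NAME_MAX && b >= 0 && b < TOY_NAME_MAX;
}
```

The trade-off is that long descriptions (a long file path or URL, say) get truncated, so the limit would need some thought.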

Then there's the PushBack members, to which the user would have full 
access. Looks like they're only used in text connections. Would these be 
better off placed in the private member structure?

Also, the user can call close on the copied connection, which would 
leave the isopen member of the original connection out of date.
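
To make that last problem concrete, here is a small self-contained sketch. It uses a toy stand-in for the connection structure (toy_conn, toy_close() and toy_get_connection() are all made up for illustration, and the toy close does nothing except clear the flag), so it only models the copy semantics, not the real connection code.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for the connection structure; only the fields needed here. */
typedef struct toy_conn {
    int isopen;
    void *private_data;
    void (*close)(struct toy_conn *);
} toy_conn;

/* Toy close method: just clears the flag on whatever struct it is given. */
static void toy_close(toy_conn *con)
{
    con->isopen = 0;
}

/* The "original" connection, as it would sit in the internal table. */
static toy_conn toy_original = { 1, NULL, toy_close };

/* Same idea as R_GetConnection(): copy everything, hide the private data. */
static toy_conn *toy_get_connection(toy_conn *ucon)
{
    memcpy(ucon, &toy_original, sizeof(toy_conn));
    ucon->private_data = NULL;
    return ucon;
}

/* Close through the copy, then report the original's isopen flag. */
int toy_stale_isopen_demo(void)
{
    toy_conn copy;

    toy_get_connection(&copy);
    copy.close(&copy);          /* only the copy is marked closed */
    assert(copy.isopen == 0);

    return toy_original.isopen; /* the original still claims to be open */
}
```

In this model the original keeps reporting isopen == 1 after the close, because the method only ever sees the copy.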

Here's a rather restrictive approach whereby the user must know the 
integer index of the connection. Each function is a wrapper around the 
related Rconnection member.

int R_VfprintfConnection(int idx, const char *format, va_list ap){
     Rconnection con = getConnection(idx);

     if (!con) return -1; /* just like fprintf(3)? */

     if(!con->isopen) error(_("connection is not open"));
     if(!con->canwrite) error(_("cannot write to this connection"));

     return con->vfprintf(con,format,ap);
}

int R_FgetcConnection(int idx){
     Rconnection con = getConnection(idx);

     if (!con) return EOF; /* just like fgetc(3)? */

     if(!con->isopen) error(_("connection is not open"));
     if(!con->canread) error(_("cannot read from this connection"));

     return con->fgetc(con);
}

double R_SeekConnection(int idx, double where, int origin, int rw){
     Rconnection con = getConnection(idx);

     if (!con) return -1; /* just like fseek(3)? */
     if(!con->isopen) error(_("connection is not open"));
     if(!con->canseek) error(_("cannot seek on this connection"));

     return con->seek(con,where,origin,rw);
}

void R_TruncateConnection(int idx){
     Rconnection con = getConnection(idx);

     if (con) con->truncate(con);
}

int R_FlushConnection(int idx){
     Rconnection con = getConnection(idx);

     if (!con) return EOF; /* like fflush(3) */

     return con->fflush(con);
}

size_t R_ReadConnection(int idx, void *buf, size_t size, size_t n){
     Rconnection con = getConnection(idx);

     if (!con) return 0;

     if(!con->isopen) error(_("connection is not open"));
     if(!con->canread) error(_("cannot read from this connection"));

     return con->read(buf,size,n,con);
}

size_t R_WriteConnection(int idx, void *buf, size_t size, size_t n)
{
     Rconnection con = getConnection(idx);

     /* size_t is unsigned, so -1 would wrap to a huge count;
        return 0 items written instead, like fwrite(3) */
     if (!con) return 0;

     if(!con->isopen) error(_("connection is not open"));
     if(!con->canwrite) error(_("cannot write to this connection"));

     return con->write(buf, size, n, con);
}

Thus, the user has no access to the Rconnection at all. Only question 
from me is whether there's too much overhead in calling getConnection(), 
especially when calling R_FgetcConnection() in a loop.
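
For a rough feel of the lookup question, the sketch below counts index lookups for a per-byte loop versus a single bulk read. It is a toy model only: toy_get() stands in for getConnection(), the connection is just an in-memory string, and every name in it is hypothetical. It says nothing about how expensive the real lookup is, only how often it would happen.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NCONN 4

/* Toy in-memory connection: a byte string plus a read position. */
typedef struct toy_conn {
    const char *data;
    size_t len;
    size_t pos;
} toy_conn;

static toy_conn toy_table[TOY_NCONN];
static unsigned long toy_lookups;   /* counts calls to toy_get() */

/* Stand-in for getConnection(): validate the index, return the entry. */
static toy_conn *toy_get(int idx)
{
    toy_lookups++;
    if (idx < 0 || idx >= TOY_NCONN || toy_table[idx].data == NULL)
        return NULL;
    return &toy_table[idx];
}

/* Per-byte read: one lookup per call. Returns -1 at end of data. */
static int toy_fgetc(int idx)
{
    toy_conn *con = toy_get(idx);

    if (!con || con->pos >= con->len) return -1;
    return (unsigned char) con->data[con->pos++];
}

/* Bulk read: one lookup for up to n items of the given size. */
static size_t toy_read(int idx, void *buf, size_t size, size_t n)
{
    toy_conn *con = toy_get(idx);
    size_t avail;

    if (!con || size == 0) return 0;
    avail = (con->len - con->pos) / size;
    if (n > avail) n = avail;
    memcpy(buf, con->data + con->pos, n * size);
    con->pos += n * size;
    return n;
}

/* Put an 11-byte string in slot 0 and zero the lookup counter. */
static void toy_reset(void)
{
    toy_table[0].data = "hello world";
    toy_table[0].len = 11;
    toy_table[0].pos = 0;
    toy_lookups = 0;
}

/* Drain slot 0 one byte at a time and report the number of lookups. */
unsigned long toy_lookups_per_byte(void)
{
    toy_reset();
    while (toy_fgetc(0) != -1)
        ;
    return toy_lookups;
}

/* Drain slot 0 with a single bulk read and report the number of lookups. */
unsigned long toy_lookups_bulk(void)
{
    char buf[64];

    toy_reset();
    toy_read(0, buf, 1, sizeof buf);
    return toy_lookups;
}
```

With the 11-byte string, the per-byte loop does 12 lookups (11 bytes plus the call that hits the end), while the bulk read does one. So if the lookup does turn out to matter, callers could be pointed at R_ReadConnection() with their own buffer rather than R_FgetcConnection() in a tight loop.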

Jeff
-- 
http://biostat.mc.vanderbilt.edu/JeffreyHorner