[R] (structured) programming style

Ross Boylan ross at biostat.ucsf.edu
Fri Sep 12 02:58:03 CEST 2003


Thanks for your response.  It is good to know that copying is lazy, but I
don't think that fully solves my problems.  See below.

On Thu, 2003-09-11 at 17:40, Thomas Lumley wrote:
.....
> 
> you see that unpacking b from a didn't result in a copy, and that b must
> just be a reference to a$m.  When b is modified it must be copied, but
> this is true whether or not it is in a list. What matters is whether there
> is another reference to it somewhere [actually, whether R thinks there
> *might* be another reference: we try to be a bit conservative about this].
> 

I'm thinking of a situation like
a <- array(0, dim=c(10000, 10))
and then I modify the array a one row at a time:
a[34,] <- newrow
So if I write directly to a, I just overwrite the row.
But if I make a copy, even a lazy one, when I change the row I have to
make a copy of the whole array (unless the laziness is really clever,
and your figures suggest that a single write causes the whole thing to
be copied).
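
A minimal sketch of the case described above, assuming a hypothetical
replacement row (newrow here is just a vector of ones). It shows that the
first write to a lazy copy leaves the original untouched, which means the
array has to be duplicated at that point. tracemem() only reports
duplications when R was built with memory profiling, so that part is
guarded, and the exact copying behaviour depends on the R version.

```r
a <- array(0, dim = c(10000, 10))
newrow <- rep(1, 10)            # hypothetical replacement row

# Ask R to report duplications of a, if this build supports it.
can_trace <- capabilities("profmem")
if (can_trace) tracemem(a)

b <- a                          # lazy copy: b and a share the same data
b[34, ] <- newrow               # first write to b forces a copy of the array

if (can_trace) {
  untracemem(a)
  untracemem(b)
}

# a keeps its zeros; only b has the new row.
sum(a[34, ])
sum(b[34, ])
```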

Hmm, now that I think of it I suppose I could just return newrow from
the inner function... except I have inner functions that produce several
rows, with inner inner functions that do single rows...
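
One way the "return the rows" idea could be sketched for nested helpers:
the innermost function returns a single row, the next level collects
several rows together with their indices, and only the caller writes into
the array. The function names (make_row, make_rows) and the computation
are made up for illustration.

```r
# Innermost helper (hypothetical): computes one row.
make_row <- function(i, width) rep(i, width)

# Inner helper (hypothetical): computes several rows and returns them
# with the row indices they belong to, without touching the big array.
make_rows <- function(idx, width) {
  rows <- t(vapply(idx, make_row, numeric(width), width = width))
  list(idx = idx, rows = rows)
}

a <- array(0, dim = c(10000, 10))
upd <- make_rows(c(34, 35, 36), ncol(a))
a[upd$idx, ] <- upd$rows        # the only assignment into a
```

Only the small block of new rows is passed back, so the helpers never
hold a modified copy of the full array.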

> 
> Now, it is certainly possible that you could have a situation where
> assigning with <<- was really faster than passing back a list, by enough
> to matter.  I think this situation is unusual enough that there may not be
> a firm idea of `good R style', since it assumes that the objects are small
> enough to fit easily in memory but large enough that it's worth going to
> some effort to reduce copying. You might get more useful input from the
> Bioconductor list, where people tend to spend a lot of time doing
> computationally expensive things to medium-sized data sets.
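
For reference, a rough sketch of the <<- alternative mentioned above: the
array lives in an enclosing environment and an inner function assigns into
it. The names make_store, set_row and get are hypothetical, and whether
this actually avoids copies depends on the R version and on what other
references to the array exist.

```r
make_store <- function(n_rows, n_cols) {
  a <- array(0, dim = c(n_rows, n_cols))
  set_row <- function(i, newrow) {
    a[i, ] <<- newrow           # assigns into a in the enclosing environment
    invisible(NULL)
  }
  get <- function() a
  list(set_row = set_row, get = get)
}

store <- make_store(10000, 10)
store$set_row(34, rep(1, 10))
res <- store$get()
```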

Where and what is the Bioconductor list?

I suppose optimization is one traditional reason to break style
guidelines.



