[R] efficiently replacing values in a matrix
Joerg van den Hoff
j.van_den_hoff at fzd.de
Thu Apr 17 13:41:38 CEST 2008
On Wed, Apr 16, 2008 at 03:56:26PM -0600, Matthew Keller wrote:
> Yes Chuck, you're right.
>
just a comment:
> Thanks for the help. It was a data.frame not a matrix (I had called
> as.matrix() in my script much earlier but that line of code didn't run
> because I misnamed the object!). My bad. Thanks for the help. And I'm
> VERY relieved R isn't that inefficient...
well, it _is_, at least when using data frames. and while it
is obvious that operations on lists (data frames are lists
in disguise, actually, right?) are slower than on
arrays/matrices, I'm not happy with a performance drop by a
factor of seemingly more than 1500 (30 sec vs. > 13 h) -- and
I have seen similar things even with rather small data sets,
where the difference between using a data frame and a matrix
might mean, e.g., overall run times of 10 sec. vs. 0.1 sec.
where is all this time burned? there _are_ functional
languages which operate efficiently on lists.
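
One way to see where the time goes is to run the same element-by-element replacement on a matrix and on a data frame holding identical values. The following is only a rough sketch: the data, the sizes, and the helper name replace_loop are made up for illustration and are not taken from the original script, and the measured times will differ from machine to machine.

```r
# Hypothetical test data; dimensions are deliberately small.
set.seed(1)
n <- 500
p <- 20
m  <- matrix(sample(0:3, n * p, replace = TRUE), nrow = n)
df <- as.data.frame(m)

# Replace every 3 by NA one element at a time (the slow pattern).
# On a data frame each x[i, j] <- value goes through the
# `[<-.data.frame` method, which does far more work per call than
# the built-in matrix assignment.
replace_loop <- function(x) {
  for (j in seq_len(ncol(x))) {
    for (i in seq_len(nrow(x))) {
      if (x[i, j] == 3) x[i, j] <- NA
    }
  }
  x
}

t_mat <- system.time(m2  <- replace_loop(m))
t_df  <- system.time(df2 <- replace_loop(df))

# Vectorised replacement avoids the loop altogether and works on both.
m3 <- m
m3[m3 == 3] <- NA

print(rbind(matrix = t_mat, data.frame = t_df))
```

Both loops produce the same values; only the cost per assignment differs, and the vectorised form sidesteps the per-element overhead in either case.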
I think this extreme performance drop when using an
apparently innocent data structure is really bad. and it's
bad that it's not repeatedly stated in BIG LETTERS in the
manuals: use matrices, at least for big arrays, wherever
possible. this message is not at all conveyed by the
"description" in the data.frame manpage, e.g.:
"This function creates data frames, tightly coupled
collections of variables which share many of the properties
of matrices and of lists, used as the fundamental data
structure by most of R's modeling software."...
probably 90% (+ x) of all R users are simply that: users and
not experts. when I started using R I exclusively used data
frames for purely numerical data instead of matrices simply
because I could get column n with x[n] instead of x[,n] and
mean(x) worked columnwise (whereas apply(x, 2, 'mean') is
tiresome), thus saving some typing. this is no strong reason
in retrospect but probably quite common. and many will then
stick with data.frames and endure long runtimes for no good
reason at all.
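
For reference, the typing saved is small. A minimal sketch with made-up data (the object names x_df and x_mat are illustrative only) shows the matrix equivalents of the two conveniences mentioned above. Note that in current R versions mean() on a data frame no longer returns column means, so colMeans() is the portable spelling for both structures.

```r
# Hypothetical purely numeric data, once as data frame, once as matrix.
x_df  <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
x_mat <- as.matrix(x_df)

# Column access: x_df[2] gives a one-column data frame,
# x_df[[2]] and x_mat[, 2] both give a plain numeric vector.
col_df  <- x_df[[2]]
col_mat <- x_mat[, 2]

# Column means: colMeans() works on matrices and on numeric data frames,
# and is shorter than apply(x, 2, mean).
means_fast  <- colMeans(x_mat)
means_apply <- apply(x_mat, 2, mean)

print(means_fast)
```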
another question would be whether homogeneous data frames
could not internally be handled as matrices...
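
Until something like that exists, the conversion can be done by hand. The sketch below assumes a data frame whose columns are all numeric; the data and the names num_df, num_mat and num_out are hypothetical. It converts once, does the replacement on the matrix, and converts back only if a data frame is really needed afterwards.

```r
# Hypothetical homogeneous (all-numeric) data frame.
num_df <- data.frame(a = c(1, 3, 2), b = c(3, 0, 3))

# Guard: the round trip is only safe if every column is numeric.
stopifnot(all(vapply(num_df, is.numeric, logical(1))))

# Convert once, do the heavy work on the matrix ...
num_mat <- as.matrix(num_df)
num_mat[num_mat == 3] <- NA

# ... and convert back at the end, keeping the column names.
num_out <- as.data.frame(num_mat)
print(num_out)
```

With factor or character columns as.matrix() would yield a character matrix, which is why the check on the column types comes first.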
joerg
>
> Matt
>
>
> On Wed, Apr 16, 2008 at 3:39 PM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
> >
> > On 17/04/2008, at 9:33 AM, Charles C. Berry wrote:
> >
> > <snip>
> >
> >
> >
> > > I'll lay odds that Matthew's 'matrix' is actually a data.frame, and I'll
> > > not be surprised if the columns are factors.
> > >
> >
> > <snip>
> >
> > I suspect that you're right.
> >
> > ***Why*** can't people distinguish between data frames and matrices?
> > If they were the same <expletive deleted> thing, there wouldn't be two
> > different terms for them, would there?
> >
> > cheers,
> >
> > Rolf Turner
> >
> >
>
>
>
> --
> Matthew C Keller
> Asst. Professor of Psychology
> University of Colorado at Boulder
> www.matthewckeller.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.