[R] How to write efficient R code
Liaw, Andy
andy_liaw at merck.com
Wed Feb 18 14:25:18 CET 2004
Sorry about the typo. There should be a "-" in front of nrow(x); i.e.,
mdiff <- function(x) x[-1,] - x[-nrow(x),]
... and sapply() won't work, but lapply() will. So the whole thing looks
like:
> do.call("rbind", lapply(split(df[,-1], df$ID),
+                         function(x) x[-1,] - x[-nrow(x),]))
             V2         V3         V4
1.2  -0.1250197  0.6446575 -1.0504143
1.3  -0.4104924  0.5638618  2.4117082
2.5  -3.1917997 -1.8687987 -0.9026947
2.6   2.2405199  3.5321711  1.0417581
3.8   1.7029947  0.3666408  0.8117269
3.9  -1.6701011 -0.8246094 -0.9099002
4.11  0.5183960  1.1066630  1.0484818
4.12  0.3563826 -1.9202869 -3.5635572
5.14  2.2746317  2.9820733 -2.4086057
5.15 -2.5767889 -2.5492538 -0.3083154
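For anyone who wants to run this end to end, here is a minimal self-contained sketch of the split/lapply/rbind approach. The seed is an assumption added only for reproducibility (the session above used unseeded random draws), so the numbers will not match the output shown; only the shape of the result and the "ID.row" style of row names should.

```r
# Assumption: a fixed seed, chosen here only so the run is repeatable.
set.seed(1)

# Same layout as the example data: an ID column plus three numeric columns.
df <- as.data.frame(cbind(ID = rep(1:5, each = 3),
                          x = matrix(rnorm(45), 15, 3)))

# Successive row differences within one block of rows.
mdiff <- function(x) x[-1, ] - x[-nrow(x), ]

# Split the numeric columns by ID, difference each piece,
# then stack the pieces back into a single data frame.
pieces <- lapply(split(df[, -1], df$ID), mdiff)
res <- do.call("rbind", pieces)

dim(res)            # number of rows and columns in the stacked result
head(rownames(res)) # row names combine the ID and the original row name
```

The do.call("rbind", ...) step is what turns the list returned by lapply() back into a data frame.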
However, looking at this, I doubt it is the most efficient way to go
about it. If the IDs are contiguous (i.e., the rows for each ID are
consecutive), then you can difference the entire data frame at once and
then throw out the rows where the difference spans two different IDs:
> df.diff <- df[-1, -1] - df[-nrow(df), -1]
> del <- which(diff(as.numeric(df$ID)) != 0)
> del
[1] 3 6 9 12
> df.diff[-del,]
           V2         V3         V4
2  -0.1250197  0.6446575 -1.0504143
3  -0.4104924  0.5638618  2.4117082
5  -3.1917997 -1.8687987 -0.9026947
6   2.2405199  3.5321711  1.0417581
8   1.7029947  0.3666408  0.8117269
9  -1.6701011 -0.8246094 -0.9099002
11  0.5183960  1.1066630  1.0484818
12  0.3563826 -1.9202869 -3.5635572
14  2.2746317  2.9820733 -2.4086057
15 -2.5767889 -2.5492538 -0.3083154
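As a quick check that the two approaches give the same differences, the sketch below runs both on the same data and compares the values. The seed is again an assumption made only for repeatability, and the names by.id and whole are just illustrative. It relies on the rows of each ID being contiguous, as stated above.

```r
# Assumption: a fixed seed, chosen here only so the run is repeatable.
set.seed(1)
df <- as.data.frame(cbind(ID = rep(1:5, each = 3),
                          x = matrix(rnorm(45), 15, 3)))

# Group-wise version: split by ID, difference each piece, and stack.
by.id <- do.call("rbind", lapply(split(df[, -1], df$ID),
                                 function(x) x[-1, ] - x[-nrow(x), ]))

# Whole-data version: difference all consecutive rows at once ...
df.diff <- df[-1, -1] - df[-nrow(df), -1]
# ... then drop the differences that straddle two different IDs.
del <- which(diff(as.numeric(df$ID)) != 0)
whole <- df.diff[-del, ]

# Row names differ ("1.2" versus "2"), so compare the values only.
all.equal(as.matrix(by.id), as.matrix(whole), check.attributes = FALSE)
```

One caveat: if the data contain only a single ID, del is empty and df.diff[-del, ] selects no rows at all, so a guard such as if (length(del) > 0) is probably worth adding before the final subsetting.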
HTH,
Andy
> From: Sebastian Luque [mailto:sluque at mun.ca]
>
> Hi,
>
> This is exactly what I meant, Andy: the stratifying variable can be
> thought of as a factor. However, I tried your code and I get the error:
> "Error in Ops.data.frame......- only defined for equally-sized data
> frames". What may be happening?
> The 'apply' functions, 'split', 'by' and the like return lists, with a
> names attribute that, in my case, would hold the levels of the factor.
> How can one get the results back into a data.frame object, with the
> newly calculated variables? Indexing for lists is not as
> straightforward as for data frames.
>
> Thanks to both of you for showing me the power of indexing in
> R functions!
>
> Sebastian
>
>
> Liaw, Andy wrote:
>
> >I'm guessing what Sebastian wants is to do the differencing by a
> >stratifying variable such as ID; e.g., the data may look like:
> >
> >df <- as.data.frame(cbind(ID=rep(1:5, each=3),
> >                          x=matrix(rnorm(45), 15, 3)))
> >
> >So using Tom's solution, one would do something like:
> >
> >mdiff <- function(x) x[-1,] - x[nrow(x),]
> >sapply(split(df[,-1], df[,1]), mdiff)
> >
> >There could well be more efficient ways!
> >
> >Andy
> >
> >
> >
> >>From: Tom Blackwell
> >>
> >>Sebastian -
> >>
> >>For successive differences within a single column 'x'
> >>
> >>differences <- c(NA, diff(x)),
> >>
> >>same as
> >>
> >>differences <- c(NA, x[-1] - x[-length(x)]).
> >>
> >>See help("diff"), help("Subscript"). The second version also
> >>works when x is a matrix or a data frame, except now the result
> >>is a matrix or data frame of the same size.
> >>
> >>x <- data.frame(matrix(rnorm(1e+5), 1e+4))
> >>dim(x) # 10000 10
> >>differences <- rbind(rep(NA, 10), x[-1, ] - x[-dim(x)[1], ])
> >>dim(differences) # 10000 10
> >>
> >>However, you write "I need to do this for all the subsets of data
> >>created by the numbers in one of the columns of the data frame ..."
> >>and I'm not sure I understand how an 'id' column would create many
> >>subsets of the data. So the simple examples above may not answer
> >>the question you are asking.
> >>
> >>- tom blackwell - u michigan medical school - ann arbor -
> >>
> >>On Tue, 17 Feb 2004, Sebastian Luque wrote:
> >>
> >>
> >>
> >>>Hi,
> >>>
> >>>In fact, I've been trying to get rid of loops in my code for more
> >>>than a week now, but nothing I try seems to work. It sounds as if
> >>>you have lots of experience with loops, so would appreciate any
> >>>pointers you may have on the following.
> >>>
> >>>I want to create a column showing the difference between the ith
> >>>row and i-1. Of course, the first row won't have any value in it,
> >>>because there is nothing above it to subtract. This is fairly
> >>>easy to do with a simple loop, but I need to do this for all the
> >>>subsets of data created by the numbers in one of the columns of
> >>>the data frame (say, an id column). I would greatly appreciate
> >>>any idea you may have on this.
> >>>
> >>>Thanks in advance.
> >>>
> >>>Best regards,
> >>>Sebastian
> >>>--
> >>> Sebastian Luque
> >>>
> >>>sluque at mun.ca
> >>>
> >>>
> >>>
> >>>
> >>______________________________________________
> >>R-help at stat.math.ethz.ch mailing list
> >>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> >>PLEASE do read the posting guide!
> >>http://www.R-project.org/posting-guide.html
> >>
> >>
> >>
> >>
> >
> >
> >
> >
> >
>
> --
> Sebastian Luque
>
> sluque at mun.ca
> Tel.: +1 (204) 586-8170