[R] Add a column to a dataframe based on multiple other column values
Keith S Weintraub
kw1958 at gmail.com
Wed Jun 12 13:12:18 CEST 2013
Tom,
Here is my solution. Note that I assume the columns are interleaved as you describe below. I'm sure others will have better replies.
Note that posting your data with dput() makes it much easier for others to help.
# From dput(mdat)
mdat<-structure(list(x1 = c(2L, 2L, 2L, 3L, 3L, 30L, 32L, 33L, 33L),
y1 = c(100L, 100L, 100L, 0L, 0L, 0L, 100L, 82L, 0L), x2 = c(190L,
192L, 192L, 195L, 198L, 198L, 868L, 870L, 871L), y2 = c(99L,
63L, 63L, 99L, 98L, 100L, 100L, 100L, 82L), x3 = c(1430L,
1431L, 1444L, 1499L, 1500L, 1451L, 1451L, 1490L, 1494L),
y3 = c(79L, 75L, 51L, 50L, 80L, 97L, 97L, 97L, 85L), output = c(89,
69, 57, 74.5, 89, 65.66666667, 99, 93, 55.66666667)), .Names = c("x1",
"y1", "x2", "y2", "x3", "y3", "output"), class = "data.frame", row.names = c(NA,
-9L))
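# Drop the existing output column, leaving only the x/y pairs
mdat.pure<-mdat[,-ncol(mdat)]
# Function to apply to rows
theFunk<-function(x) {
  nxy<-length(x)/2        # number of x/y pairs
  idx<-seq_len(nxy)
  xvec<-x[idx*2 - 1]      # odd positions: x1, x2, ...
  yvec<-x[idx*2]          # even positions: y1, y2, ...
  mean(yvec[xvec>10])     # mean of the y values whose x exceeds 10
}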
# Apply the function to rows
output<-apply(mdat.pure, 1, theFunk)
Alternatively,
mdat.pure$output<-apply(mdat.pure, 1, theFunk)
appends the calculated column to the end of mdat.pure.
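
With a large number of x/y pairs, a vectorised variant may be worth trying instead of a per-row function. The following is only a sketch of that alternative, not the approach used above. It assumes the same interleaved x1, y1, x2, y2, ... layout, and the names demo, odd, xs, ys and out are made up for illustration.

```r
# Small demo data in the same interleaved layout (rows 1, 4 and 6 of mdat)
demo <- data.frame(x1 = c(2, 3, 30),        y1 = c(100, 0, 0),
                   x2 = c(190, 195, 198),   y2 = c(99, 99, 100),
                   x3 = c(1430, 1499, 1451), y3 = c(79, 50, 97))

odd <- seq(1, ncol(demo), by = 2)   # positions of the x columns
xs  <- as.matrix(demo[, odd])       # all x values
ys  <- as.matrix(demo[, odd + 1])   # each y column follows its x column

# Mask every y whose x is missing or not greater than 10 ...
ys[is.na(xs) | xs <= 10] <- NA
# ... then average whatever is left in each row
out <- rowMeans(ys, na.rm = TRUE)
```

For the three demo rows, out should agree with the corresponding values in the output column of the original data.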
Note that I haven't taken account of missing values: an NA in an x column, or in a selected y value, makes that row's result NA.
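
One possible way to handle missing values is sketched below. It rests on assumptions that the original question does not settle: a pair whose x is NA is treated as not qualifying, and an NA y is dropped from the mean. The name theFunkNA is made up for illustration.

```r
# NA-tolerant variant of theFunk (hypothetical name, assumed NA policy)
theFunkNA <- function(x) {
  idx  <- seq_len(length(x) / 2)
  xvec <- x[idx * 2 - 1]              # odd positions: x values
  yvec <- x[idx * 2]                  # even positions: y values
  keep <- !is.na(xvec) & xvec > 10    # a missing x never qualifies
  mean(yvec[keep], na.rm = TRUE)      # ignore missing y values
}

# Example rows: one with a missing x, one with a missing y
r1 <- theFunkNA(c(2, 100, 190, 99, NA, 79))
r2 <- theFunkNA(c(20, NA, 190, 99))
```

If no pair in a row qualifies, the result is NaN rather than NA. It can be applied to the rows in the same way, e.g. apply(mdat.pure, 1, theFunkNA).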
Hope this helps,
KW
--
On Jun 12, 2013, at 6:00 AM, r-help-request at r-project.org wrote:
> Date: Tue, 11 Jun 2013 17:07:12 +0100
> From: Tom Oates <toates19 at gmail.com>
> To: r-help at r-project.org
> Subject: [R] Add a column to a dataframe based on multiple other
> column values
>
> Hi
> I have a dataframe as below:
>
> x1   y1   x2   y2    x3  y3  output
>  2  100  190   99  1430  79  89
>  2  100  192   63  1431  75  69
>  2  100  192   63  1444  51  57
>  3    0  195   99  1499  50  74.5
>  3    0  198   98  1500  80  89
> 30    0  198  100  1451  97  65.66666667
> 32  100  868  100  1451  97  99
> 33   82  870  100  1490  97  93
> 33    0  871   82  1494  85  55.66666667
>
>
> In reality the dataframe has a large number of paired x & y columns.
> As you can see from the column labelled output, I want to
> calculate the mean of the yn columns in each row, but to include a
> yn value in the mean only if the corresponding xn value
> is greater than 10.
> So for row 1, only y2 & y3 are included in calculating
> the output column, but for row 6 y1-y3 are all included.
> Because the number of paired x & y columns is large, I am not sure of the best
> way to achieve this.
> Thanks in advance
> Tom