[R] create list of names where two df contain == values
Rob Griffin
robgriffin247 at hotmail.com
Wed Nov 16 16:35:29 CET 2011
OK, thanks for looking into this so far. I seem to have confused you all a
little, though, so I think I need to make this a bit clearer.
In the real situation:
df.1 is 271*13891 and contains (amongst others) columns with Flybase.CG,
rMF, and Affyid values.
df.2 is 14*12572 and was made from a subset of df.1 in which rows with
duplicated Flybase.CG values were removed; df.2 also includes the rMF column.
Because df.2 is made from the non-duplicated values, it is shorter.
I now need to put the Affyid column from df.1 into df.2.
My idea is:
to match on a value that is unique to each row (within its column) but
shared by both datasets (rMF contains such numbers),
then get R to copy the corresponding Affyid value (an alphanumeric id) from
df.1 and place it in df.2$Affy (or at least into a list which I could then
put into a column) for all "shared" rMF values, ignoring all others.
For example, df.1 and df.2 both contain the rMF value 0.3393211, which
corresponds to the same data point; in df.1 it has this Affyid: 1638273_at.
If you imagine the two rMF columns lined up next to each other, they start
the same and run in the same order, but df.2's has had "random" points
removed (as was the aim of making df.2), so from the first removed point
onwards the rest of the list doesn't line up.
What R needs to do is go down the df.2 rMF list one by one and, for each
df.2 rMF, check the entire df.1 rMF list for a match, then take the
corresponding Affyid.
For the rMF value 0.3393211 mentioned above, the matching data point sits on
different rows in the two data frames, so the lookup has to go by value and
not by row position.
Is that a bit clearer? I know this is pretty complex.
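The lookup described above (for each df.2 rMF, find the df.1 row with the
same rMF and take its Affyid) can be sketched with match(). This is only a
minimal sketch on made-up data: apart from the 0.3393211 / 1638273_at pair
quoted above, every value and id below is hypothetical. It assumes the rMF
values in df.2 were copied unchanged from df.1 and are unique within df.1,
so exact equality is safe here.

```r
# Toy stand-ins for the real data frames. Only the 0.3393211 / 1638273_at
# pair comes from the post; the other values and ids are hypothetical.
df.1 <- data.frame(
  rMF    = c(0.1200000, 0.3393211, 0.5600000, 0.7800000),
  Affyid = c("1600001_at", "1638273_at", "1600003_at", "1600004_at"),
  stringsAsFactors = FALSE
)

# df.2 keeps only some rows of df.1, so row positions no longer line up
df.2 <- data.frame(rMF = df.1$rMF[c(2, 4)])

# For each df.2 value, match() gives the row of df.1 holding the same value
# (NA where there is no match)
idx <- match(df.2$rMF, df.1$rMF)

# Copy the corresponding Affyid across
df.2$Affy <- df.1$Affyid[idx]
df.2
```

If the rMF values had been recomputed or rounded rather than copied, exact
matching could miss rows, and matching on an id column would be safer.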
David, your idea with ifelse worked for the first few lines, but as soon as
it reached a point where one of the Flybase.CG values had been removed in
the process of making df.2, the two data frames got out of line and it
just gave NA from there on.
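The misalignment described here can be reproduced on the small example
from the original post. The sketch below is an illustration only: it
assumes a simplified version of that example in which df.2 is df.1 with
the "h" row dropped, and it uses merge() as one possible way to join by
value instead of by row position.

```r
# Simplified version of the example data from the original post
set.seed(1)
df.1 <- data.frame(Letters = letters[1:10],
                   numb1   = rnorm(10, 1, 1),
                   stringsAsFactors = FALSE)

# Assumption: df.2 is df.1 with row 8 ("h") removed, keeping only numb1
df.2 <- df.1[-8, "numb1", drop = FALSE]

# Comparing row by row only works while the rows still line up:
# rows 1-7 agree, everything after the removed row does not
same.pos <- df.1$numb1[seq_len(nrow(df.2))] == df.2$numb1
same.pos

# Joining on the shared column ignores row position altogether
merged <- merge(df.2, df.1, by = "numb1", all.x = TRUE, sort = FALSE)
merged
```

With all.x = TRUE every row of df.2 is kept, and rows without a partner in
df.1 would get NA in the added columns.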
Rob
-----Original Message-----
From: Dennis Murphy
Sent: Wednesday, November 16, 2011 4:03 PM
To: Rob Griffin
Cc: r-help at r-project.org
Subject: Re: [R] create list of names where two df contain == values
Hi:
I think you're overthinking this problem. As is usually the case in R,
a vectorized solution is clearer and provides more easily understood
code.
It's not obvious to me exactly what you want, so we'll try a couple of
variations on the same idea. Testing floating-point numbers for equality
is a difficult computational problem (see R FAQ 7.31), but if it makes
sense to define a threshold difference between floating-point numbers
that practically equates to zero, then you're in business. In your example,
the difference in numb1 for letter h in the two data frames is far
from zero, so define 'equal' to be a difference < 10^-6. Then:
# Return the entire matching data frame
df.1[abs(df.1$numb1 - df.2$numb1) < 0.000001, ]
   Letters     numb1 extra.col    id
1        a 0.3735462         1 CG234
2        b 1.1836433         2 CG232
3        c 0.1643714         3 CG441
4        d 2.5952808         4 CG128
5        e 1.3295078         5 CG125
6        f 0.1795316         6 CG182
7        g 1.4874291         7 CG982
9        i 1.5757814         9 CG282
10       j 0.6946116        10 CG154
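
The floating-point caveat from R FAQ 7.31 can be seen in a small sketch.
The numbers below are a standard illustration and are not taken from the
data in this thread; the 1e-6 threshold is the one chosen above.

```r
x <- 0.1 + 0.2
y <- 0.3

x == y                    # FALSE: the two doubles differ in their last bits
abs(x - y) < 1e-6         # TRUE: equal within the chosen threshold
isTRUE(all.equal(x, y))   # TRUE: base R's comparison with a built-in tolerance
```

The right threshold depends on the scale of the data, so 1e-6 is a choice
for this example and not a general rule.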
# Return the matching letters only as a vector:
df.1[abs(df.1$numb1 - df.2$numb1) < 0.000001, 'Letters' ]
If you want the latter object to remain a data frame, use drop = FALSE
as an extra argument after 'Letters'. If you want to create a list
object such that each letter comprises a different list component,
then the following will do - the as.character() part coerces the
factor Letters into a character object:
as.list(as.character(df.1[abs(df.1$numb1 - df.2$numb1) < 0.000001,
'Letters' ]))
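
The three return shapes mentioned above (vector, one-column data frame,
list) can be compared side by side. This sketch assumes a simplified
construction of the example data from the original post, and the object
names keep, as.vec, as.df and as.lst are only for illustration.

```r
# Simplified version of the example data: df.2 equals df.1 except for row 8
set.seed(1)
df.1 <- data.frame(Letters = letters[1:10], numb1 = rnorm(10, 1, 1))
df.2 <- df.1
df.2$numb1[8] <- 12

# Rows where the two numb1 columns agree to within 1e-6
keep <- abs(df.1$numb1 - df.2$numb1) < 1e-6

as.vec <- df.1[keep, "Letters"]                 # plain vector (or factor)
as.df  <- df.1[keep, "Letters", drop = FALSE]   # still a data frame
as.lst <- as.list(as.character(as.vec))         # one list component per letter

str(as.vec)
str(as.df)
length(as.lst)
```

In recent versions of R, data.frame() no longer converts strings to
factors by default, so as.character() is harmless but may not be needed.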
HTH,
Dennis
On Wed, Nov 16, 2011 at 5:03 AM, Rob Griffin <robgriffin247 at hotmail.com>
wrote:
> Hello again... sorry to be posting yet again, but I hadn't anticipated
> this problem.
>
> I am now trying to put the names found in one column of data frame 1 (let's
> call it df.1[,1]) into a list, taken from the rows where the values in
> df.1[,2] match values in a column of another data frame (df.2[3]).
> I tried to write this function so that it put the list of names (called
> Iffy) where the 2 criteria (df.1[141] and df.2[21]) matched, but I think
> it's too complex for a beginner R-enthusiast.
>
> ify<-function(x,y,a,b,c) if(x[[,a]]==y[[,b]]) {list(x[[,c]])} else {NULL}
> Iffy<-apply( df.1, 1, FUN=ify, x=df.1, y=df.2, a=2, b=3, c=1 )
>
> But this didn't work... Error in FUN(newX[, i], ...) : unused argument(s)
> (newX[, i])
>
>
> Here is a dataset that replicates the problem. You'll notice the "h"
> criteria values are different between the two data frames, and therefore it
> should produce a list of the 9 letters where the two criteria columns
> match (a,b,c,d,e,f,g,i,j):
>
>
>
> df.1<-data.frame(rep(letters[1:10]))
> colnames(df.1)[1]<-("Letters")
> set.seed(1)
> df.1$numb1<-rnorm(10,1,1)
> df.1$extra.col<-c(1,2,3,4,5,6,7,8,9,10)
> df.1$id<-c("CG234","CG232","CG441","CG128","CG125","CG182","CG982","CG541","CG282","CG154")
> df.1
>
> df.2<-data.frame(rep(letters[1:10]))
> colnames(df.2)[1]<-("Letters")
> set.seed(1)
> df.2$extra.col<-c(1,2,3,4,5,6,7,8,9,10)
> df.2$numb1<-rnorm(10,1,1)
> df.2$id<-c("CG234","CG232","CG441","CG128","CG125","CG182","CG982","CG541","CG282","CG154")
> df.2[8,3]<-12
>
> df.1
> df.2
>
>
>
>
> Your patience is much appreciated,
> Rob
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>