[R] how to merge 5 data frames by one column

Tue Dec 3 21:27:22 CET 2019

I apologize, I need to reformulate this problem because there will be
many more unique genes to look up: 381.

So all genes are now in one data frame:

> head(r)
               V1         V2          V3        V4
1 ENSG00000273172  rs7215271 4.33932e-17 -0.602316
2 ENSG00000273172 rs34889101 4.99518e-17 -0.596089
3 ENSG00000273172  rs4890177 4.23229e-17 -0.590085
4 ENSG00000273172  rs4890178 7.14216e-17 -0.581467
5 ENSG00000273172  rs7503363 3.16802e-17 -0.582836
6 ENSG00000273172 rs35611892 2.24399e-17 -0.583710

> tail(r)
                   V1          V2          V3        V4
18946 ENSG00000141560    rs7215271 8.53890e-17  0.572286
18947 ENSG00000141560    rs606532 9.00740e-17  0.572151
18963 ENSG00000175711 rs111566282 5.71871e-17 -0.609586
18964 ENSG00000175711  rs76319775 4.58843e-17 -0.610164
18965 ENSG00000175711  rs62074661 4.17490e-17 -0.603199
18966 ENSG00000176845  rs11433639 1.45496e-17 -0.761955

So for the above example, merging would give just one row in the result,
because two genes have the same rs: rs7215271.
The output would contain all columns related to those two genes that share
rs7215271.

It is also possible that more than 2 genes share the same rs.
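
A minimal sketch of one way such shared SNPs might be found in the single
long data frame: count the distinct genes (V1) per SNP (V2) and keep the SNPs
seen with at least two genes. The toy `r` below reuses a few rows printed
above; `genes_per_rs`, `shared_rs` and `shared` are names made up for
illustration.

```r
# Toy stand-in for `r`, built from a few of the rows printed above
r <- data.frame(
  V1 = c("ENSG00000273172", "ENSG00000273172", "ENSG00000141560",
         "ENSG00000141560", "ENSG00000175711"),
  V2 = c("rs7215271", "rs34889101", "rs7215271", "rs606532", "rs111566282"),
  V3 = c(4.33932e-17, 4.99518e-17, 8.53890e-17, 9.00740e-17, 5.71871e-17),
  V4 = c(-0.602316, -0.596089, 0.572286, 0.572151, -0.609586),
  stringsAsFactors = FALSE
)

# Number of distinct genes (V1) observed for each SNP (V2)
genes_per_rs <- tapply(r$V1, r$V2, function(g) length(unique(g)))

# SNPs reported for at least two different genes
shared_rs <- names(genes_per_rs)[genes_per_rs >= 2]

# Keep only the rows for those SNPs, ordered by SNP and then by gene
shared <- r[r$V2 %in% shared_rs, ]
shared <- shared[order(shared$V2, shared$V1), ]
print(shared)
```

Changing the threshold from 2 to a larger number would restrict the result
to SNPs shared by more genes.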

Can you please advise on this?
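
A sketch of how a wide layout (one row per rs, one column per gene, V4 as the
value) might be built from the long data frame with base R's tapply(). The
toy `r` reuses a few rows printed above; `wide` and `shared_wide` are
illustrative names, and the sketch assumes each gene/rs pair occurs at most
once.

```r
# Toy stand-in for `r`, built from a few of the rows printed above
r <- data.frame(
  V1 = c("ENSG00000273172", "ENSG00000273172", "ENSG00000141560",
         "ENSG00000141560", "ENSG00000175711"),
  V2 = c("rs7215271", "rs34889101", "rs7215271", "rs606532", "rs111566282"),
  V4 = c(-0.602316, -0.596089, 0.572286, 0.572151, -0.609586),
  stringsAsFactors = FALSE
)

# rs x gene matrix of V4 values; gene/rs pairs that never occur become NA
wide <- with(r, tapply(V4, list(rs = V2, gene = V1), function(x) x[1]))

# Keep the SNPs that have a value for at least two genes
shared_wide <- wide[rowSums(!is.na(wide)) >= 2, , drop = FALSE]
print(shared_wide)
```

If an rs should be kept only when it is present for every gene, the row
filter could instead be complete.cases(wide).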

On Tue, Dec 3, 2019 at 2:16 PM Ana Marija <sokovic.anamarija using gmail.com>
wrote:

> Would this make sense for the previous question:
> mt=na.omit(m, cols = c("V1.1","V1.2","V1.3","V1.4","V1.5"))
>
> On Tue, Dec 3, 2019 at 2:09 PM Ana Marija <sokovic.anamarija using gmail.com>
> wrote:
>
>> I can perhaps do this:
>>
>> m=Reduce(function(x, y) merge(x, y, all=TRUE), list(s11, s22,
>> s33,s44,s55))
>>
>> but then in the output I get this for one SNP (just as an example):
>>
>> > head(m)
>>          rs            V1.1        V3.1     V4.1 V1.2 V3.2 V4.2            V1.3
>> 6 rs1029829 ENSG00000154803 1.02519e-11 0.469402 <NA>   NA   NA ENSG00000141030
>>          V3.3     V4.3 V1.4 V3.4 V4.4 V1.5 V3.5 V4.5
>> 6 3.06126e-28 0.726948 <NA>   NA   NA <NA>   NA   NA
>> ...
>>
>> How can I filter this output (m) to remove all rows that have NA in any
>> of these columns: V1.1, V1.2, V1.3, V1.4, V1.5?
>>
>> On Tue, Dec 3, 2019 at 1:48 PM Ana Marija <sokovic.anamarija using gmail.com>
>> wrote:
>>
>>> The desired output would look like this (example given for just two
>>> genes; it should include all 5 genes from all 5 data frames).
>>>
>>> In this example, say only 5 rs are shared between those two genes; the
>>> values after each rs ID are from the V4 column for each gene:
>>>
>>> GENES ENSG00000001629 ENSG00000127914
>>> rs1208998 -0.0337989326337439  -0.00106024397995199
>>> rs4729008 0.0630831868839983  0.00890783698397027
>>> rs11772754 0.181375539335959  0.0012636115921931
>>> rs10257459 0.0369962603988132  0.00509887844657462
>>> rs17164876 0.0307882763321834  -0.00188979524322732
>>>
>>> On Tue, Dec 3, 2019 at 1:40 PM Ana Marija <sokovic.anamarija using gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have 5 dataframes (s11,s22,s33,s44,s55) that look like this:
>>>>
>>>> > head(s11)
>>>>              V1.1          rs        V3.1      V4.1
>>>> 1 ENSG00000154803  rs12940868 3.80175e-05 -0.519565
>>>> 2 ENSG00000154803   rs4383187 8.92772e-05 -0.367303
>>>> 3 ENSG00000154803   rs4404112 9.32402e-05 -0.366634
>>>> 4 ENSG00000154803   rs7214091 8.38003e-05  0.337576
>>>> 5 ENSG00000154803  rs35871790 9.67028e-05 -0.305755
>>>> 6 ENSG00000154803 rs112532541 1.08341e-04 -0.305493
>>>>
>>>> > head(s22)
>>>>                V1.2          rs        V3.2      V4.2
>>>> 602 ENSG00000264589  rs62065452 1.34475e-17 -0.695948
>>>> 603 ENSG00000264589 rs377004743 1.26272e-17 -0.695627
>>>> 630 ENSG00000264589   rs1724390 1.01129e-17 -0.693518
>>>> 643 ENSG00000264589 rs367637729 4.05726e-17 -0.682833
>>>> 653 ENSG00000264589 rs376183404 1.13177e-17 -0.697646
>>>> 673 ENSG00000264589 rs112327620 1.59840e-17 -0.707904
>>>>
>>>> Each data frame has a single unique value in its respective V1 column.
>>>>
>>>> I am trying to merge all 5 data frames at once by the "rs" column.
>>>>
>>>> Can you please help with this,
>>>> Ana
>>>>
