[R] Find "undirected" duplicates in a tibble

Gabor Grothendieck ggrothendieck at gmail.com
Fri Aug 20 18:18:42 CEST 2021


Since you are dealing with graphs, you could consider using
the igraph package.  This is more involved than needed for
what you are asking, but it might be useful for other follow-on
calculations.  We first define a two-column matrix of edges, then
convert it to an igraph graph and simplify it to remove duplicate
edges (simplify also drops self-loops by default), giving g.
At the end we get an edge list back.

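  library(igraph)

  # Each row of m is one edge; note that 1-2 and 2-1 both appear
  m <- matrix(c(1, 2, 6, 6, 4, 9, 1, 5, 2, 1, 8, 7, 5, 10, 6, 10), 8, 2)

  # Build an undirected graph and collapse duplicate edges
  # (the native pipe |> needs R >= 4.1)
  g <- m |>
    graph_from_edgelist(directed = FALSE) |>
    simplify()

  plot(g)

  # Convert back to an edge list, one row per undirected edge
  # (as_edgelist() is the current name for get.edgelist())
  g |>
    as_edgelist() |>
    as.data.frame()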
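
For comparison, here is a minimal base-R sketch of the same idea without
igraph.  It is not from the original post: the data frame and the names
edges, key, undirected and weighted are made up for illustration.  Putting
the smaller node first in each row makes A-B and B-A identical, so
duplicated() can drop the repeats and aggregate() can count them.

```r
# Hypothetical example data: the same eight edges as in m above
edges <- data.frame(Source = c(1, 2, 6, 6, 4, 9, 1, 5),
                    Target = c(2, 1, 8, 7, 5, 10, 6, 10))

# Put the smaller node first so that A-B and B-A get the same key
key <- data.frame(from = pmin(edges$Source, edges$Target),
                  to   = pmax(edges$Source, edges$Target))

# Keep one row per undirected pair
undirected <- key[!duplicated(key), ]

# Weighted version: count how often each undirected pair occurs
weighted <- aggregate(weight ~ from + to,
                      data = cbind(key, weight = 1),
                      FUN = sum)
```

Unlike simplify(), this sketch keeps self-pairs such as 1-1; filter them
with from != to if they are not wanted.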



On Fri, Aug 20, 2021 at 5:00 AM Kimmo Elo <kimmo.elo at utu.fi> wrote:
>
> Hi!
>
> I am working with a large network dataset consisting of source-target
> pairs stored in a tibble. Now I need to transform the directed dataset
> into undirected network data. This means I need to keep only one
> instance of each pair with the same "nodes". In other words, if my data
> has one row with A (source) and B (target) and one with B (source) and
> A (target), only the pair A-B should be kept.
>
> Here is an example of how I have solved this problem so far:
>
> --- snip ---
>
> # Create some data
> library(dplyr)
> library(tibble)
> x <- tibble(Source = rep(1:3, 4),
>             Target = c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3)))
> x       # print original data
>
> # Remove "undirected" duplicates
> x <- x %>%
>   mutate(pair = mapply(function(x, y) paste0(sort(c(x, y)), collapse = "-"),
>                        Source, Target)) %>%
>   distinct(pair, .keep_all = TRUE) %>%
>   mutate(Source = sapply(pair, function(x) unlist(strsplit(x, split = "-"))[1]),
>          Target = sapply(pair, function(x) unlist(strsplit(x, split = "-"))[2])) %>%
>   select(-pair)
>
> x       # print cleaned data
>
> --- snip ---
>
> The good thing with my own solution is that it allows the creation of
> weighted pairs as well. One just needs to replace 'distinct(pair,
> .keep_all=T)' with 'count(pair)'.
>
> I have done a lot of searching but have not found any function providing
> this functionality. Does anyone know of an alternative, maybe a more
> efficient function/solution?
>
> Best,
>
> Kimmo Elo
>
>
> --
> Dr. Kimmo Elo
> Senior researcher in European Studies
> =====================================================
> University of Turku
> Centre for Parliamentary Studies
> Finland
> E-mail: kimmo.elo at utu.fi
> =====================================================



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


