[BioC] Venn Diagram

Thomas Girke thomas.girke at ucr.edu
Thu Jul 2 16:22:49 CEST 2009


To get an impression of how "pretty and confusingly complex" Venn diagrams
with more than 5 sets look, one can take a look at this page
from combinatorics.org:
http://www.combinatorics.org/Surveys/ds5/VennSymmEJC.html.

Also, here is a small collection of methods/ideas for analyzing intersect
relationships among large numbers of sample sets:
http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/R_BioCondManual.html#R_graphics_overlapper
These approaches scale much better than Venn comparisons, but lack
their logical 'not in' relations. In terms of utility, the function for
computing 'All Possible Intersects' is the closest alternative to Venn
diagrams.
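
As a rough illustration of the 'All Possible Intersects' idea, here is a
minimal sketch. It is not the function from the manual linked above: the
name allIntersects, its arguments and the sample sets are assumptions made
up for this example. It intersects every combination of k sets, so each
result holds the elements found in at least those k sets, without any
'not in' condition on the remaining ones.

```r
# Hypothetical helper (not the function from the linked manual): intersect
# every combination of 'k' sets drawn from a named list of sets.
allIntersects <- function(sets, k)
{
  stopifnot(!is.null(names(sets)), k >= 1, k <= length(sets))
  # All combinations of k set names, each returned as a character vector
  combos <- combn(names(sets), k, simplify=FALSE)
  # Intersect the sets of each combination
  res <- lapply(combos, function(nms) Reduce(intersect, sets[nms]))
  # Label each result with the names of the sets involved, e.g. "A-C"
  names(res) <- vapply(combos, paste, character(1), collapse="-")
  res
}

sets <- list(A=c(1,5,12,4,9,29), B=c(4,11,3,18), C=c(22,4,12,19,8))
# Sizes of all pairwise intersects, then of the single 3-way intersect
sapply(allIntersects(sets, 2), length)
sapply(allIntersects(sets, 3), length)
```

The number of combinations grows as choose(n, k), so with many sets it is
usually more practical to look at one value of k at a time.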

Thomas


On Wed, Jul 01, 2009 at 09:41:25PM -0700, hpages at fhcrc.org wrote:
> Oops, this is wrong, sorry! See a modified version of
> makeVennTable() below that hopefully does the right thing.
> 
> Quoting Hervé Pagès <hpages at fhcrc.org>:
> 
> >Hi Simon,
> >
> >Simon Noël wrote:
> >>Hello everyone.
> >>
> >>I have ten lists of between 4 and 3000 genes and I would like to put them
> >>all together in a Venn diagram.
> >>
> >>I have tried to load the library ABarray and to use doVennDiagram, but
> >>it can only use 3 lists.
> >>
> >>Does anyone know a way to put all ten of my lists in the same Venn
> >>diagram?
> >
> >A Venn diagram is a 2-D drawing of all the possible intersections
> >between 2 or 3 sets where each set is represented by a simple 2-D
> >shape (typically a circle). In the case of 3 sets, the resulting
> >diagram defines a partitioning of the 2-D plane into 8 regions.
> >Some people have tried (with more or less success) to put 4 sets on
> >the diagram but then they need to use more complicated shapes and
> >the resulting diagram is not as easy to read anymore. With 10 sets,
> >you would end up with 1024 (2^10) regions in your drawing and you
> >would need to use extremely complicated shapes for each region
> >making it really hard to read! Maybe in that case it's easier
> >to generate the table below.
> >
> >## Let's say your genes are in 'set1', 'set2', etc... Put all the
> >## sets in a big list:
> >
> >mysets <- list(set1, set2, ..., set10)
> >
> >makeVennTable <- function(sets)
> >{
> >  mkAllLogicalVect <- function(length)
> >  {
> >    if (length == 0L)
> >      return(logical(0))
> >    ans0 <- mkAllLogicalVect(length - 1L)
> >    ans1 <- cbind(TRUE, ans0)
> >    ans2 <- cbind(FALSE, ans0)
> >    rbind(ans1, ans2)
> >  }
> >  lm <- mkAllLogicalVect(length(sets))
> >  subsets <- apply(lm, MARGIN=1,
> >               function(ii)
> >               {
> >                 s <- sets[ii]
> >                 if (length(s) == 0)
> >                   return("")
> >                 paste(sort(unique(unlist(s))), collapse=",")
> >               })
> >  data.frame(lm, subsets)
> >}
> >
> >Then call makeVennTable() on 'mysets'. For example, with 5 small sets:
> >
> >  > mysets <- list(c(1,5,12,4,9,29),
> >                  c(4,11,3,18),
> >                  c(22,4,12,19,8),
> >                  c(7,12,4,5,3),
> >                  c(25,24,4,2))
> >
> >  > makeVennTable(mysets)
> >        X1    X2    X3    X4    X5                                 subsets
> >  1   TRUE  TRUE  TRUE  TRUE  TRUE 1,2,3,4,5,7,8,9,11,12,18,19,22,24,25,29
> >  2   TRUE  TRUE  TRUE  TRUE FALSE         1,3,4,5,7,8,9,11,12,18,19,22,29
> >  3   TRUE  TRUE  TRUE FALSE  TRUE   1,2,3,4,5,8,9,11,12,18,19,22,24,25,29
> >  4   TRUE  TRUE  TRUE FALSE FALSE           1,3,4,5,8,9,11,12,18,19,22,29
> >  5   TRUE  TRUE FALSE  TRUE  TRUE         1,2,3,4,5,7,9,11,12,18,24,25,29
> >  6   TRUE  TRUE FALSE  TRUE FALSE                 1,3,4,5,7,9,11,12,18,29
> >  7   TRUE  TRUE FALSE FALSE  TRUE           1,2,3,4,5,9,11,12,18,24,25,29
> >  8   TRUE  TRUE FALSE FALSE FALSE                   1,3,4,5,9,11,12,18,29
> >  9   TRUE FALSE  TRUE  TRUE  TRUE       1,2,3,4,5,7,8,9,12,19,22,24,25,29
> >  10  TRUE FALSE  TRUE  TRUE FALSE               1,3,4,5,7,8,9,12,19,22,29
> >  11  TRUE FALSE  TRUE FALSE  TRUE           1,2,4,5,8,9,12,19,22,24,25,29
> >  12  TRUE FALSE  TRUE FALSE FALSE                   1,4,5,8,9,12,19,22,29
> >  13  TRUE FALSE FALSE  TRUE  TRUE               1,2,3,4,5,7,9,12,24,25,29
> >  14  TRUE FALSE FALSE  TRUE FALSE                       1,3,4,5,7,9,12,29
> >  15  TRUE FALSE FALSE FALSE  TRUE                   1,2,4,5,9,12,24,25,29
> >  16  TRUE FALSE FALSE FALSE FALSE                           1,4,5,9,12,29
> >  17 FALSE  TRUE  TRUE  TRUE  TRUE        2,3,4,5,7,8,11,12,18,19,22,24,25
> >  18 FALSE  TRUE  TRUE  TRUE FALSE                3,4,5,7,8,11,12,18,19,22
> >  19 FALSE  TRUE  TRUE FALSE  TRUE            2,3,4,8,11,12,18,19,22,24,25
> >  20 FALSE  TRUE  TRUE FALSE FALSE                    3,4,8,11,12,18,19,22
> >  21 FALSE  TRUE FALSE  TRUE  TRUE                2,3,4,5,7,11,12,18,24,25
> >  22 FALSE  TRUE FALSE  TRUE FALSE                        3,4,5,7,11,12,18
> >  23 FALSE  TRUE FALSE FALSE  TRUE                       2,3,4,11,18,24,25
> >  24 FALSE  TRUE FALSE FALSE FALSE                               3,4,11,18
> >  25 FALSE FALSE  TRUE  TRUE  TRUE              2,3,4,5,7,8,12,19,22,24,25
> >  26 FALSE FALSE  TRUE  TRUE FALSE                      3,4,5,7,8,12,19,22
> >  27 FALSE FALSE  TRUE FALSE  TRUE                    2,4,8,12,19,22,24,25
> >  28 FALSE FALSE  TRUE FALSE FALSE                            4,8,12,19,22
> >  29 FALSE FALSE FALSE  TRUE  TRUE                      2,3,4,5,7,12,24,25
> >  30 FALSE FALSE FALSE  TRUE FALSE                              3,4,5,7,12
> >  31 FALSE FALSE FALSE FALSE  TRUE                               2,4,24,25
> >  32 FALSE FALSE FALSE FALSE FALSE
> 
> The above table is clearly not what is expected: each row lists the
> union of the selected sets, so the subsets in the last column are not
> a partition of the initial set of genes (some ids appear in several rows).
> Try this instead:
> 
> makeVennTable <- function(sets)
> {
>    ## Logical matrix with one row per TRUE/FALSE combination
>    ## (2^length rows, 'length' columns).
>    mkAllLogicalVect <- function(length)
>    {
>      if (length == 0L)
>        return(logical(0))
>      ans0 <- mkAllLogicalVect(length - 1L)
>      ans1 <- cbind(TRUE, ans0)
>      ans2 <- cbind(FALSE, ans0)
>      rbind(ans1, ans2)
>    }
>    ## Intersection of all the sets passed as arguments.
>    minter.int <- function(...)
>    {
>      args <- list(...)
>      if (length(args) == 0)
>        return(integer(0))
>      if (length(args) == 1)
>        return(args[[1]])
>      intersect(args[[1]], do.call(minter.int, args[-1]))
>    }
>    ## Union of all the sets passed as arguments.
>    munion.int <- function(...)
>    {
>      unique(unlist(list(...)))
>    }
>    lm <- mkAllLogicalVect(length(sets))
>    ## For each row: the elements that are in all the selected sets
>    ## and in none of the other sets.
>    parts <- apply(lm, MARGIN=1,
>                 function(ii)
>                 {
>                   s1 <- do.call(minter.int, sets[ii])
>                   s2 <- do.call(munion.int, sets[!ii])
>                   part <- setdiff(s1, s2)
>                   if (length(part) == 0)
>                     return("")
>                   paste(sort(part), collapse=",")
>                 })
>    data.frame(lm, parts)
> }
> 
> Then:
> 
> >makeVennTable(mysets)
>       X1    X2    X3    X4    X5   parts
> 1   TRUE  TRUE  TRUE  TRUE  TRUE       4
> 2   TRUE  TRUE  TRUE  TRUE FALSE
> 3   TRUE  TRUE  TRUE FALSE  TRUE
> 4   TRUE  TRUE  TRUE FALSE FALSE
> 5   TRUE  TRUE FALSE  TRUE  TRUE
> 6   TRUE  TRUE FALSE  TRUE FALSE
> 7   TRUE  TRUE FALSE FALSE  TRUE
> 8   TRUE  TRUE FALSE FALSE FALSE
> 9   TRUE FALSE  TRUE  TRUE  TRUE
> 10  TRUE FALSE  TRUE  TRUE FALSE      12
> 11  TRUE FALSE  TRUE FALSE  TRUE
> 12  TRUE FALSE  TRUE FALSE FALSE
> 13  TRUE FALSE FALSE  TRUE  TRUE
> 14  TRUE FALSE FALSE  TRUE FALSE       5
> 15  TRUE FALSE FALSE FALSE  TRUE
> 16  TRUE FALSE FALSE FALSE FALSE  1,9,29
> 17 FALSE  TRUE  TRUE  TRUE  TRUE
> 18 FALSE  TRUE  TRUE  TRUE FALSE
> 19 FALSE  TRUE  TRUE FALSE  TRUE
> 20 FALSE  TRUE  TRUE FALSE FALSE
> 21 FALSE  TRUE FALSE  TRUE  TRUE
> 22 FALSE  TRUE FALSE  TRUE FALSE       3
> 23 FALSE  TRUE FALSE FALSE  TRUE
> 24 FALSE  TRUE FALSE FALSE FALSE   11,18
> 25 FALSE FALSE  TRUE  TRUE  TRUE
> 26 FALSE FALSE  TRUE  TRUE FALSE
> 27 FALSE FALSE  TRUE FALSE  TRUE
> 28 FALSE FALSE  TRUE FALSE FALSE 8,19,22
> 29 FALSE FALSE FALSE  TRUE  TRUE
> 30 FALSE FALSE FALSE  TRUE FALSE       7
> 31 FALSE FALSE FALSE FALSE  TRUE 2,24,25
> 32 FALSE FALSE FALSE FALSE FALSE
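
The non-empty rows of the table above can also be obtained without
enumerating all 2^n combinations, by labelling each element with its
set-membership pattern. What follows is only a sketch under assumptions:
the name vennParts and the "T"/"F" string encoding are made up for this
example and are not part of makeVennTable(). It assumes at least one
non-empty set.

```r
# Hypothetical alternative to makeVennTable(): group elements by their
# set-membership pattern instead of enumerating all 2^n rows.
vennParts <- function(sets)
{
  universe <- sort(unique(unlist(sets)))
  # One row per element, one logical column per set
  member <- vapply(sets, function(s) universe %in% s,
                   logical(length(universe)))
  member <- matrix(member, nrow=length(universe))
  # Encode each row as a string such as "TFTTF" and group elements by it
  key <- apply(member, 1,
               function(r) paste(ifelse(r, "T", "F"), collapse=""))
  split(universe, key)
}

mysets <- list(c(1,5,12,4,9,29),
               c(4,11,3,18),
               c(22,4,12,19,8),
               c(7,12,4,5,3),
               c(25,24,4,2))
parts <- vennParts(mysets)
# Every element lands in exactly one part, so the parts form a partition
stopifnot(length(unlist(parts)) == length(unique(unlist(mysets))))
parts[["TFTTF"]]  # elements in sets 1, 3 and 4 only
```

Only patterns that actually occur are returned, so with ten sets the
result has at most as many entries as there are distinct genes rather
than 1024 rows.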
> 
> H.
> 