[BioC] Venn Diagram
Hervé Pagès
hpages at fhcrc.org
Thu Jul 2 22:25:27 CEST 2009
Nice page, Thomas, really a must-see!
Thanks,
H.
Thomas Girke wrote:
> To get an impression of how "pretty and confusingly complex" Venn diagrams
> with more than 5 sets look, one can take a look at this page
> from combinatorics.org:
> http://www.combinatorics.org/Surveys/ds5/VennSymmEJC.html
>
> Also, here is a small collection of methods/ideas for analyzing intersect
> relationships among large numbers of sample sets:
> http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/R_BioCondManual.html#R_graphics_overlapper
> These approaches are much more scalable than Venn comparisons, but lack
> their logical 'not in' relations. In terms of utility, the function for
> computing 'All Possible Intersects' is the closest alternative to Venn
> diagrams.
>
> Thomas
>
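
The idea of computing all possible intersects can be sketched in a few lines of base R. This is only an illustration and not the function from the manual linked above; the set names A, B, C and the object 'all_intersects' are made up for the example. Unlike a Venn region, each result says nothing about the sets left out of the combination (no 'not in' relation).

```r
## Three small example sets (hypothetical names A, B, C).
mysets <- list(A = c(1, 5, 12, 4, 9, 29),
               B = c(4, 11, 3, 18),
               C = c(22, 4, 12, 19, 8))

## For every combination of 1, 2, ..., n sets, intersect its members.
all_intersects <- list()
for (k in seq_along(mysets)) {
    combos <- combn(names(mysets), k, simplify = FALSE)
    for (cmb in combos) {
        label <- paste(cmb, collapse = "&")
        all_intersects[[label]] <- Reduce(intersect, mysets[cmb])
    }
}

## One entry per combination: A, B, C, A&B, A&C, B&C, A&B&C.
names(all_intersects)
```

With n sets this produces 2^n - 1 entries, so for many sets one would probably restrict k to small values.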
>
> On Wed, Jul 01, 2009 at 09:41:25PM -0700, hpages at fhcrc.org wrote:
>> Oops, this is wrong, sorry! See a modified version of
>> makeVennTable() below that hopefully does the right thing.
>>
>> Quoting Hervé Pagès <hpages at fhcrc.org>:
>>
>>> Hi Simon,
>>>
>>> Simon Noël wrote:
>>>> Hello everyone.
>>>>
>>>> I have ten lists of between 4 and 3000 genes and I would like to put them all
>>>> together in a Venn diagram.
>>>>
>>>> I have tried loading the ABarray library and using doVennDiagram, but
>>>> it can only
>>>> use 3 lists.
>>>>
>>>> Does anyone know a way to put all ten of my lists in the same Venn
>>>> diagram?
>>> A Venn diagram is a 2-D drawing of all the possible intersections
>>> between sets (typically 2 or 3), where each set is represented by a
>>> simple 2-D shape (usually a circle). In the case of 3 sets, the resulting
>>> diagram partitions the 2-D plane into 8 regions.
>>> Some people have tried (with more or less success) to put 4 sets on
>>> the diagram, but then they need more complicated shapes and
>>> the resulting diagram is no longer as easy to read. With 10 sets,
>>> you would end up with 1024 (2^10) regions in your drawing and you
>>> would need extremely complicated shapes for each set,
>>> making it really hard to read! Maybe in that case it's easier
>>> to generate the table below.
>>>
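
To make the growth concrete: each element is either in or out of each of the n sets, which gives 2^n regions (counting the region outside all sets). A tiny sketch, with 'n_sets' and 'regions' as illustrative names only:

```r
## Number of regions of a Venn diagram for n sets, counting the
## region that lies outside all of them: 2^n.
n_sets <- c(2, 3, 4, 5, 10)
regions <- 2^n_sets
data.frame(n_sets, regions)
```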
>>> ## Let's say your genes are in 'set1', 'set2', etc... Put all the
>>> ## sets in a big list:
>>>
>>> mysets <- list(set1, set2, ..., set10)
>>>
>>> makeVennTable <- function(sets)
>>> {
>>>     mkAllLogicalVect <- function(length)
>>>     {
>>>         if (length == 0L)
>>>             return(logical(0))
>>>         ans0 <- mkAllLogicalVect(length - 1L)
>>>         ans1 <- cbind(TRUE, ans0)
>>>         ans2 <- cbind(FALSE, ans0)
>>>         rbind(ans1, ans2)
>>>     }
>>>     lm <- mkAllLogicalVect(length(sets))
>>>     subsets <- apply(lm, MARGIN=1,
>>>                      function(ii)
>>>                      {
>>>                          s <- sets[ii]
>>>                          if (length(s) == 0)
>>>                              return("")
>>>                          paste(sort(unique(unlist(s))), collapse=",")
>>>                      })
>>>     data.frame(lm, subsets)
>>> }
>>>
>>> Then call makeVennTable() on 'mysets'. For example, with 5 small sets:
>>>
>>> > mysets <- list(c(1,5,12,4,9,29),
>>>                  c(4,11,3,18),
>>>                  c(22,4,12,19,8),
>>>                  c(7,12,4,5,3),
>>>                  c(25,24,4,2))
>>>
>>> > makeVennTable(mysets)
>>>      X1    X2    X3    X4    X5 subsets
>>> 1   TRUE  TRUE  TRUE  TRUE  TRUE 1,2,3,4,5,7,8,9,11,12,18,19,22,24,25,29
>>> 2   TRUE  TRUE  TRUE  TRUE FALSE 1,3,4,5,7,8,9,11,12,18,19,22,29
>>> 3   TRUE  TRUE  TRUE FALSE  TRUE 1,2,3,4,5,8,9,11,12,18,19,22,24,25,29
>>> 4   TRUE  TRUE  TRUE FALSE FALSE 1,3,4,5,8,9,11,12,18,19,22,29
>>> 5   TRUE  TRUE FALSE  TRUE  TRUE 1,2,3,4,5,7,9,11,12,18,24,25,29
>>> 6   TRUE  TRUE FALSE  TRUE FALSE 1,3,4,5,7,9,11,12,18,29
>>> 7   TRUE  TRUE FALSE FALSE  TRUE 1,2,3,4,5,9,11,12,18,24,25,29
>>> 8   TRUE  TRUE FALSE FALSE FALSE 1,3,4,5,9,11,12,18,29
>>> 9   TRUE FALSE  TRUE  TRUE  TRUE 1,2,3,4,5,7,8,9,12,19,22,24,25,29
>>> 10  TRUE FALSE  TRUE  TRUE FALSE 1,3,4,5,7,8,9,12,19,22,29
>>> 11  TRUE FALSE  TRUE FALSE  TRUE 1,2,4,5,8,9,12,19,22,24,25,29
>>> 12  TRUE FALSE  TRUE FALSE FALSE 1,4,5,8,9,12,19,22,29
>>> 13  TRUE FALSE FALSE  TRUE  TRUE 1,2,3,4,5,7,9,12,24,25,29
>>> 14  TRUE FALSE FALSE  TRUE FALSE 1,3,4,5,7,9,12,29
>>> 15  TRUE FALSE FALSE FALSE  TRUE 1,2,4,5,9,12,24,25,29
>>> 16  TRUE FALSE FALSE FALSE FALSE 1,4,5,9,12,29
>>> 17 FALSE  TRUE  TRUE  TRUE  TRUE 2,3,4,5,7,8,11,12,18,19,22,24,25
>>> 18 FALSE  TRUE  TRUE  TRUE FALSE 3,4,5,7,8,11,12,18,19,22
>>> 19 FALSE  TRUE  TRUE FALSE  TRUE 2,3,4,8,11,12,18,19,22,24,25
>>> 20 FALSE  TRUE  TRUE FALSE FALSE 3,4,8,11,12,18,19,22
>>> 21 FALSE  TRUE FALSE  TRUE  TRUE 2,3,4,5,7,11,12,18,24,25
>>> 22 FALSE  TRUE FALSE  TRUE FALSE 3,4,5,7,11,12,18
>>> 23 FALSE  TRUE FALSE FALSE  TRUE 2,3,4,11,18,24,25
>>> 24 FALSE  TRUE FALSE FALSE FALSE 3,4,11,18
>>> 25 FALSE FALSE  TRUE  TRUE  TRUE 2,3,4,5,7,8,12,19,22,24,25
>>> 26 FALSE FALSE  TRUE  TRUE FALSE 3,4,5,7,8,12,19,22
>>> 27 FALSE FALSE  TRUE FALSE  TRUE 2,4,8,12,19,22,24,25
>>> 28 FALSE FALSE  TRUE FALSE FALSE 4,8,12,19,22
>>> 29 FALSE FALSE FALSE  TRUE  TRUE 2,3,4,5,7,12,24,25
>>> 30 FALSE FALSE FALSE  TRUE FALSE 3,4,5,7,12
>>> 31 FALSE FALSE FALSE FALSE  TRUE 2,4,24,25
>>> 32 FALSE FALSE FALSE FALSE FALSE
>> The above table is clearly not what is expected: each row lists the
>> union of the selected sets, so the subsets in the last column are not
>> a partition of the initial set of genes (some ids appear in several rows).
>> Try this instead:
>>
>> makeVennTable <- function(sets)
>> {
>>     ## Logical matrix with one row per TRUE/FALSE combination
>>     ## (2^length rows, 'length' columns).
>>     mkAllLogicalVect <- function(length)
>>     {
>>         if (length == 0L)
>>             return(logical(0))
>>         ans0 <- mkAllLogicalVect(length - 1L)
>>         ans1 <- cbind(TRUE, ans0)
>>         ans2 <- cbind(FALSE, ans0)
>>         rbind(ans1, ans2)
>>     }
>>     ## Intersection of all the sets passed as arguments.
>>     minter.int <- function(...)
>>     {
>>         args <- list(...)
>>         if (length(args) == 0)
>>             return(integer(0))
>>         if (length(args) == 1)
>>             return(args[[1]])
>>         intersect(args[[1]], do.call(minter.int, args[-1]))
>>     }
>>     ## Union of all the sets passed as arguments.
>>     munion.int <- function(...)
>>     {
>>         unique(unlist(list(...)))
>>     }
>>     lm <- mkAllLogicalVect(length(sets))
>>     parts <- apply(lm, MARGIN=1,
>>                    function(ii)
>>                    {
>>                        ## Ids in every selected set and in none of the others.
>>                        s1 <- do.call(minter.int, sets[ii])
>>                        s2 <- do.call(munion.int, sets[!ii])
>>                        part <- setdiff(s1, s2)
>>                        if (length(part) == 0)
>>                            return("")
>>                        paste(sort(part), collapse=",")
>>                    })
>>     data.frame(lm, parts)
>> }
>>
>> Then:
>>
>> > makeVennTable(mysets)
>>      X1    X2    X3    X4    X5 parts
>> 1   TRUE  TRUE  TRUE  TRUE  TRUE 4
>> 2   TRUE  TRUE  TRUE  TRUE FALSE
>> 3   TRUE  TRUE  TRUE FALSE  TRUE
>> 4   TRUE  TRUE  TRUE FALSE FALSE
>> 5   TRUE  TRUE FALSE  TRUE  TRUE
>> 6   TRUE  TRUE FALSE  TRUE FALSE
>> 7   TRUE  TRUE FALSE FALSE  TRUE
>> 8   TRUE  TRUE FALSE FALSE FALSE
>> 9   TRUE FALSE  TRUE  TRUE  TRUE
>> 10  TRUE FALSE  TRUE  TRUE FALSE 12
>> 11  TRUE FALSE  TRUE FALSE  TRUE
>> 12  TRUE FALSE  TRUE FALSE FALSE
>> 13  TRUE FALSE FALSE  TRUE  TRUE
>> 14  TRUE FALSE FALSE  TRUE FALSE 5
>> 15  TRUE FALSE FALSE FALSE  TRUE
>> 16  TRUE FALSE FALSE FALSE FALSE 1,9,29
>> 17 FALSE  TRUE  TRUE  TRUE  TRUE
>> 18 FALSE  TRUE  TRUE  TRUE FALSE
>> 19 FALSE  TRUE  TRUE FALSE  TRUE
>> 20 FALSE  TRUE  TRUE FALSE FALSE
>> 21 FALSE  TRUE FALSE  TRUE  TRUE
>> 22 FALSE  TRUE FALSE  TRUE FALSE 3
>> 23 FALSE  TRUE FALSE FALSE  TRUE
>> 24 FALSE  TRUE FALSE FALSE FALSE 11,18
>> 25 FALSE FALSE  TRUE  TRUE  TRUE
>> 26 FALSE FALSE  TRUE  TRUE FALSE
>> 27 FALSE FALSE  TRUE FALSE  TRUE
>> 28 FALSE FALSE  TRUE FALSE FALSE 8,19,22
>> 29 FALSE FALSE FALSE  TRUE  TRUE
>> 30 FALSE FALSE FALSE  TRUE FALSE 7
>> 31 FALSE FALSE FALSE FALSE  TRUE 2,24,25
>> 32 FALSE FALSE FALSE FALSE FALSE
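
Only 9 of the 32 rows above are non-empty, and with 10 sets the table has 1024 rows. If only the non-empty regions matter, one possible alternative (a sketch, not a replacement for makeVennTable()) is to compute the membership pattern of each id and group the ids by pattern. The names 'universe', 'membership', 'pattern' and 'parts_by_pattern' are made up for this example.

```r
## Same 5 example sets as above.
mysets <- list(c(1, 5, 12, 4, 9, 29),
               c(4, 11, 3, 18),
               c(22, 4, 12, 19, 8),
               c(7, 12, 4, 5, 3),
               c(25, 24, 4, 2))

## All distinct ids across the sets.
universe <- sort(unique(unlist(mysets)))

## Logical matrix: one row per id, one column per set.
membership <- sapply(mysets, function(s) universe %in% s)

## Encode each row as a string such as "TFTTF" (in sets 1, 3, 4 only).
pattern <- apply(membership, 1,
                 function(r) paste(ifelse(r, "T", "F"), collapse = ""))

## Group the ids by pattern; each id lands in exactly one group.
parts_by_pattern <- split(universe, pattern)
parts_by_pattern
```

Because every id has exactly one pattern, the groups form a partition by construction, and the work grows with the number of ids times the number of sets rather than with 2^n.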
>>
>> H.
>>
>>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319