[BioC] RE: venn diagram

Arne.Muller at aventis.com Arne.Muller at aventis.com
Wed Apr 28 15:14:33 CEST 2004


Hi,

the only way I can think of is to generate pairs of random sets of the same size as the real set pairs and run the vector comparison (as below) on each random pair; do this 10,000 times or so. Then estimate the parameters of the resulting distribution of scores (maybe it's even normally distributed) and see where the real score falls in it.

I'd sample directly from the entire population of genes on the chip.
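
A minimal sketch of that resampling idea in R, assuming the sets are coded as 0/1 membership vectors over the genes on the chip and the cosine is used as the similarity score. The helper names cosine_sim and random_score, the chip size, the set sizes and the observed score are all made up for illustration:

```r
# Hypothetical helper: cosine similarity between two 0/1 membership vectors.
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical helper: draw two random gene sets of sizes n_a and n_b
# from a chip with n_genes genes and return their similarity score.
random_score <- function(n_genes, n_a, n_b) {
  a <- numeric(n_genes)
  b <- numeric(n_genes)
  a[sample.int(n_genes, n_a)] <- 1
  b[sample.int(n_genes, n_b)] <- 1
  cosine_sim(a, b)
}

set.seed(1)          # only to make the sketch reproducible
n_genes <- 1000      # assumed number of genes on the chip
n_a <- 50            # assumed size of the first real set
n_b <- 80            # assumed size of the second real set
observed <- 0.3      # placeholder for the score of the real pair

# Scores of 10,000 random pairs with the same sizes as the real pair.
null_scores <- replicate(10000, random_score(n_genes, n_a, n_b))

# Empirical p-value: fraction of random pairs scoring at least as high
# as the real pair (the +1 keeps the estimate away from exactly zero).
p_emp <- (sum(null_scores >= observed) + 1) / (length(null_scores) + 1)

# Parameters for a normal approximation, if the histogram looks normal.
mu <- mean(null_scores)
sigma <- sd(null_scores)
```

hist(null_scores) is a quick way to check whether the normal approximation is reasonable before relying on mu and sigma; otherwise the empirical p_emp can be used directly.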

	regards,

	Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com

> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Patrick Cahan
> Sent: 28 April 2004 15:02
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] RE: venn diagram
> 
> 
> Any ideas on how to calculate the significance or rather the 
> probability of getting a given similarity score by chance?  
> 
> /pc
> 
> Patrick Cahan
> 202.994.8922
> pcahan1 at gwu.edu
> 
> > You can then create a distance matrix from this by calculating all 
> > pairwise combinations of the length-normalized cosine between the 
> > vectors:
> > > a <- c(1,1,0,0,1,0)
> > > b <- c(0,1,1,0,1,1)
> > > # normalize by the Euclidean norms; length() would only give the element count
> > > x <- a%*%b / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
> > > x
> >           [,1]
> > [1,] 0.5773503
> > 
> > x is a measure for the similarity between vectors a and b. This is 
> > a standard procedure in text/document comparison. Since one wants 
> > to create a distance matrix, one still needs to somehow "invert" 
> > this matrix so that high similarity gets small values (e.g. by 
> > taking 1 - x)!
> > 
> > Once you have your matrix M of cosines (a symmetric matrix) and 
> > have turned it into distances, you convert it via as.dist(M) and 
> > pass the result to the hclust routine.
> > If you try it, I'd be interested in the outcome (does it make 
> > sense?). You should only try it if you've got *many* sets to 
> > test, so that a real Venn approach gets too complex.
> > 
> > 	good luck and let me know how it goes,
> > 	regards,
> > 
> > 	Arne
> > 
> > --
> > Arne Muller, Ph.D.
> > Toxicogenomics, Aventis Pharma
> > arne dot muller domain=aventis com
> 


