[BioC] similarity between two gene lists with varied length

Shannon, William WSHANNON at dom.wustl.edu
Sun Aug 24 03:15:50 CEST 2008

My first thought is that a similarity could be based on the ratio of the number of genes in the intersection of the two lists to the number of genes in their union. If the two lists are identical the similarity is 1, and if they have no genes in common the similarity is 0. Of course, this won't take into account the length of the gene lists.
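A minimal Python sketch of the intersection-over-union ratio described above, which is commonly known as the Jaccard index. The function name `jaccard_similarity` and the gene symbols are hypothetical. The sketch assumes gene identifiers are plain strings, ignores duplicates within a list, and treats two empty lists as identical. That last choice is an assumption, not part of the suggestion above.

```python
def jaccard_similarity(list_a, list_b):
    """Genes in both lists divided by genes in either list (Jaccard index).

    Hypothetical helper for illustration. Lists are converted to sets, so
    duplicate gene identifiers within a list are ignored.
    """
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty lists count as identical (similarity 1).
        return 1.0
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    # Hypothetical gene lists, for illustration only.
    print(jaccard_similarity(["TP53", "BRCA1", "EGFR"], ["TP53", "BRCA1", "EGFR"]))  # 1.0
    print(jaccard_similarity(["TP53", "BRCA1"], ["MYC", "KRAS"]))  # 0.0
    # Partial overlap with lists of different lengths: 2 shared / 5 total.
    print(jaccard_similarity(["TP53", "BRCA1", "EGFR"], ["TP53", "BRCA1", "MYC", "KRAS"]))  # 0.4
```

In R, the same ratio for two character vectors `a` and `b` can be computed with base functions as `length(intersect(a, b)) / length(union(a, b))`.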

You would have to think through what happens to the similarity in cases where only some genes are shared by both lists.

Bill Shannon
Associate Professor of Biostatistics in Medicine
Washington University School of Medicine

President-Elect, Classification Society

From: bioconductor-bounces at stat.math.ethz.ch [bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Weiwei Shi [helprhelp at gmail.com]
Sent: Saturday, August 23, 2008 7:55 PM
To: r-help at stat.math.ethz.ch
Cc: Bioconductor
Subject: [BioC] similarity between two gene lists with varied length

Dear listers,

a little off-topic:

I am looking for, and would like to compare, algorithms that can calculate the
"distance" or "similarity" between two gene lists of different lengths.

Any papers, implementations in R, or suggestions are welcome!


Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III


Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
