[BioC] In silico interactomes

Tony Chiang tchiang at fhcrc.org
Thu Jun 22 21:16:27 CEST 2006

Hi Maria,

Let me try to answer your questions concisely, though the answers will 
probably generate more questions from you...

First, we developed the ScISI package to get away from other 
interactomes with the format "protein A : protein B". One reason is that 
ScISI uses the hypergraph model (equivalently, the bipartite graph model) 
for protein complex membership. The relationship is protein membership in 
a protein complex, so it is not one to one as the "protein A : protein B" 
format entails. The incidence matrix represents the hypergraph for the 
protein complex interactome of the organism: the rows are indexed by the 
genes and the columns are indexed by the protein complexes. A one in the 
(i,j) position of the matrix signifies that protein i is a member of 
protein complex j.
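
As a small illustration of the incidence matrix described above, here is 
a toy example. The protein and complex names are made up for the sketch 
and are not taken from ScISI:

```r
# Toy incidence matrix: rows are proteins, columns are protein complexes.
# All names are hypothetical placeholders, not ScISI content.
IM <- matrix(0, nrow = 4, ncol = 2,
             dimnames = list(c("P1", "P2", "P3", "P4"),
                             c("complexA", "complexB")))
IM[c("P1", "P2", "P3"), "complexA"] <- 1   # complexA = {P1, P2, P3}
IM[c("P3", "P4"), "complexB"] <- 1         # complexB = {P3, P4}

# Read the members of each complex off the columns
members <- lapply(colnames(IM), function(cx) rownames(IM)[IM[, cx] == 1])
names(members) <- colnames(IM)

# Number of complexes each protein belongs to
nComplexes <- rowSums(IM)
```

Note that P3 sits in both complexes, which is exactly the kind of 
membership a single "protein A : protein B" pair cannot express.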

The matrix and spoke models are methods that use one to one (or binary) 
relationships to model protein complex co-membership (i.e., are two 
proteins common to any complex at all in the interactome?), loosely based 
on the affinity purification-mass spectrometry technology. The problem 
with these models is that they don't offer any insight into non-binary 
relationships.
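
To make the difference between the two models concrete, here is a rough 
sketch for a single purification. The bait and prey names are 
hypothetical, and the code only follows the usual definitions (spoke: 
bait paired with each prey; matrix: every pair among bait and preys):

```r
# One hypothetical AP-MS purification: a bait and the preys it pulled down
bait  <- "B"
preys <- c("X", "Y")

# Spoke model: the bait is paired with each prey only
spoke <- cbind(proteinA = bait, proteinB = preys)

# Matrix model: every pair among the bait and its preys
matrixModel <- t(combn(c(bait, preys), 2))
colnames(matrixModel) <- c("proteinA", "proteinB")
```

The matrix model adds the prey-prey pair X : Y that the spoke model 
leaves out, but both still reduce the purification to binary pairs.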

One thing to note when you analyze the data is that protein co-membership 
between 2 proteins ($p_1$ and $p_2$) does not imply that these two 
proteins will directly, physically interact; it means they are constituent 
members of some protein complex. So please do not compare protein 
co-membership binary data with protein physical interaction data. They are 
related but not the same.

Before you can convert the data to the list you want, you need to convert 
the incidence matrix (hypergraph model) to an adjacency matrix (graph 
model), which does model binary relationships. What you will need to do is:

AM <- IM %*% t(IM)

At this point, the matrix AM will be an adjacency matrix where both rows
and columns are indexed by the genes. The (i,j) entry is a nonnegative 
integer which counts the number of distinct protein complexes in which 
protein i and protein j are co-members (the diagonal entry (i,i) just 
counts the complexes containing protein i). Therefore any non-zero 
off-diagonal entry of AM gives you a protein co-membership relationship 
"protein A : protein B". If you don't care about the multiplicity, you can 
run these two lines of code:

mode(AM) <- "logical"
mode(AM) <- "numeric"

This will make AM into a {0,1}-matrix where an entry of 1 indicates 
co-membership and 0 indicates none. From here you can generate the 
"protein A : protein B" relationships fairly easily with the code you 
have given.
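
Putting the steps above together, a minimal sketch on a toy incidence 
matrix might look like this. The matrix and its names are made up; with 
the real data you would substitute the ScISI incidence matrix for IM. 
Using upper.tri() to drop the diagonal and the mirrored duplicates is my 
own choice here, not something ScISI does for you:

```r
# Toy incidence matrix (hypothetical names): 4 proteins, 3 complexes
IM <- matrix(c(1, 1, 1, 0,    # cA = {P1, P2, P3}
               0, 0, 1, 1,    # cB = {P3, P4}
               1, 1, 0, 0),   # cC = {P1, P2}
             nrow = 4,
             dimnames = list(c("P1", "P2", "P3", "P4"),
                             c("cA", "cB", "cC")))

# Adjacency matrix: entry (i,j) counts the complexes shared by i and j
AM <- IM %*% t(IM)
sharedP1P2 <- AM["P1", "P2"]   # 2: P1 and P2 share cA and cC

# Drop the multiplicity, leaving a {0,1}-matrix
mode(AM) <- "logical"
mode(AM) <- "numeric"

# Keep each unordered pair once: upper triangle only, diagonal excluded
idx <- which(AM == 1 & upper.tri(AM), arr.ind = TRUE)
pairs <- data.frame(proteinA = rownames(AM)[idx[, "row"]],
                    proteinB = colnames(AM)[idx[, "col"]],
                    stringsAsFactors = FALSE)
```

Each row of pairs is then one "protein A : protein B" co-membership 
relationship.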

One last thing is that ScISI does not have any information about baits and 
preys. ScISI.rda estimates some true state of nature within an organism 
which will not have bait and prey relationships. Bait and prey information 
is only relevant to experimental data (where it is actually pretty 
important). If you want the bait to prey data for AP-MS experiments, 5 
empirical data sets can be found in the R package apComplex (TAP.rda, 
HMSPCI.rda, Krogan.rda, gavinBP2006.rda, and kroganBPMat2006.rda).

The above is only really valid for protein complex data. If you are 
looking for physical interaction information, y2hStat has both small and 
large scale data sets from Y2H experiments:


The structure is a list of lists of lists. The top level is a list of 42 
experiments. Each of the 42 is a list of bait to prey interactions. Each 
sub-list of an experiment represents a bait (the name of this sub-list is 
the gene name of the bait), and its contents are a character vector of the 
preys found by this bait. These data sets can be compared with physical 
interaction data.
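
For a nested structure of that shape, flattening it into bait : prey 
pairs could be sketched as follows. The object y2hToy and all names in it 
are hypothetical stand-ins that only mimic the layout described above; 
they are not the actual y2hStat data:

```r
# Hypothetical stand-in: experiments -> baits -> character vector of preys
y2hToy <- list(
  experiment1 = list(baitA = c("preyX", "preyY"),
                     baitB = "preyX"),
  experiment2 = list(baitC = "preyZ")
)

# One data frame of bait/prey pairs per experiment
pairsPerExp <- lapply(names(y2hToy), function(ex) {
  baits <- y2hToy[[ex]]
  data.frame(experiment = ex,
             bait = rep(names(baits), lengths(baits)),  # repeat bait per prey
             prey = unlist(baits, use.names = FALSE),
             stringsAsFactors = FALSE)
})

# Stack all experiments into a single table
y2hPairs <- do.call(rbind, pairsPerExp)
```

Unlike the co-membership pairs from ScISI, these pairs keep their 
direction, since the bait and the prey are recorded separately.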


On Thu, 22 Jun 2006 maria at cbm.bio.uniroma2.it wrote:

> Dear Bioconductor mailing list,
> I am exploring the two packages ScISI and y2hStat: the aim of these 
> packages is to build an interactome starting from the available protein 
> protein interaction datasets and combining them following a certain model. 
> I would like to compare the interactome obtained with these packages with 
> interactomes produced with other approaches and with my own approach. 
> These other putative interactomes are generally in the format "proteinA proteinB";
> now, my question is: 
> how to go from the incidence matrix (the final output of the package 
> ScISI) to this other kind of format without having the information about who was the 
> bait and who was the prey?(I am interested in doing this with the merging 
> result, the object ScISI.rda).
> maybe a "solution" could be to transform the incidence matrix into a 
> list of lists by doing....
> complexes.list <- list()
> for (i in seq_len(ncol(incidence.matrix))) {
>    # names of the proteins (rows) with a 1 in column i, i.e. members of complex i
>    comp <- rownames(incidence.matrix)[incidence.matrix[, i] == 1]
>    complexes.list[[i]] <- comp
>    }
>   and then apply the "matrix model" to every list 
> component? but in this way the result is not comparable 
> with the other interactomes generally based on the so called spoke 
> model...
> thanks for your attention,
> regards,
> maria
> Maria Persico, PhD. student
> http://cbm.bio.uniroma2.it/~maria/
> MINT database group
> Universita' di Tor Vergata, via della Ricerca scientifica 11
> 00133 Roma, Italy
> Tel +39 0672594315 (Supervisor's room)
> Fax +39 0672594766
> Mobile phone: +393479715662
> e-mail maria at cbm.bio.uniroma2.it
