[BioC] PCA or concordance

Wed Mar 3 18:17:40 CET 2010

On Wed, Mar 3, 2010 at 11:41 AM, Johnny H <ukfriend22 at googlemail.com> wrote:
> Dear Bioconductors,
> I have some proteomics data for several tissues:
>
> Heart x 3 replicates
> Lung x 3 replicates
>
> Each data set has a gene symbol and the number of peptides for that gene (a
> rough measure of protein expression).
>
> I want to make a data structure like:
>
>         heart1  heart2  heart3  lung1  lung2  lung3
> Gene1        2       4       3      7      9     20
> Gene2       50      45      33      0      1      0
> Gene3   ...... etc
> Gene4
>
> Each number in the data frame corresponds to the number of peptides for
> that gene.
>
> My questions are:
>
> Is a Principal Component Analysis useful for this data set?
> What would a PCA tell me?
> What function would I use to make a nice graphical representation of the data?
>
> Or should I use a concordance function, something like:
>
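> # Concordance between two samples; this has the form of Lin's
> # concordance correlation coefficient (using sample variances).
> con <- function(y1, y2) {
>   d <- mean(y1) - mean(y2)   # difference in sample means
>   v1 <- var(y1)
>   v2 <- var(y2)
>   cv <- cov(y1, y2)          # renamed so the cov() function is not masked
>   (2 * cv) / (v1 + v2 + d^2)
> }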
>
> This will tell me if two samples have concordance but I don't know how to
> involve all samples. Basically, I want to summarise the data.

Start simple.  The data likely contain biases that depend on the
experimental design and the assays used.  Try to determine what those
are using simple plots of the data.  What do the distributions of the
data look like when cut various ways (per gene, per sample)?  What do
scatter plots of one sample versus another look like?  Do the data
need transformation (log, for example)?  Do the data need
normalization (likely)?
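
As a rough sketch of the kind of exploration meant here, the R code below
assumes the peptide counts sit in a numeric matrix with genes in rows and
samples in columns.  The name `counts`, the simulated toy values, the
log2(x + 1) transform and the scaling to a common column total are all
illustrative assumptions, not a recommendation for these particular data.

```r
# Toy peptide-count matrix (simulated, NOT real data).
# Rows are genes, columns are samples, as in the layout described above.
set.seed(1)
samples <- c("heart1", "heart2", "heart3", "lung1", "lung2", "lung3")
counts <- matrix(rpois(200 * 6, lambda = 10), nrow = 200,
                 dimnames = list(paste0("Gene", 1:200), samples))

# Per-sample totals: large differences hint at a need for normalization.
print(colSums(counts))

# Log transform; the +1 avoids log(0) for genes with no peptides.
logc <- log2(counts + 1)

# Distribution of the data per sample.
boxplot(logc, las = 2, ylab = "log2(peptides + 1)")

# Distribution per gene (here, the mean across samples).
hist(rowMeans(logc), main = "Per-gene mean", xlab = "log2(peptides + 1)")

# Scatter plots of every sample versus every other sample.
pairs(logc, pch = ".")

# One simple normalization (an assumption, one of many options):
# scale each sample so that all column totals are equal.
norm_counts <- sweep(counts, 2, colSums(counts), "/") * mean(colSums(counts))
```

Replicates of the same tissue should look more alike in these plots than
samples from different tissues; if they do not, that is worth understanding
before going further.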

In short, some data exploration might be necessary before you can move
on to ask more biologically relevant questions.  You may already have
the information that you need to determine the best way forward, but
that isn't clear from your post.

Hope that helps,
Sean