[BioC] PCA or concordance
Sean Davis
seandavi at gmail.com
Wed Mar 3 18:17:40 CET 2010
On Wed, Mar 3, 2010 at 11:41 AM, Johnny H <ukfriend22 at googlemail.com> wrote:
> Dear Bioconductors,
> I have some proteomics data for several tissues:
>
> Heart x 3 replicates
> Lung x 3 replicates
>
> Each data set has a gene symbol and the number of peptides for that gene (a
> rough measure of protein expression).
>
> I want to make a data structure like:
>
>        heart1 heart2 heart3 lung1 lung2 lung3
> Gene1       2      4      3     7     9    20
> Gene2      50     45     33     0     1     0
> Gene3  ...... etc
> Gene4
>
> Each number in the data frame corresponds to number of peptides for that
> gene.
>
> My questions are:
>
> Is a Principal Component Analysis useful for this data set?
> What would a PCA tell me?
> What function would I use to make a nice graphical representation of the data?
>
> Or should I use a concordance function, something like this?
>
> con <- function(y1, y2) {
>   d  <- mean(y1) - mean(y2)    # difference in sample means
>   v1 <- var(y1)
>   v2 <- var(y2)
>   cv <- cov(y1, y2)            # renamed so it does not mask cov()
>   (2 * cv) / (v1 + v2 + d^2)   # concordance of y1 and y2
> }
>
> This will tell me if two samples have concordance but I don't know how to
> involve all samples. Basically, I want to summarise the data.
Start simple. There are likely biases (that depend on the
experimental design and assays used) in the data. Try to determine
what those are using simple plots of the data. What are the
distributions of the data when cut various ways (per gene, per
sample)? What do scatter plots of one sample versus another look
like? Do the data need transformation (log, for example)? Do the
data need normalization (likely)?
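
A minimal sketch of that kind of first look, in base R. The matrix `counts` is toy data in the layout from the original post: only the Gene1 and Gene2 rows come from the post, and Gene3 and Gene4 are invented for illustration. The log2(x + 1) transformation is an assumption, chosen here only because the counts include zeros.

```r
# Toy peptide-count matrix in the layout from the post.
# Gene1 and Gene2 are the rows shown there; Gene3 and Gene4 are invented.
counts <- matrix(
  c( 2,  4,  3,  7,  9, 20,
    50, 45, 33,  0,  1,  0,
    12, 10, 15, 11, 14, 13,
     0,  1,  0,  5,  6,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:4),
                  c("heart1", "heart2", "heart3", "lung1", "lung2", "lung3"))
)

# Per-sample and per-gene summaries of the raw counts.
sample_totals <- colSums(counts)
gene_means <- rowMeans(counts)

# log2(x + 1) keeps zero counts finite; the +1 offset is an assumption.
log_counts <- log2(counts + 1)

# Distribution of each sample, then every sample against every other.
boxplot(as.data.frame(log_counts), ylab = "log2(peptides + 1)")
pairs(log_counts, main = "Sample vs. sample")
```

Large differences between the entries of `sample_totals`, or boxes that sit at clearly different levels, would be one hint that some per-sample normalization is needed.
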
In short, some data exploration might be necessary before you can move
on to ask more biologically relevant questions. You may already have
the information that you need to determine the best way forward, but
that isn't clear from your post.
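
On the concordance function in the original post: one way to involve all samples might be to compute it for every pair of columns and look at the resulting matrix. A sketch, assuming a numeric matrix with genes in rows and samples in columns. `concordance` restates the posted function; `pairwise_concordance` and the toy matrix `x` are names and data made up for illustration.

```r
# Concordance of two samples, following the function in the post.
concordance <- function(y1, y2) {
  d <- mean(y1) - mean(y2)
  (2 * cov(y1, y2)) / (var(y1) + var(y2) + d^2)
}

# Hypothetical helper: concordance for every pair of columns of x.
pairwise_concordance <- function(x) {
  n <- ncol(x)
  out <- matrix(1, n, n, dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      out[i, j] <- concordance(x[, i], x[, j])
    }
  }
  out
}

# Toy data: Gene1 and Gene2 values from the post, third row invented.
x <- cbind(heart1 = c(2, 50, 12), heart2 = c(4, 45, 10),
           lung1  = c(7,  0, 11), lung2  = c(9,  1, 14))
cc <- pairwise_concordance(x)
round(cc, 2)
```

A plot of this matrix (for example with `heatmap(cc)`) would then show whether replicates of the same tissue agree more with each other than with the other tissue.
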
Hope that helps,
Sean