[BioC] Ways to verify identity samples

Tue Oct 4 15:56:47 CEST 2005

Chang Bang wrote:
> Hi all:
> I have 11 Human Genome U133 Plus 2.0 samples.
> Two of them are identical samples.
> I tried to use the correlation between samples to verify this,
> but even non-identical samples have high correlations.
> 
> Here is my test code:
> # read the expression value table for all 11 samples (computed with MAS5)
> sample <- read.table("samples.txt", sep="\t")
> correlation <- cor(sample, method="pearson")
> 
> Does anyone know of other ways to verify identical samples?

I don't think correlation is a good measure here, primarily since you
have so many observations per sample. Since noise is such a large
component of the signal for many of the genes in a given sample, I find
that correlation is higher than one might expect for samples that are
unrelated, and lower than expected for samples that should be very similar.
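
As a rough illustration of the first half of that statement, here is a small simulation. Everything in it is an assumption made up for the sketch, not the real data: each probeset has its own baseline level shared by all samples, each biological sample adds a smaller deviation of its own, and each hybridization adds measurement noise. The names probe_level, make_sample, and sim are hypothetical.

```r
# Simulated log2-scale expression values (hypothetical model, not real data).
set.seed(42)
n_probes <- 5000

# Baseline level of each probeset, shared by every sample.
probe_level <- rnorm(n_probes, mean = 8, sd = 2)

# One hybridization = shared baseline + sample-specific signal + fresh noise.
make_sample <- function(biology) {
  probe_level + biology + rnorm(n_probes, sd = 0.2)
}

bio_a <- rnorm(n_probes, sd = 0.5)  # biological signal of sample A
bio_b <- rnorm(n_probes, sd = 0.5)  # biological signal of an unrelated sample B

sim <- cbind(A1 = make_sample(bio_a),  # sample A, first hybridization
             A2 = make_sample(bio_a),  # sample A again (the "identical" pair)
             B  = make_sample(bio_b))  # unrelated sample

# Pearson correlation between all pairs of columns.
r <- cor(sim)
print(round(r, 3))
```

Under these assumptions both the A1/A2 and the A1/B correlations come out above 0.9, because the shared probeset baseline dominates the variance, so the gap between "identical" and "unrelated" is small on the correlation scale.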

You might be better off using PCA and plotting the first two (or three)
principal components. I find this to be a better 'eyeballometric'
measure of similarity.

See ?prcomp. Also note that you have to transpose your data first, because
prcomp expects samples in rows and variables (here, probesets) in columns,
the opposite of the usual microarray layout.
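
A minimal sketch of the PCA approach, using simulated data in place of the real table. The matrix expr (probesets in rows, 11 samples in columns, with column S11 built as a repeat of S1) is a made-up stand-in; with real MAS5 values you would probably want to log2-transform first, which is my assumption and not something checked against this data set.

```r
# Simulated stand-in for the expression table: probesets in rows, samples in columns.
set.seed(1)
n_probes <- 2000

gene_mean <- rnorm(n_probes, mean = 8, sd = 2)                      # shared probeset level
biology   <- matrix(rnorm(n_probes * 10, sd = 0.5), nrow = n_probes)  # 10 distinct samples
biology   <- cbind(biology, biology[, 1])                           # column 11 repeats sample 1
noise     <- matrix(rnorm(n_probes * 11, sd = 0.2), nrow = n_probes)  # measurement noise

expr <- gene_mean + biology + noise   # gene_mean is recycled down each column
colnames(expr) <- paste0("S", 1:11)

# prcomp wants samples in rows, so transpose; columns are centered by default.
pca <- prcomp(t(expr))

# Plot the samples on the first two principal components, labelled by name.
scores <- pca$x[, 1:2]
plot(scores, type = "n", xlab = "PC1", ylab = "PC2")
text(scores, labels = rownames(scores))
```

In a plot like this, repeated hybridizations of the same sample should sit close together relative to the other points. If the first two components are ambiguous, looking at the third one or at dist(pca$x) can help.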

Best,

Jim

> 
> Sincerely,
> Chang Bang

-- 
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623