[BioC] Question regarding handling technical replicates for Affy arrays

noel0925 at sbcglobal.net noel0925 at sbcglobal.net
Sun May 28 02:56:51 CEST 2006


Hi All,

There seems to have been much discussion regarding 
the statistical issues surrounding handling of
technical replicates in gene expression data using the
Limma package as pertains to two-color arrays, but
less so for Affy data.

As such, I am still uncertain regarding how best to
handle a single technical replicate in a dataset 40
Affy arrays from 3 tumor types. 

Clearly, I want to model the biological variation
between these three tumor types as a fixed effect.

Limma, I understand, is only able to handle 1 random
effect. Unlike two-color arrays, single color arrays
do not necessitate allocating random effects to within
array spot replicates or dye swaps so, I should still
be able to model the lack of dependence between these
two technical reps as a random effect using the
duplicateCorrelation function.

It is unclear to me how to specify this single
technical replicate however. The examples I have read
about in the limma manual or other desciptions of its
functions usually deal with cDNA arrays and usually
there is a nice structure to the technical
replication- eg pairs of technical replicates for each
sample (dye swap or otherwise). In my case, I only
have  one technical rep (and 20, 10, and 10 biolgical
reps).  

What seemed to make sense to me was:

>classes<- c(rep(1,20), rep(2,10), rep(3,10))

>f<- factor(targets$Target, levels = c("RNA1", "RNA2",
"RNA3"))
>design<- model.matrix(~0+f)
>colnames(design)<- c("RNA1", "RNA2", "RNA3")
>biolrep <- c(1,2,3,4,5,5,6,7,8,9

40)
>corfit<- duplicateCorrelation(eset, design, ndups=1,
block= biolrep)
>fit<-lmFit(eset,design, ndups=1, block=biolrep
,cor=corfit$consensus)
>contrast.matrix<- makeContrasts(RNA1-RNA3, 
RNA2-RNA3,RNA1-RAN2, levels=design)
>fit2<- contrasts.fit(fit, contrast.matrix)
>fit2<-eBayes(fit2)

How else can I specify biolrep?



Obviously it would be preferable to model the lack of
dependence rather than go with the alternatives.

i.) Averaging the two technical reps so as to treat
them as primary data. 
I assume this would be done post-normalization and
summarization?


ii.) I have also read that simply treating the
technical rep as a biological rep is not too dangerous
since- the measurement error can be larger than the
biological variation- but am hesitant since I want to
perform downstream analysis like clustering and don't
want an 'extra' sample that really comes from the same
patient.

iii.) Discard the data from one of the technical reps
(worse still).

I would greatly appreciate any insight into the best
way to handle this issue.

Thanks in advance.
Noelle



More information about the Bioconductor mailing list