[BioC] Advice for analyzing Affy data

Tue Sep 9 10:01:41 MEST 2003

First off, I would recommend using the RMA expression values rather than
the MAS expression values. It is not surprising that you are still
seeing 'AFFX' probe sets, because the MAS expression values are very
noisy, especially at the low end.

Secondly, you might try an additional filter. Right now you are
filtering for genes where at least four samples have values larger than
150, but this doesn't remove genes that don't really change much between
sample types. Since these genes are (by definition) less interesting, it
is better to get rid of them before doing any clustering.
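
To make that concrete, here is a small base R sketch of what a
kOverA(4, 150) style filter keeps. The matrix, its values, and the name
`keep` are invented for illustration; only the "at least 4 values above
150" rule comes from your code.

```r
# Toy expression matrix: 3 genes x 6 samples (all values are invented).
x <- rbind(
  flat_high = c(900, 910, 905, 895, 900, 902),  # high but unchanging
  changing  = c(100, 800, 900, 850, 870, 110),  # differs between samples
  low       = c(20, 30, 25, 40, 35, 28)         # never above 150
)

# Base R equivalent of kOverA(4, 150):
# keep a gene if at least 4 of its values are larger than 150.
keep <- rowSums(x > 150) >= 4
keep
```

The flat_high gene passes even though it barely varies, which is why a
second filter on variability is worth adding.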

I usually exclude genes where the CV (coefficient of variation) is less
than some ad hoc value. I pick the cutoff by looking at how many genes
it leaves me with (real scientific, I know...), and I decide how many
genes I want to keep based on what I am doing with the clustering result.
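
A rough sketch of that kind of CV filter in base R, assuming a matrix of
positive, unlogged intensities with genes in rows. The simulated data,
the target of 150 genes, and the names `gene_cv`, `n_keep`, `cutoff` and
`filtered` are all placeholders, not anything from your data set.

```r
set.seed(1)

# Simulated positive intensities, 1000 genes x 8 samples (invented data).
x <- matrix(rlnorm(1000 * 8, meanlog = 6, sdlog = 0.5), nrow = 1000,
            dimnames = list(paste0("gene", 1:1000), paste0("s", 1:8)))

# Coefficient of variation per gene: sd / mean across samples.
gene_cv <- apply(x, 1, sd) / rowMeans(x)

# Choose the cutoff from the number of genes to keep (hypothetical target).
n_keep <- 150
cutoff <- sort(gene_cv, decreasing = TRUE)[n_keep]

# Keep the genes whose CV is at or above the cutoff.
filtered <- x[gene_cv >= cutoff, , drop = FALSE]
dim(filtered)
```

With real data you would replace x with something like your filtered
expression matrix and adjust n_keep to suit the downstream analysis.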

If you are only interested in seeing how the samples cluster, the
number of genes used is not that critical, except for the time/compute
power required. However, if you are going to be making a heat map or
some other pretty picture, then you really need to limit the number of
genes because heat maps become too large to be useful at about 150 genes
or so.
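
As a minimal sketch of the two cases, assuming a genes-by-samples matrix:
cluster the samples on everything, then cut down to a short gene list
before drawing a heat map. The simulated data, the choice of 100 genes,
and the object names are placeholders of mine.

```r
set.seed(2)

# Simulated positive intensities, 500 genes x 6 samples (invented data).
x <- matrix(rlnorm(500 * 6, meanlog = 6, sdlog = 0.5), nrow = 500,
            dimnames = list(paste0("gene", 1:500), paste0("s", 1:6)))

# Cluster the samples: dist() compares rows, so transpose first.
hc_samples <- hclust(dist(t(x)), method = "average")
plot(hc_samples)

# For a heat map, keep only the most variable genes (100 is arbitrary).
gene_cv <- apply(x, 1, sd) / rowMeans(x)
top <- x[order(gene_cv, decreasing = TRUE)[1:100], ]
heatmap(top)
```

Note that dist() on the untransposed matrix clusters the genes instead
of the samples, so which one you pass depends on what you want to see.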

HTH,

Jim

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

>>> "Tan, MinHan" <MinHan.Tan at vai.org> 09/08/03 11:13PM >>>
Good evening,

I'm new to R and Affymetrix data analysis. I'd truly appreciate it if
someone could give me some pointers as to how to proceed. I really am
not sure what I'm doing wrong.

(a) performed the ReadAffy() steps, and created expression sets of my
data in both MAS5 (esetmas) and RMA format. (the magnitude difference is
quite startling)

(b) Used genefilter to perform some simple filtering. 
f1<-kOverA(4,150)
ffun<-filterfun(f1)
whichmas<-genefilter(exprs(esetmas),ffun)
exprData <- exprs(esetmas)
filterData <- exprData[whichmas,]

(c) I'm not sure how to perform the ideal form of unsupervised
clustering and how best to view those results as plots.

hc<-hclust(dist(filterData),"ave")
plot(hc)

All I see is some very skewed looking data, with lots of the AFFX genes
still present. I've tried running the GeneSOM function, but I don't
quite understand the output.

Thank you!!

Best regards,

Min-Han Tan, MD, MRCP(UK)
Laboratory of Cancer Genetics
Van Andel Research Institute
333 Bostwick NE
Grand Rapids MI 49503
Tel: (616) 234-5350
Fax: (616) 234-5115