[BioC] outliers removal

James MacDonald jmacdon at med.umich.edu
Fri Jun 11 20:58:45 CEST 2004


The error you are getting arises because write.table() writes its
output to a file or the console and returns NULL, so Dataf <-
write.table(exprmatfilt) doesn't create a usable R object. When you run
expresso, you are therefore passing NULL where an AffyBatch is expected.
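
A quick way to see the write.table() behaviour for yourself, as a
minimal sketch: the toy matrix and the object names below are
illustrative only and merely stand in for exprmatfilt.

```r
# Toy matrix standing in for the filtered expression matrix (hypothetical data).
toy <- matrix(1:6, nrow = 2)

# write.table() is called for its side effect (writing text out);
# its return value is NULL, so the assignment stores NULL.
out_file <- tempfile(fileext = ".txt")
res <- write.table(toy, file = out_file)

is.null(res)           # TRUE: nothing useful was assigned
file.exists(out_file)  # TRUE: the data went to the file instead
```

Anything that later expects an AffyBatch will fail when handed res.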

In addition, I am not sure that what you are trying to do is going to
work. There are at least two problems here. First, you would need to
pass exprmatfilt back into Data (exprs(Data) <- exprmatfilt). However,
if you pass a smaller matrix back into an AffyBatch, the rows no longer
line up with the probe locations the AffyBatch expects, and it will
mess things up.

> dat <- read.affybatch(filenames=list.celfiles())
> dat2 <- dat
> exprmat <- exprs(dat)
> exprs(dat2) <- exprmat[-(100:150),]
> all.equal(pm(dat2, geneNames(dat2)[1]), pm(dat, geneNames(dat)[1]))
[1] "Mean relative  difference: 3.187727"
> geneNames(dat2)[1]
[1] "1007_s_at"
> geneNames(dat)[1]
[1] "1007_s_at"

So passing the smaller matrix back into the AffyBatch changes what data
are attributed to a given gene.

The second problem is that you are subsetting the raw data without
making any corresponding changes to the cdfenv. When you run expresso,
the cdfenv is used to look up all the probe data for each gene.
However, you are removing some of the probe intensities for certain
genes without telling the cdfenv which genes have lost data. As a
result, you will end up with probes from different probesets being
used to compute the expression value for a given gene.
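
The index problem can be illustrated in base R without any arrays at
hand. This is only a sketch under stated assumptions: index_map is a
made-up stand-in for the cdfenv (a list of row indices per probeset),
and the intensities are invented.

```r
# Ten "cells" on one chip; the row names record each cell's true identity.
intens <- matrix(seq(10, 100, by = 10), ncol = 1,
                 dimnames = list(paste0("cell", 1:10), "chip1"))

# Hypothetical stand-in for the cdfenv: row indices belonging to each probeset.
index_map <- list(geneA = c(1, 2), geneB = c(6, 7))

before <- intens[index_map$geneB, 1]

# Drop two rows without updating index_map, as in the filtering step.
smaller <- intens[-(3:4), , drop = FALSE]
after <- smaller[index_map$geneB, 1]

names(before)  # "cell6" "cell7"
names(after)   # "cell8" "cell9"
```

The same indices now pick up different cells, so geneB would be
summarized from intensities that do not belong to it.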

So, long story short, you should not be doing any of this. Median
polish is robust to outlying values, so it should be largely unaffected
by the 'outlier' data that you are removing anyway.
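
To make the robustness point concrete, here is a small sketch using
medpolish() from the stats package. Treat it as an illustration only:
medpolish() is used as a stand-in for the medianpolish summary in
expresso (whose implementation may differ in detail), and the data are
an invented, perfectly additive table with one spiked value.

```r
# Perfectly additive toy data on the log scale: 5 probes (rows) x 5 chips (columns).
probe_eff <- c(-1, 0, 1, 2, -2)
chip_eff  <- c(0, 1, 2, 3, 4)
clean <- 8 + outer(probe_eff, chip_eff, "+")

# Same data with a single grossly high probe value on chip 5.
spiked <- clean
spiked[1, 5] <- spiked[1, 5] + 5

fit_clean  <- medpolish(clean,  trace.iter = FALSE)
fit_spiked <- medpolish(spiked, trace.iter = FALSE)

# Chip-level summaries (overall + column effect) from median polish.
mp_clean  <- fit_clean$overall  + fit_clean$col
mp_spiked <- fit_spiked$overall + fit_spiked$col
all.equal(mp_clean, mp_spiked)  # the outlier is absorbed by the residuals

# Plain column means, by contrast, are shifted on the spiked chip.
colMeans(spiked) - colMeans(clean)
```

In this toy case the chip summaries from median polish are identical
with and without the spiked value, whereas the mean for chip 5 moves.
Real data will not be this tidy, so take it as a picture of the idea
rather than a guarantee.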

HTH,

Jim



James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

>>> "Roger Vallejo" <rvallejo at psu.edu> 06/11/04 02:08PM >>>
Bioconductor Fellows,

I have 8 samples (Affy arrays with 22,690 probesets). I would like to:

1.   Remove outliers and select some genes (using raw signal data from
     *.CEL files), and then

2.   Run the standard data processing technique (let's say just RMA)

Obviously I am having problems doing this. I have used these R
commands:

*******************Begin R commands**********************
Data <- ReadAffy()
exprmatf <- exprs(Data)
dim(exprmatf)

# Floor & ceiling of raw data
exprmatf[exprmatf < 10] <- 10
exprmatf[exprmatf > 25000] <- 25000

# Preliminary selection of genes
tmp1 <- apply(exprmatf, 1, max)
tmp2 <- apply(exprmatf, 1, min)
which1 <- (1:506944)[(tmp1/tmp2) > 2]
which2 <- (1:506944)[(tmp1 - tmp2) > 100]
exprmatf.sub <- intersect(which1, which2)
exprmatfilt <- exprmatf[exprmatf.sub, ]

Dataf <- write.table(exprmatfilt)
library(vsn)
normalize.AffyBatch.methods <- c(normalize.AffyBatch.methods, "vsn")
esetf <- expresso(Dataf, bg.correct=FALSE, normalize.method="vsn",
                  pmcorrect.method="pmonly", summary.method="medianpolish")
*******************End R commands**********************

The last command does not run! I got this error message:

normalization: vsn 
PM/MM correction : pmonly 
expression values: medianpolish 
normalizing...Error in normalize(afbatch, normalize.method) : 
        No direct or inherited method for function "normalize" for this call

I would appreciate it if somebody could tell me why this happens, or
what I am doing wrong.

Thank you very much for the help!!

Roger

Roger L. Vallejo, Ph.D.
Assist. Professor of Genomics & Bioinformatics
Genomics & Bioinformatics Laboratory
Department of Dairy & Animal Science
The Pennsylvania State University
305 Henning Building
University Park, PA 16802
Phone:       (814) 865-1846
Email:        rvallejo at psu.edu



_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch 
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor


