[BioC] Agi4x44PreProcess - filter.probes function

Mon Oct 18 23:35:10 CEST 2010

Dear Pedro and Bioc Users,

I posted this question a couple of weeks ago and have not heard from anyone. In the meantime I tried a couple of different things to get the filter.probes function working. One was to check whether all the column names in my data match the ones used in the filter.probes function; I didn't see anything missing. One thing I did notice is that my files (Agilent Feature Extraction output files) have PROBE_UID instead of PROBE_ID. I tried renaming it to see whether that would help, but the function still does not work. All the other functions work perfectly; I just want to filter the probes so that all the controls are removed before doing the statistical analysis. Any help will be greatly appreciated.

Also, the gsea.files function generates a file with the extension ".gct". Does anyone know how the contents of this file should be interpreted?
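For context, .gct is the tab-delimited expression format read by the GSEA software: the first line holds a version tag (#1.2), the second line holds the number of rows and the number of samples, the third line is a header (NAME, Description, then one column per sample), and each remaining line is one gene with its expression values. The sketch below is only an illustration of that layout in base R. The file contents are made up and read_gct is a hypothetical helper, not a function from Agi4x44PreProcess.

```r
# Write a tiny example .gct file (made-up values) to show the layout.
gct_path <- tempfile(fileext = ".gct")
writeLines(c(
  "#1.2",                              # line 1: format version tag
  "2\t3",                              # line 2: number of rows, number of samples
  "NAME\tDescription\tS1\tS2\tS3",     # line 3: column header
  "geneA\tna\t7.1\t7.4\t6.9",          # data rows: one gene per line
  "geneB\tna\t10.2\t9.8\t10.5"
), gct_path)

# Hypothetical helper: read a .gct file into a numeric matrix.
read_gct <- function(path) {
  version <- readLines(path, n = 1)
  dims <- scan(path, what = integer(), skip = 1, nlines = 1, quiet = TRUE)
  tab <- read.delim(path, skip = 2, header = TRUE,
                    check.names = FALSE, stringsAsFactors = FALSE)
  # Columns 1 and 2 are identifiers; the rest are expression values.
  expr <- as.matrix(tab[, -(1:2), drop = FALSE])
  rownames(expr) <- tab$NAME
  # The declared dimensions should match the table that was read.
  stopifnot(nrow(expr) == dims[1], ncol(expr) == dims[2])
  list(version = version, description = tab$Description, expr = expr)
}

gct <- read_gct(gct_path)
print(gct$expr)
```

Pointing read_gct at the DataSet.gct file written by gsea.files should work the same way, assuming that file follows this standard layout.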

I used Feature Extraction software version 9.1, and the arrays are Agilent 4x44 zebrafish arrays.

Thank you,
Neel

R version 2.11.1 (2010-05-31)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
[R.app GUI 1.34 (5589) x86_64-apple-darwin9.8.0]

> library("Agi4x44PreProcess")
> targets=read.targets(infile="targets2.txt")

Target File 
                  FileName Treatment GErep
103-AHRR-a1 103-AHRR-a1.txt     AHRRa     1
103-AHRR-a2 103-AHRR-a2.txt     AHRRa     1
103-AHRR-b1 103-AHRR-b1.txt     AHRRb     2
103-AHRR-b2 103-AHRR-b2.txt     AHRRb     2
102-CONT-1   102-CONT-1.txt      CONT     3
102-CONT-2   102-CONT-2.txt      CONT     3

> dd2=read.AgilentFE(targets, makePLOT=FALSE)
Read 103-AHRR-a1.txt 
Read 103-AHRR-a2.txt 
Read 103-AHRR-b1.txt 
Read 103-AHRR-b2.txt 
Read 102-CONT-1.txt 
Read 102-CONT-2.txt 

 RGList: 
	dd$R:	'gProcessedSignal'  
	dd$G:	'gMeanSignal'  
	dd$Rb:	'gBGMedianSignal'  
	dd$Gb:	'gBGUsed'  

> dim(dd2)
[1] 44407     6
> names(dd2)
[1] "R"       "G"       "Rb"      "Gb"      "targets" "genes"   "other"  
> CV.rep.probes(dd2,"zf.db",foreground="MeanSignal", raw.data=TRUE,writeR=FALSE,targets)

------------------------------------------------------ 
Non-CTRL Replicated probes 
	foreground:  MeanSignal 
		FILTERING BY ControlType FLAG 
		RAW DATA: PROBES AFTER ControlType FILTERING:  42990 

------------------------------------------------------ 
	REPLICATED NonCtrl Probes 21495 
	UNIQUE probes 21495 
	DISTRIBUTION OF REPLICATED NonControl Probes 
reps
   1 
21495 
	# REPLICATED (redundant) probeNames 21495 
------------------------------------------------------ 
MEDIAN % CV 
103-AHRR-a1 103-AHRR-a2 103-AHRR-b1 103-AHRR-b2  102-CONT-1  102-CONT-2 
     2.477       1.279       1.454       2.157       1.689       1.342 
------------------------------------------------------ 
> genes.rpt.agi(dd2,"zf.db",raw.data=TRUE,WRITE.html=FALSE,REPORT=FALSE)

GENE SETS: same genes interrogated by different probes 
		FILTERING BY ControlType FLAG 
		RAW DATA: PROBES AFTER ControlType FILTERING:  42990 

	INPUT DATA: RAW 
	CHIP: zf.db 

	PROBE SETS (NON-CTRL prob rep. x 10):	 21495 
	GEN-SETS (REPLICATED GENES):		 2281 
	PROBES in gen-sets:			 5012 
> ddNORM=BGandNorm(dd2, BGmethod="half",NORMmethod="quantile",foreground="MeanSignal",background="BGMedianSignal",offset=50, makePLOTpre=FALSE, makePLOTpost=FALSE)
BACKGROUND CORRECTION AND NORMALIZATION  

	foreground: MeanSignal 
	background: BGMedianSignal 

	BGmethod:	 half 
	NORMmethod:	 quantile 
	OUTPUT in log-2 scale 

------------------------------------------------------ 
> ddFILT=filter.probes(ddNORM, control=TRUE,wellaboveBG=TRUE, isfound=TRUE,wellaboveNEG=TRUE,sat=TRUE,PopnOL=TRUE,NonUnifOL=T, nas=TRUE,limWellAbove=75,limISF=75,limNEG=75,limPopnOL=75,limNonUnifOL=75, limNAS=100,makePLOT=F,annotation.package="zf.db",flag.counts=T, targets=targets)
FILTERING PROBES BY FLAGS 

FILTERING BY ControlType FLAG 
Error in data.frame(PROBE_ID, as.character(probe.chr), as.character(probe.seq),  : 
 arguments imply differing number of rows: 42990, 0
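One possible reading of this error: the second count (0) means that one of the vectors passed to data.frame() was empty, which is what R produces when a column that does not exist is extracted with $ (the result is NULL). Whether that is what happens inside filter.probes is an assumption. The sketch below reproduces the same message with toy data in base R. The probe names, the values and the "Sequence" column name are made up for illustration.

```r
# Toy stand-in for a probe annotation table; all values are made up.
genes <- data.frame(
  ProbeName = c("A_15_P100001", "A_15_P100002", "A_15_P100003"),
  ControlType = c(0L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Extracting a column that does not exist returns NULL (length 0).
probe_seq <- genes$Sequence

# Combining a length-3 vector with a length-0 vector fails in data.frame().
msg <- tryCatch(
  {
    data.frame(genes$ProbeName, as.character(probe_seq))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical list of columns a downstream function might expect.
expected <- c("ProbeName", "ControlType", "Sequence")
missing_cols <- setdiff(expected, names(genes))
print(missing_cols)
```

The same setdiff check could be run against names(ddNORM$genes), with the expected column names taken from the source of filter.probes rather than from the hypothetical list used here.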

I ran the remaining steps of the analysis, and they all worked.

> summarize.probe(dd,makePLOT=TRUE,targets)
SUMMARIZATION OF non-CTRL PROBES  

SUMMARIZED DATA:  21555 6 
------------------------------------------------------ 
Hit <Return> to see next plot: 
Error in plot.new() : attempt to plot on null device
> ddPROC=summarize.probe(dd,makePLOT=TRUE, targets)
SUMMARIZATION OF non-CTRL PROBES  

SUMMARIZED DATA:  21555 6 
------------------------------------------------------ 
Hit <Return> to see next plot: 
Hit <Return> to see next plot: 
> eset.PROC=build.eset(ddPROC,targets,makePLOT=TRUE,annotation.package="zf.db")
> dim(eset.PROC)
Features  Samples 
   21555        6 
> write.eset(eset.PROC, ddPROC, "zf.db",targets)
> mappings=build.mappings(eset.PROC,annotation.package="zf.db")

	The mapping process takes a while ... 

> gsea.files(eset.PROC, targets, annotation.package="zf.db")
GSEA OUTPUT FILES  

	DataSet.gct and Phenotypes.cls  
   1 AHRRa-AHRRb 
       unique gene symbols:  8428 
       samples:              4 

   2 AHRRa-CONT 
       unique gene symbols:  8428 
       samples:              4 

   3 AHRRb-CONT 
       unique gene symbols:  8428 
       samples:              4 

Neel Aluru
Postdoctoral Scholar
Biology Department
Woods Hole Oceanographic Institution
Woods Hole, MA 02543
USA
508-289-3607