[BioC] Agi4x44PreProcess - filter.probes function
Neel Aluru
naluru at whoi.edu
Mon Oct 18 23:35:10 CEST 2010
Dear Pedro and Bioc Users,
I posted this question a couple of weeks ago and didn't hear from anyone. In the meantime I tried a couple of different things to get the filter.probes function working. One was to check whether all the column names in my data match the ones used in the filter.probes function; I didn't see anything missing. One thing I did notice is that my files (Agilent feature-extracted files) have PROBE_UID instead of PROBE_ID. I tried changing it to see whether that helps, but it still does not work. All the other functions work perfectly, and I just want to filter the probes so that all the controls are removed before doing the statistical analysis. Any help will be greatly appreciated.
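For reference, the column check described above can be sketched in base R roughly as follows. This is only an illustration: the `genes` table is a toy stand-in for dd2$genes, the probe names and values are made up, and the `expected` vector is a hypothetical list of required columns, not the actual list used inside filter.probes.

```r
# Toy stand-in for the 'genes' annotation table of the RGList (dd2$genes).
# Column names and values are invented for illustration only.
genes <- data.frame(
  PROBE_UID   = c(1, 2, 3, 4),
  ProbeName   = c("probe_1", "probe_2", "pos_ctrl_1", "neg_ctrl_1"),
  ControlType = c(0, 0, 1, -1),
  stringsAsFactors = FALSE
)

# Hypothetical list of columns a downstream function might require
# (an assumption, not taken from the package source).
expected <- c("PROBE_ID", "ProbeName", "ControlType", "Sequence", "chr_coord")

# Columns that are expected but absent from the annotation table.
missing_cols <- setdiff(expected, names(genes))
print(missing_cols)

# Renaming PROBE_UID to PROBE_ID closes only one of the gaps.
names(genes)[names(genes) == "PROBE_UID"] <- "PROBE_ID"
still_missing <- setdiff(expected, names(genes))
print(still_missing)

# Keep only non-control probes (ControlType is 0 for non-controls
# in Agilent Feature Extraction files).
non_ctrl <- genes[genes$ControlType == 0, ]
print(nrow(non_ctrl))
```

Running the same `setdiff()` against names(dd2$genes) would show whether any annotation column is absent from the real data rather than merely named differently.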
The GSEA function (gsea.files) generates a file with the extension ".gct". Does anyone know how the information in this file can be interpreted?
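As a rough guide, a GCT file is a tab-delimited text table: the first line gives the format version, the second line gives the number of data rows and the number of samples, the third line holds the column headers (NAME, Description, then one column per sample), and every following line is one gene. The sketch below assumes the DataSet.gct written by gsea.files follows this GCT 1.2 layout. It builds a tiny made-up file so that it runs on its own; the gene names, sample names and values are invented.

```r
# Write a tiny GCT 1.2 file to a temporary location (all values are made up).
gct_path <- tempfile(fileext = ".gct")
writeLines(c(
  "#1.2",                            # line 1: format version
  "2\t3",                            # line 2: number of rows, number of samples
  "NAME\tDescription\tS1\tS2\tS3",   # line 3: column headers
  "geneA\tna\t7.1\t7.4\t6.9",        # one line per gene from here on
  "geneB\tna\t10.2\t9.8\t10.0"
), gct_path)

# Line 2 holds the dimensions of the expression matrix.
dims <- scan(gct_path, skip = 1, nlines = 1, quiet = TRUE)

# The data table, including its header row, starts on line 3.
gct <- read.delim(gct_path, skip = 2, check.names = FALSE,
                  stringsAsFactors = FALSE)

# Expression matrix: rows are genes, columns are samples.
expr <- as.matrix(gct[, -(1:2)])
rownames(expr) <- gct$NAME

print(dims)
print(expr)
```

Pointing `gct_path` at the real DataSet.gct should give a matrix with one row per gene symbol and one column per sample, provided the file matches the assumed layout.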
I used Feature Extraction software version 9.1, and the arrays are Agilent 4x44 zebrafish arrays.
Thank you,
Neel
R version 2.11.1 (2010-05-31)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
[R.app GUI 1.34 (5589) x86_64-apple-darwin9.8.0]
> library("Agi4x44PreProcess")
> targets=read.targets(infile="targets2.txt")
Target File
            FileName        Treatment GErep
103-AHRR-a1 103-AHRR-a1.txt AHRRa     1
103-AHRR-a2 103-AHRR-a2.txt AHRRa     1
103-AHRR-b1 103-AHRR-b1.txt AHRRb     2
103-AHRR-b2 103-AHRR-b2.txt AHRRb     2
102-CONT-1  102-CONT-1.txt  CONT      3
102-CONT-2  102-CONT-2.txt  CONT      3
> dd2=read.AgilentFE(targets, makePLOT=FALSE)
Read 103-AHRR-a1.txt
Read 103-AHRR-a2.txt
Read 103-AHRR-b1.txt
Read 103-AHRR-b2.txt
Read 102-CONT-1.txt
Read 102-CONT-2.txt
RGList:
dd$R: 'gProcessedSignal'
dd$G: 'gMeanSignal'
dd$Rb: 'gBGMedianSignal'
dd$Gb: 'gBGUsed'
> dim(dd2)
[1] 44407 6
> names(dd2)
[1] "R" "G" "Rb" "Gb" "targets" "genes" "other"
> CV.rep.probes(dd2,"zf.db",foreground="MeanSignal", raw.data=TRUE,writeR=FALSE,targets)
------------------------------------------------------
Non-CTRL Replicated probes
foreground: MeanSignal
FILTERING BY ControlType FLAG
RAW DATA: PROBES AFTER ControlType FILTERING: 42990
------------------------------------------------------
REPLICATED NonCtrl Probes 21495
UNIQUE probes 21495
DISTRIBUTION OF REPLICATED NonControl Probes
reps
1
21495
# REPLICATED (redundant) probeNames 21495
------------------------------------------------------
MEDIAN % CV
103-AHRR-a1 103-AHRR-a2 103-AHRR-b1 103-AHRR-b2  102-CONT-1  102-CONT-2
      2.477       1.279       1.454       2.157       1.689       1.342
------------------------------------------------------
> genes.rpt.agi(dd2,"zf.db",raw.data=TRUE,WRITE.html=FALSE,REPORT=FALSE)
GENE SETS: same genes interrogated by different probes
FILTERING BY ControlType FLAG
RAW DATA: PROBES AFTER ControlType FILTERING: 42990
INPUT DATA: RAW
CHIP: zf.db
PROBE SETS (NON-CTRL prob rep. x 10): 21495
GEN-SETS (REPLICATED GENES): 2281
PROBES in gen-sets: 5012
> ddNORM=BGandNorm(dd2, BGmethod="half",NORMmethod="quantile",foreground="MeanSignal",background="BGMedianSignal",offset=50, makePLOTpre=FALSE, makePLOTpost=FALSE)
BACKGROUND CORRECTION AND NORMALIZATION
foreground: MeanSignal
background: BGMedianSignal
BGmethod: half
NORMmethod: quantile
OUTPUT in log-2 scale
------------------------------------------------------
> ddFILT=filter.probes(ddNORM, control=TRUE,wellaboveBG=TRUE, isfound=TRUE,wellaboveNEG=TRUE,sat=TRUE,PopnOL=TRUE,NonUnifOL=T, nas=TRUE,limWellAbove=75,limISF=75,limNEG=75,limPopnOL=75,limNonUnifOL=75, limNAS=100,makePLOT=F,annotation.package="zf.db",flag.counts=T, targets=targets)
FILTERING PROBES BY FLAGS
FILTERING BY ControlType FLAG
Error in data.frame(PROBE_ID, as.character(probe.chr), as.character(probe.seq), :
arguments imply differing number of rows: 42990, 0
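One plausible reading of this error, offered as an assumption rather than something confirmed from the package source, is that one of the annotation columns passed to data.frame() is empty or absent, so it has zero rows while PROBE_ID has 42990. The base R sketch below reproduces the same kind of message with three made-up rows. The variable names are taken from the error message; the values are invented.

```r
# Stand-ins for the annotation columns; values are made up.
PROBE_ID  <- c("p1", "p2", "p3")
probe.chr <- NULL                       # e.g. a column that is absent from the input
probe.seq <- c("ACGT", "TTGA", "CCAT")

# as.character(NULL) has length zero, so data.frame() cannot line it up
# with the three-row columns and signals an error, which is captured here.
res <- tryCatch(
  data.frame(PROBE_ID, as.character(probe.chr), as.character(probe.seq)),
  error = function(e) conditionMessage(e)
)
print(res)

# A quick way to spot the empty piece: compare the lengths directly.
lens <- c(PROBE_ID  = length(PROBE_ID),
          probe.chr = length(probe.chr),
          probe.seq = length(probe.seq))
print(lens)
```

If this is what is happening, comparing the lengths of the corresponding columns in ddNORM$genes would show which one is empty.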
I ran the remaining steps of the analysis and they all worked well.
> summarize.probe(dd,makePLOT=TRUE,targets)
SUMMARIZATION OF non-CTRL PROBES
SUMMARIZED DATA: 21555 6
------------------------------------------------------
Hit <Return> to see next plot:
Error in plot.new() : attempt to plot on null device
> ddPROC=summarize.probe(dd,makePLOT=TRUE, targets)
SUMMARIZATION OF non-CTRL PROBES
SUMMARIZED DATA: 21555 6
------------------------------------------------------
Hit <Return> to see next plot:
Hit <Return> to see next plot:
> eset.PROC=build.eset(ddPROC,targets,makePLOT=TRUE,annotation.package="zf.db")
> dim(eset.PROC)
Features Samples
21555 6
> write.eset(eset.PROC, ddPROC, "zf.db",targets)
> mappings=build.mappings(eset.PROC,annotation.package="zf.db")
The mapping process takes a while ...
> gsea.files(eset.PROC, targets, annotation.package="zf.db")
GSEA OUTPUT FILES
DataSet.gct and Phenotypes.cls
1 AHRRa-AHRRb
unique gene symbols: 8428
samples: 4
2 AHRRa-CONT
unique gene symbols: 8428
samples: 4
3 AHRRb-CONT
unique gene symbols: 8428
samples: 4
Neel Aluru
Postdoctoral Scholar
Biology Department
Woods Hole Oceanographic Institution
Woods Hole, MA 02543
USA
508-289-3607