[BioC] Agi4x44PreProcess /filtering probenames from GeneName

Sat Mar 19 10:46:11 CET 2011

Dear Maria

I am not sure I understood your question, but perhaps R's 'strsplit'
function would help? It lets you split strings and then extract
components.

E.g. the idiom

       sapply(strsplit(x, ","), "[", 2)

will extract the text between the first and second comma in each string 
within x.
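
For illustration, here is a small sketch with made-up strings (the values
below are invented, not taken from your data). It shows the strsplit idiom
and, as a separate alternative that rests on an assumption, dropping
GeneName entries that look like Agilent probe names. The pattern
"^A_[0-9]+_P[0-9]+$" is only a guess at the probe name format, so please
check it against your ProbeName column before relying on it.

```r
# Invented example strings; real GeneName entries may look different.
x <- c("A_23_P100001,FAM174B,NM_207446",
       "A_24_P93251,TP53,NM_000546",
       "no_commas_here")

# Split each string at the commas; the result is a list of character vectors.
parts <- strsplit(x, ",")

# Take the second component of each vector (NA where there is none).
second <- sapply(parts, "[", 2)
print(second)

# Alternative, assuming probe names follow the form A_<digits>_P<digits>:
# keep only the entries that do not look like a probe name.
gene_name <- c("TP53", "A_24_P93251", "BRCA1", "A_32_P100109")
is_probe_id <- grepl("^A_[0-9]+_P[0-9]+$", gene_name)
symbols_only <- gene_name[!is_probe_id]
print(symbols_only)
```

Note that strings without a comma give NA in the first approach, so it may
be worth checking for NAs before using the result.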

	Best wishes
	Wolfgang

On Mar/18/11 2:28 PM, Maria Raeder wrote:
> Dear Mailing List,
>
> I have been struggling for some time with some Agilent single-channel
> arrays, which I believe were scanned with an earlier version of AFE,
> because they do not contain the columns Sequence and chr coord. I
> have tried to use the Agi4x44PreProcess package, with some
> adjustments; please see below. My main problem now is that I cannot
> remove the Agilent probe names which are embedded within the
> genesymbol column for some genes. The reason for doing this is to
> prepare files for GSEA analysis. The function for doing this in the
> Agi4x44PreProcess package, gsea.files, does not work, probably due
> to the columns I am lacking, and filter.probes also returns an
> error message, probably for the same reason.
>
> I would be very grateful for any comments and help
>
> Thanks, Maria
>
> Here is the code :
>
> library("Agi4x44PreProcess")
> library("hgug4112a.db")
> library("vsn")
> library("convert")
> library("GO.db")
>
> setwd("/mydirectory")
>
> # reading targets file
> targets=read.targets(infile="targets_ec3.txt")
> targets[1:10,1:5]
>
> names(targets)
>
> #Many (has skipped them, but included FileName, Treatment and GErep)
>
> # read in files with LIMMA:
> dd <- read.maimages(targets$FileName, source="agilent",
>   columns = list(G = "gMedianSignal", Gb = "gBGUsed",
>                  R = "gProcessedSignal", Rb = "gBGMedianSignal"),
>   annotation = c("Row", "Col", "FeatureNum", "ControlType", "ProbeName",
>                  "ProbeUID", "GeneName", "SystematicName", "Description",
>                  "gIsWellAboveBG", "gIsFound", "gIsSaturated",
>                  "gIsFeatPopnOL", "gIsFeatNonUnifOL"))
>
> # (reads in 146 arrays)
>
> ##########Quality control (skipped)
>
> ########### Background correction, normalization and log2 transformation:
> library(vsn)
> ddNORM = BGandNorm(dd, BGmethod = "half", NORMmethod = "quantile",
>   foreground = "MeanSignal", background = "BGMedianSignal", offset = 50,
>   makePLOTpre = FALSE, makePLOTpost = FALSE)
>
> # filtering:
> ddFILT=filter.probes(ddNORM, control=TRUE, wellaboveBG=TRUE, isfound=TRUE,
>   wellaboveNEG=TRUE, sat=TRUE, PopnOL=TRUE, NonUnifOL=TRUE, nas=TRUE,
>   limWellAbove=75, limISF=75, limNEG=75, limSAT=75, limPopnOL=75,
>   limNonUnifOL=75, limNAS=100, makePLOT=TRUE,
>   annotation.package="hgug4112a.db", flag.counts=FALSE, targets)
>
>  FILTERING PROBES BY FLAGS
>
>
> FILTERING BY ControlType FLAG
> Error in data.frame(PROBE_ID, as.character(probe.chr), as.character(probe.seq),  :
>   arguments imply differing number of rows: 43376, 0

-- 

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber