[BioC] package xps: export.filter issues
cstrato
cstrato at aon.at
Thu Aug 30 20:28:39 CEST 2012
Dear Steven,
It seems, that adding fName does indeed cause problems, probably due to
the fact that the Affymetrix gene annotation does contain characters
which interfere with R data.frame. However, adding fSymbol only is fine.
Here is my example:
> tmp <- validData(rma.ufr, which="UnitName")
> dim(tmp)
[1] 54675 9
> tmp <- export.filter(rma.ufr, treename = "*", treetype = "stt",
varlist = "fUnitName:fc:pval:flag", as.dataframe = TRUE, verbose = TRUE)
Warning: tree <UniTest.ufr> does not exist or has not <54675> entries.>
> dim(tmp)
[1] 54675 5
> tmp <- export.filter(rma.ufr, treename = "*", treetype = "stt",
varlist = "fUnitName:fSymbol:fc:pval:flag", as.dataframe = TRUE, verbose
= TRUE)
Warning: tree <UniTest.ufr> does not exist or has not <54675>
entries.Reading entries from <HG-U133_Plus_2.ann> ...Finished
> dim(tmp)
[1] 54675 6
> tmp <- export.filter(rma.ufr, treename = "*", treetype = "stt",
varlist = "fUnitName:fName:fSymbol:fc:pval:flag", as.dataframe = TRUE,
verbose = TRUE)
Warning: tree <UniTest.ufr> does not exist or has not <54675>
entries.Reading entries from <HG-U133_Plus_2.ann> ...Finished
dim(tmp)
#[1] 28298 7
As you see, everything is fine as long as fName is not included. (You
can ignore the warning)
As you have already seen, when you simply export the results, you will
get the complete output:
> export.filter(rma.ufr, treename = "*", treetype = "stt", varlist =
"fUnitName:fName:fSymbol:fc:pval:flag", outfile = "UniFltr.txt", sep =
"\t", as.dataframe = FALSE, verbose = TRUE)
If you try to import table "UniFltr.txt" as data.frame you get the same
problem:
> tmp <- read.table("UniFltr.txt", header=TRUE, row.names=NULL,
sep="\t", check.names=FALSE, stringsAsFactors=FALSE, comment.char="")
> dim(tmp)
[1] 28298 7
You see that even setting read.table(...,comment.char="",...) does not
help. Thus the only option is to use fSymbol only but not fName.
Since you only need to get the gene symbols there is no need to use
hgu133plus2.db, as my example above has shown.
I hope this does help.
Best regards,
Christian
On 8/17/12 3:05 PM, steven wink wrote:
> Dear Christian,
>
> Thank you for your reply. I noticed the problem occurs when loading the
> data in R memory as a dataframe. Writing to a file directly: no problem.
> Loading from file to R memory it gets messed up again. The dataframe
> without the gene annotation is fine.
> I wanted the full set in a data.frame to select a set of genes with
> foldchanges etc. For now I used the bioc package hgu133plus2.db instead
> of using xps: export.filter to attach gene symbols.
>
> I saw one warning:
> Warning: tree <UniTest.ufr> does not exist or has not <54675>
> entries.Reading entries from <HG-U133_Plus_2.ann> ...Finished
>
> I also followed the code on your email. Same effect.
>
> root version
> Version 5.32/04 13 July 2012
>
> xps 1.16.0
> =
> netaffx-annotation-date=2011-06-22
> =
> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=nl_NL.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=nl_NL.UTF-8 LC_COLLATE=nl_NL.UTF-8
> [5] LC_MONETARY=nl_NL.UTF-8 LC_MESSAGES=nl_NL.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] rjson_0.2.9 hgu133plus2.db_2.7.1 org.Hs.eg.db_2.7.1
> [4] RSQLite_0.11.1 DBI_0.2-5 AnnotationDbi_1.18.1
> [7] Biobase_2.16.0 BiocGenerics_0.2.0 xps_1.16.0
>
> loaded via a namespace (and not attached):
> [1] IRanges_1.14.4 probemapper_1.0.0 stats4_2.15.1 tools_2.15.1
> =
>
> importing the previously created scheme file:
>
> scmdir<-"/media/Working Drive/Promotie/mArray biomarker
study/Schemes/na32"
> scheme<-root.scheme(file.path(scmdir,"hgu133plus2.root"))
> class(scheme)
> [1] "SchemeTreeSet"
> attr(,"package")
> [1] "xps"
>
> library(xps)
> celdir<-file.path("/media/Working Drive/Promotie/mArray biomarker
> study/Lisa Paper/carbamazepine.Human.in_vitro.Liver/24 low")
>
celfiles<-c("carbamazepine_ctrl_24h_1.CEL","carbamazepine_ctrl_24h_2.CEL","carbamazepine_l_24h_1.CEL","carbamazepine_l_24h_2.CEL")
>
data.carb<-import.data(scheme,"tmpdt_dataCarb",celdir=celdir,celfiles=celfiles,verbose=FALSE)
> data.rma<-rma(data.carb,"tmpdt_rma",verbose=FALSE)
> png("pca_carb_24h_low_PHH.png",width=800,height=800)
> >
>
pcaplot(data.rma,group=c("ctrl","ctrl","l24h","l24h"),add.labels=TRUE,add.legend=TRUE)
> > dev.off()
> null device
>
> unifltr<-UniFilter(foldchange=c(1.2,"both"),unifilter=c(0.1,"pval"))
> uniTest(unifltr) <- c("t.test","two.sided","BH",0,0.0,FALSE,0.95,TRUE)
> > str(unifltr)
> Formal class 'UniFilter' [package "xps"] with 5 slots
> ..@ foldchange:List of 2
> .. ..$ cutoff : num 1.2
> .. ..$ direction: chr "both"
> ..@ prescall : list()
> ..@ unifilter :List of 2
> .. ..$ cutoff : num 0.1
> .. ..$ variable: chr "pval"
> ..@ unitest :List of 8
> .. ..$ type : chr "t.test"
> .. ..$ alternative: chr "two.sided"
> .. ..$ correction : chr "BH"
> .. ..$ numperm : int 0
> .. ..$ mu : num 0
> .. ..$ paired : logi FALSE
> .. ..$ conflevel : num 0.95
> .. ..$ varequ : logi TRUE
> ..@ numfilters: num 2
>
>
> rma.ufr<-unifilter(data.rma,"tmpdt_rmaufr5",getwd(),unifltr,
> group=c("ctrl","ctrl","l24h","l24h"),verbose=FALSE)
>
>
tmp<-export.filter(rma.ufr,treetype="stt",varlist="fUnitName:fName:fSymbol:fc:pval:flag",
> + as.dataframe=TRUE,verbose=FALSE)
>
>
> tmp[3670,]
> UNIT_ID UnitName GeneName GeneSymbol P-Value FoldChange Flag
> 3670 5641 206054_at kininogen 1 KNG1 0.369156 1.05887 0
>
>
> Bellow I include a small portion of the output from row 3671. (In my
> text file some blocks/ multiple rows of character/strings are quoted,
> then blocks contain strings which are not.)
>
> tmp[3671, ]
> 72\t0\n8438\t208882_s_at\tubiquitin protein ligase E3 component
> n-recognin 5\tUBR5\t0.874644\t1.00482\t0\n8439\t208883_at\tubiquitin
> protein ligase E3 component n-recognin
> 5\tUBR5\t0.770298\t0.967776\t0\n8440\t208884_s_at\tubiquitin protein
> ligase E3 component n-recognin
> 5\tUBR5\t0.253671\t1.04481\t0\n8441\t208885_at\tlymphocyte cytosolic
> protein 1 (L-plastin)\tLCP1\t0.977804\t1.00235\t0\n8442\t208886_at\tH1
> histone family, member
> 0\tH1F0\t0.500673\t0.955601\t0\n8443\t208887_at\teukaryotic translation
> initiation factor 3, subunit
> G\tEIF3G\t0.0389222\t1.05615\t0\n8444\t208888_s_at\tnuclear receptor
> corepressor 2\tNCOR2\t0.372557\t1.22396\t0\n8445\t208889_s_at\tnuclear
> receptor corepressor
> 2\tNCOR2\t0.272628\t1.0801\t0\n8446\t208890_s_at\tplexin
> B2\tPLXNB2\t0.0275446\t1.0633\t0\n8447\t208891_at\tdual specificity
> phosphatase 6\tDUSP6\t0.837928\t0.981868\t0\n8448\t208892_s_at\tdual
> specificity phosphatase
> 6\tDUSP6\t0.491362\t0.932852\t0\n8449\t208893_s_at\tdual specificity
> phosphatase 6\tDUSP6\t0.359966\t0.964169\t0\n8450\t208894_at\tmajor
> histocompatibility complex, class II, DR
> alpha\tHLA-DRA\t0.9963\t0.998907\t0\n8451\t208895_s_at\tDEAD
> (Asp-Glu-Ala-Asp) box polypeptide
> 18\tDDX18\t0.265118\t1.03538\t0\n8452\t208896_at\tDEAD (Asp-Glu-Ala-Asp)
> box polypeptide
> 18\tDDX18\t0.124475\t0.955801\t0\n8453\t208897_s_at\tDEAD
> (Asp-Glu-Ala-Asp) box polypeptide
> 18\tDDX18\t0.11651\t1.05331\t0\n8454\t208898_at\tATPase, H+
> transporting, lysosomal 34kDa, V1 subunit
> D\tATP6V1D\t0.118903\t0.970489\t0\n8455\t208899_x_at\tATPase, H+
> transporting, lysosomal 34kDa, V1 subunit
> D\tATP6V1D\t0.738698\t0.995617\t0\n8456\t208900_s_at\ttopoisomerase
> (DNA) I\tTOP1\t0.660803\t0.965493\t0\n8457\t208901_s_at\ttopoisomerase
> (DNA) I\tTOP1\t0.140795\t0.963365\t0\n8458\t208902_s_at\tribosomal
> protein S28\tRPS28\t0.939403\t1.00859\t0\n8459\t208903_at\tribosomal
> protein S28\tRPS28\t0.720425\t0.953657\t0\n8460\t208904_s_at\tribosomal
> protein S28\tRPS28\t0.118968\t0.958277\t0\n8461\t208905_at\tcytochrome
> c,
> somatic\tCYCS\t0.801304\t0.993872\t0\n8462\t208906_at\tBerardinelli-Seip
> congenital lipodystrophy 2
> (seipin)\tBSCL2\t0.156978\t1.05388\t0\n8463\t208907_s_at\tmitochondrial
> ribosomal protein
>
S18B\tMRPS18B\t0.113534\t1.02395\t0\n8464\t208908_s_at\tcalpastatin\tCAST\t0.067195\t1.03521\t0\n8465\t208909_at\tubiquinol-cytochrome
> c reductase, Rieske iron-sulfur polypeptide
> 1\tUQCRFS1\t0.570856\t1.0093\t0\n8466\t208910_s_at\tcomplement component
> 1, q subcomponent binding
> protein\tC1QBP\t0.0812231\t0.958867\t0\n8467\t208911_s_at\tpyruvate
> dehydrogenase (lipoamide)
> beta\tPDHB\t0.351079\t1.01438\t0\n8468\t208912_s_at\t2,3-cyclic
> nucleotide 3 phosphodiesterase
> GeneSymbol P-Value FoldChange Flag
> 3671 CNP 0.886081 0.991868 0
>
>
>
>
>
>
> ================================
> following Christian's mail
> ================================
> datdir<-"/media/Working Drive/Promotie/mArray biomarker study/Lisa
> Paper/selection of genes/carb data"
> >
>
data.carb<-import.data(scheme,"dataCarbamazepine",filedir=datdir,celdir=celdir,
> celfiles=celfiles)
> >
>
data.rma<-rma(data.carb,"dataCarbamazepineRMA",tmpdir="",background="pmonly",normalize
> = TRUE)
> expr.rma<-validData(data.rma)
> export.expr(data.rma, treename="*", treetype="mdp",varlist ="*",
> outfile="24hoursLow_carb_PHH.txt", sep="\t", as.dataframe=FALSE, verbose
> = TRUE)
> unifltr <- UniFilter(unitest=c("t.test", "two.sided", "BH", 0, 0.0,
> FALSE, 0.95, TRUE))
> rma.ufr<-unifilter(data.rma,"dataCarbUnifilter",getwd(),unifltr,
> group=c("ctrl","ctrl","l24h","l24h"))
> export.filter(rma.ufr, treename="*",treetype="stt", varlist =
> "fUnitName:fName:fSymbol:fc:pval:flag", outfile = "UniFltr.txt", sep =
> "\t", as.dataframe = FALSE, verbose = TRUE)
> =verbose=
> Opening file </media/Working Drive/Promotie/mArray biomarker
> study/Schemes/na32/hgu133plus2.root> in <READ> mode...
> Opening file </media/Working Drive/Promotie/mArray biomarker study/Lisa
> Paper/selection of genes/dataCarbUnifilter_ufr.root> in <READ> mode...
> Opening file </media/Working Drive/Promotie/mArray biomarker study/Lisa
> Paper/selection of genes/dataCarbUnifilter_ufr.root> in <READ> mode...
> Exporting data from tree <*> to file <UniFltr.txt>...
> Warning: tree <UniTest.ufr> does not exist or has not <54675>
> entries.Reading entries from <HG-U133_Plus_2.ann> ...Finished
> NULL
> >
> ==
>
> tmp<-validData(rma.ufr, which="UnitName")
> tmp <- export.filter(rma.ufr, treename = "*", treetype = "stt",
> varlist = "fUnitName:fName:fSymbol:fc:pval:flag", as.dataframe = TRUE,
> verbose = TRUE)
>
> Having done this the text file "UniFltr.txt" looks fine. However:
> test<-read.table("UniFltr.txt",sep="\t",header=TRUE)
> > test[3671,2]
> [1] 206055_s_at
> 28298 Levels: 1007_s_at 1053_at 117_at 1553075_a_at 1553995_a_at ...
> AFFX-TrpnX-M_at
More information about the Bioconductor
mailing list