[BioC] SPIA package problem
January Weiner
january.weiner at mpiib-berlin.mpg.de
Wed May 26 16:49:26 CEST 2010
Hello,
I am trying to use SPIA on some mouse results from an mgug4122a Agilent
microarray.
A summary of the problem follows.
I have two vectors: DE_gr_iii and ALL_gr_iii (created following the
SPIA vignette, see below).
> class( DE_gr_iii )
[1] "numeric"
> class( ALL_gr_iii )
[1] "character"
> names( DE_gr_iii ) <- ALL_gr_iii
> DE_gr_iii[1:10]
      12808       78369       71897      241568      102075       27273
 0.15805260  0.75696349 -0.02208268 -0.53025986 -0.09489560  0.16656121
      20321       57435       18010       18010
-0.13020754 -0.19411325 -0.02297658 -0.03317089
> ALL_gr_iii[1:10]
[1] "12808" "78369" "71897" "241568" "102075" "27273" "20321" "57435"
[9] "18010" "18010"
> length( DE_gr_iii )
[1] 1918
Now when I run spia, I get the following error:
> res <- spia( de=DE_gr_iii, all=ALL_gr_iii, organism="mmu", nB = 2000, plots=F, beta=NULL )
Error in spia(de = DE_gr_iii, all = ALL_gr_iii, organism = "mmu", nB = 2000, :
de must be a vector of log2 fold changes. The names of de should be
included in the refference array!
The DE_gr_iii is definitely the log2 fold change vector. I'm not sure
what is meant by the reference array since I don't see it in the SPIA
vignette, but I assume that the reference is either the data file
mmuSPIA.RData or the ALL vector.
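The checks implied by the error message can be sketched in a few lines
of R. This is only a rough diagnostic under stated assumptions:
check_spia_input is a hypothetical helper, not part of SPIA, and the toy
vectors are the ten entries printed above. It tests whether the names of
de are all contained in the ALL vector and counts repeated IDs (Entrez
ID 18010 appears twice in the output above). Whether repeated IDs are
what actually triggers the error is an assumption, not something the
output confirms.

```r
# Hypothetical helper (not part of SPIA): basic sanity checks on the
# 'de' and 'all' arguments before calling spia().
check_spia_input <- function(de, all) {
  ids <- names(de)
  list(
    is_numeric = is.numeric(de),                       # fold changes numeric?
    has_names  = !is.null(ids),                        # de carries names?
    all_in_ref = !is.null(ids) && all(ids %in% all),   # every name in 'all'?
    n_missing  = sum(!(ids %in% all)),                 # names absent from 'all'
    n_dup_de   = sum(duplicated(ids)),                 # repeated names in de
    n_dup_all  = sum(duplicated(all))                  # repeated IDs in 'all'
  )
}

# Toy data: the first ten entries shown in the output above.
all_ids <- c("12808", "78369", "71897", "241568", "102075", "27273",
             "20321", "57435", "18010", "18010")
de <- c(0.15805260, 0.75696349, -0.02208268, -0.53025986, -0.09489560,
        0.16656121, -0.13020754, -0.19411325, -0.02297658, -0.03317089)
names(de) <- all_ids

chk <- check_spia_input(de, all_ids)
str(chk)
```

Running the same helper on the full DE_gr_iii and ALL_gr_iii vectors
would show whether any names are missing from ALL and how many IDs occur
more than once.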
I am not sure whether ALL should really contain all Entrez IDs from
the microarray, but I think not. I also tried with all Entrez IDs, and
it did not work. I also ran the Colorectal cancer data set from the
SPIA package using only the first 100 values of the DE_Colorectal and
ALL_Colorectal vectors, and it ran without problems.
The log fold changes were taken from a microarray experiment. I don't
think there is a problem with that, because I also tried faking the
values by taking them from the Colorectal cancer data provided with
SPIA.
I don't think that there is a problem with the length of the data. I
also tried another data set with 20,000 genes, and the error was the
same. Furthermore, as mentioned, the Colorectal data set restricted to
its first 100 values ran without problems.
The SPIA package seems to be correctly installed, because I can run
the example from the vignette without any problems.
The Entrez IDs that I used were derived from the Agilent annotation
package for this chip:
> a2sel$EID <- unlist( mget( as.character( a2sel$SCode ), mgug4122aENTREZID ) )
(a2sel is a data frame containing the fold changes, gene information,
etc.; the Agilent identifiers are stored in the SCode column)
I removed any identifiers that were not mapped to Entrez:
> length( which( is.na( a2sel$EID ) ) )
[1] 7022
> a2sel <- a2sel[ !is.na( a2sel$EID ),]
> length( which( is.na( a2sel$EID ) ) )
[1] 0
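Removing unmapped probes does not by itself guarantee one row per gene,
since several probes can map to the same Entrez ID. A minimal sketch of
one way to collapse such rows is given here, keeping the probe with the
largest absolute fold change per ID. The toy data frame a2toy and the
column name logFC are assumptions made for illustration; the actual
column names in a2sel are not shown in this post, and other rules for
choosing a representative probe are equally possible.

```r
# Toy stand-in for a2sel; the column name 'logFC' is an assumption.
a2toy <- data.frame(
  EID   = c("12808", "18010", "18010", "57435"),
  logFC = c(0.158, -0.023, -0.033, -0.194),
  stringsAsFactors = FALSE
)

# Sort by decreasing absolute fold change, then keep the first
# (i.e. strongest) row for each Entrez ID.
a2toy  <- a2toy[order(-abs(a2toy$logFC)), ]
a2uniq <- a2toy[!duplicated(a2toy$EID), ]

# Build the two vectors in the form used above: a named numeric
# vector of fold changes and a character vector of IDs.
de_toy <- a2uniq$logFC
names(de_toy) <- a2uniq$EID
all_toy <- unique(a2uniq$EID)

print(de_toy)
```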
My last hypothesis was that, for whatever reason, there is a problem
with the Entrez IDs (that they do not match the IDs in the
mmuSPIA.RData file provided with the distribution). I tested this by
using identifiers taken directly from the pathway info in
mmuSPIA.RData.
I loaded the pathway info from the mmuSPIA.RData file:
> load( file=paste( system.file( "extdata/mmuSPIA.RData", package="SPIA" ) ) )
I chose a pathway that contains several interactions of the type
"activation" and used the colnames and rownames of the matrix for my
ALL vector:
> all_ttt <- c( colnames( path.info[["04010"]]$activation ), rownames( path.info[["04010"]]$activation ) )
> length( all_ttt )
[1] 564
I generated some random fold changes:
> de_ttt <- runif( length( all_ttt ), -10, 10 )
> names( de_ttt ) <- all_ttt
The result was, again, an error:
> res <- spia( de=de_ttt, all=all_ttt, organism="mmu", nB = 2000, plots=F, beta=NULL )
Error in spia(de = de_ttt, all = all_ttt, organism = "mmu", nB = 2000,
plots = F, :
de must be a vector of log2 fold changes. The names of de should be
included in the refference array!
I have no idea what the problem is.
Thanks in advance for any help -- maybe I should use another package?
I have lost two days on this problem already.
j.
P.S.
> sessionInfo()
R version 2.10.1 (2009-12-14)
i486-pc-linux-gnu
locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
[5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
[7] LC_PAPER=en_US.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mgug4122a.db_2.3.6 SPIA_1.4.0 org.Mm.eg.db_2.3.6
[4] BioIDMapper_2.1 gWidgetsRGtk2_0.0-65 gWidgets_0.0-41
[7] lattice_0.18-3 XML_3.1-0 RCurl_1.4-2
[10] bitops_1.0-4.1 hgu95av2.db_2.3.5 org.Hs.eg.db_2.3.6
[13] GO.db_2.3.5 annotate_1.24.1 GOstats_2.12.0
[16] RSQLite_0.8-3 DBI_0.2-5 graph_1.26.0
[19] Category_2.12.1 AnnotationDbi_1.8.2 Biobase_2.6.1
loaded via a namespace (and not attached):
[1] genefilter_1.24.3 grid_2.10.1 GSEABase_1.8.0 RBGL_1.24.0
[5] RGtk2_2.12.15 splines_2.10.1 survival_2.35-8 tcltk_2.10.1
[9] tools_2.10.1 xtable_1.5-6
--
-------- Dr. January Weiner 3 --------------------------------------
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin, Germany
Web : www.mpiib-berlin.mpg.de
Tel : +49-30-28460514