[BioC] SPIA package problem

January Weiner january.weiner at mpiib-berlin.mpg.de
Wed May 26 16:49:26 CEST 2010


Hello,

I am trying to use SPIA on some mouse results from a mgug4122a Agilent
microarray.


A summary of the problem follows:

I have two vectors: DE_gr_iii and ALL_gr_iii (created following the
SPIA vignette, see below).

> class( DE_gr_iii )
[1] "numeric"
> class( ALL_gr_iii )
[1] "character"
> names( DE_gr_iii ) <- ALL_gr_iii
> DE_gr_iii[1:10]
      12808       78369       71897      241568      102075       27273
 0.15805260  0.75696349 -0.02208268 -0.53025986 -0.09489560  0.16656121
      20321       57435       18010       18010
-0.13020754 -0.19411325 -0.02297658 -0.03317089
> ALL_gr_iii[1:10]
 [1] "12808"  "78369"  "71897"  "241568" "102075" "27273"  "20321"  "57435"
 [9] "18010"  "18010"

> length( DE_gr_iii )
 [1] 1918
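
The printout above shows the Entrez ID 18010 twice, with two different fold changes. A minimal sketch of a sanity check on the two vectors follows, using only the ten entries printed above (the names `de`, `all_ids`, `missing_ids`, `dup_ids` and `n_distinct` are introduced here for illustration). Whether spia() objects to repeated names is an assumption, not something confirmed from its source.

```r
# First ten entries as printed above (a toy subset, not the full 1918-gene vector).
all_ids <- c("12808", "78369", "71897", "241568", "102075",
             "27273", "20321", "57435", "18010", "18010")
de <- c(0.15805260, 0.75696349, -0.02208268, -0.53025986, -0.09489560,
        0.16656121, -0.13020754, -0.19411325, -0.02297658, -0.03317089)
names(de) <- all_ids

# 1. Names of `de` that are absent from the reference vector.
missing_ids <- setdiff(names(de), all_ids)

# 2. Entrez IDs that occur more than once (several probes for one gene).
dup_ids <- unique(names(de)[duplicated(names(de))])

# 3. Number of distinct IDs, to compare with the vector length.
n_distinct <- length(unique(names(de)))

print(missing_ids)   # character(0): every name is in the reference vector
print(dup_ids)       # "18010"
cat(length(de), "entries,", n_distinct, "distinct IDs\n")
```

If the number of distinct IDs is smaller than the vector length on the full data, the names are not unique, which may be worth ruling out before looking elsewhere.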

Now when I run spia, I get the following error:

> res <- spia( de=DE_gr_iii, all=ALL_gr_iii, organism="mmu", nB = 2000, plots=F, beta=NULL )
Error in spia(de = DE_gr_iii, all = ALL_gr_iii, organism = "mmu", nB = 2000,  :
  de must be a vector of log2 fold changes. The names of de should be
included in the refference array!

The DE_gr_iii is definitely the log2 fold change vector. I'm not sure
what is meant by the reference array since I don't see it in the SPIA
vignette, but I assume that the reference is either the data file
mmuSPIA.RData or the ALL vector.

I am not sure whether ALL should really contain all Entrez IDs from
the microarray, but I think not; I have also tried with all Entrez
IDs, and it did not work. I also used the Colorectal cancer data set
from the SPIA package with only the first 100 values of the
DE_Colorectal and ALL_Colorectal vectors, and it ran without problems.

The log fold changes were taken from a microarray experiment. I don't
think there is a problem with that, because I also tried faking the
values by taking them from the Colorectal cancer data provided with
SPIA.

I don't think that there is a problem with the length of the data. I
also tried another data set with 20,000 genes, and the error was the
same. As noted above, the Colorectal data set truncated to its first
100 values ran without problems.

The SPIA package seems to be correctly installed, because I can run
the example from the vignette without any problems.

The Entrez IDs that I used were derived from the Agilent annotation
package for this chip:

> a2sel$EID <- unlist( mget( as.character( a2sel$SCode ), mgug4122aENTREZID ) )
(a2sel is a data frame containing the fold changes, gene information,
etc.; the Agilent identifiers are stored in the SCode column.)
I removed any identifiers that were not mapped to Entrez:
> length( which( is.na( a2sel$EID ) ) )
[1] 7022
> a2sel <- a2sel[ !is.na( a2sel$EID ),]
> length( which( is.na( a2sel$EID ) ) )
[1] 0
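
Removing unmapped probes does not by itself give one row per gene, since several probes can map to the same Entrez ID. The sketch below shows one possible way to collapse such rows, keeping the probe with the largest absolute fold change. It runs on a toy stand-in for a2sel: the probe names, the `logFC` column name and the choice of collapsing rule are all assumptions made for illustration, not part of the original analysis.

```r
# Toy stand-in for a2sel; probe names and the `logFC` column are hypothetical.
a2sel_toy <- data.frame(
  SCode = c("probe_1", "probe_2", "probe_3", "probe_4"),
  EID   = c("12808", "18010", "18010", NA),
  logFC = c(0.15805260, -0.02297658, -0.03317089, 0.5),
  stringsAsFactors = FALSE
)

# Drop probes without an Entrez ID, as in the session above.
a2sel_toy <- a2sel_toy[!is.na(a2sel_toy$EID), ]

# Sort by decreasing |logFC|, then keep the first row seen for each Entrez ID.
ord <- order(abs(a2sel_toy$logFC), decreasing = TRUE)
a2sel_toy <- a2sel_toy[ord, ]
a2sel_uniq <- a2sel_toy[!duplicated(a2sel_toy$EID), ]

# Named fold-change vector and ID vector; here `all_toy` simply mirrors
# the names of `de_toy`, as in the session above.
de_toy  <- setNames(a2sel_uniq$logFC, a2sel_uniq$EID)
all_toy <- a2sel_uniq$EID

print(de_toy)
```

Other rules (mean or median per gene, for example) would be equally defensible; the point is only that each Entrez ID ends up with a single value.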

My last hypothesis was that, for whatever reason, there is a problem
with the Entrez IDs (that they do not match the IDs in the
mmuSPIA.RData file provided with the distribution). I tested this by
using identifiers taken directly from the mmuSPIA.RData pathway
info.

I loaded the pathway info from the mmuSPIA.RData file:
> load( file=paste( system.file( "extdata/mmuSPIA.RData", package="SPIA" ) ) )

I chose a pathway that contains several interactions of the type
"activation" and used the colnames and rownames of the matrix for my
ALL vector:
> all_ttt <- c( colnames( path.info[["04010"]]$activation ), rownames( path.info[["04010"]]$activation ) )
> length( all_ttt )
[1] 564

I generated some random fold changes:
> de_ttt <- runif( length( all_ttt ), -10, 10 )
> names( de_ttt ) <- all_ttt
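
Concatenating colnames and rownames repeats every ID that appears on both axes of the matrix. The sketch below illustrates this on a toy 3x3 matrix; it assumes, without having checked, that the real activation matrix is square with the same IDs as row and column names. The IDs "g1" to "g3" are made-up placeholders.

```r
# Toy stand-in for path.info[["04010"]]$activation; the IDs are placeholders.
ids <- c("g1", "g2", "g3")
m <- matrix(0, nrow = 3, ncol = 3, dimnames = list(ids, ids))

# Concatenation, as in the session above: every ID appears twice.
all_cat <- c(colnames(m), rownames(m))

# Union: every ID appears once.
all_union <- union(colnames(m), rownames(m))

print(length(all_cat))              # 6
print(length(all_union))            # 3
print(anyDuplicated(all_cat) > 0)   # TRUE
```

If the same holds for the real matrix, a test vector built with union() would have unique names, which makes it a cleaner input for checking the Entrez ID hypothesis.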

The result was, again, an error:

> res <- spia( de=de_ttt, all=all_ttt, organism="mmu", nB = 2000, plots=F, beta=NULL )
Error in spia(de = de_ttt, all = all_ttt, organism = "mmu", nB = 2000,
plots = F,  :
  de must be a vector of log2 fold changes. The names of de should be
included in the refference array!

I have no idea what the problem is.

Thanks in advance for any help -- maybe I should use another package?
I have lost two days on this problem already.

j.



P.S.
> sessionInfo()
R version 2.10.1 (2009-12-14)
i486-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] mgug4122a.db_2.3.6   SPIA_1.4.0           org.Mm.eg.db_2.3.6
 [4] BioIDMapper_2.1      gWidgetsRGtk2_0.0-65 gWidgets_0.0-41
 [7] lattice_0.18-3       XML_3.1-0            RCurl_1.4-2
[10] bitops_1.0-4.1       hgu95av2.db_2.3.5    org.Hs.eg.db_2.3.6
[13] GO.db_2.3.5          annotate_1.24.1      GOstats_2.12.0
[16] RSQLite_0.8-3        DBI_0.2-5            graph_1.26.0
[19] Category_2.12.1      AnnotationDbi_1.8.2  Biobase_2.6.1

loaded via a namespace (and not attached):
 [1] genefilter_1.24.3 grid_2.10.1       GSEABase_1.8.0    RBGL_1.24.0
 [5] RGtk2_2.12.15     splines_2.10.1    survival_2.35-8   tcltk_2.10.1
 [9] tools_2.10.1      xtable_1.5-6

-- 
-------- Dr. January Weiner 3 --------------------------------------
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin, Germany
Web   : www.mpiib-berlin.mpg.de
Tel     : +49-30-28460514


