[BioC] marrayLayout difficulties

Fri Oct 1 18:32:46 CEST 2004

Hi, Jean -

One of your suggestions solved the problem, apparently.  I'll include
details here for the sake of the archive.

I had previously constructed my marrayLayout object as follows:

> dat <- read.table('13998GENEPIX13998.txt', header = TRUE)
> # Construct maSub (1 for printed spots, 0 for missing spots)
> seq <- c(1:31744)
> int <- intersect(seq, as.numeric(dat[,1]))
> sub <- rep(0, 31744)
> sub[int] <- 1
> # Construct marrayLayout object.
> ml <- new("marrayLayout", maNgr = 8, maNgc = 4, maNsr = 31, maNsc = 32,
+           maNspots = 31744)
> maSub(ml) <- sub
> maPlate(ml) <- as.factor(dat[,5])

After this, I get

> table(ml@maSub)

FALSE  TRUE 
31743     1 

On Jean's advice, I instead tried

> maSub(ml) <- as.logical(sub)
> table(ml@maSub)

FALSE  TRUE 
  256 31488 

This preserves the correct maSub vector.  Performance is now good (~1
minute to normalize), and the results are identical to my previous results
with post-setter modification of maSub.  I think I'm comfortable assuming
that the normalization is correct, since the MA plot looks correct and
performance is within reasonable limits.
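
For the archive, here is a minimal base-R sketch of the two ways a 0/1
vector like `sub` above can be used.  It does not call marray at all; the
toy sizes and the names n_spots, printed, mask, mask2 and as_index are
made up for illustration.  The last step only shows what base R does when
a 0/1 vector is used as an index.  That the numeric form of the maSub
setter works this way is an assumption on my part, not something
confirmed in this thread, although it would be consistent with the single
TRUE seen above.

```r
# Hypothetical toy layout: 10 spots, of which spots 6 and 7 were not printed.
n_spots <- 10
printed <- c(1:5, 8:10)

# 0/1 numeric vector, built the same way as `sub` above.
sub <- rep(0, n_spots)
sub[printed] <- 1

# Explicit logical mask, as in maSub(ml) <- as.logical(sub).
mask <- as.logical(sub)

# Equivalent one-step construction straight from the feature numbers.
mask2 <- seq_len(n_spots) %in% printed

# Using the 0/1 vector as an *index* instead (assumed behaviour, see above):
# zero indices are dropped and every 1 points at element 1, so only the
# first element ends up TRUE.
as_index <- rep(FALSE, n_spots)
as_index[sub] <- TRUE

print(table(mask))
print(table(as_index))
```

Either logical construction gives one TRUE per printed spot, so passing a
logical vector avoids any ambiguity about how a numeric value is read.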

Does this indicate a problem with the numerical form of the maSub slot
assignment method?  Or did I mis-use it?

Many thanks for your help,

--
Jeremy Gollub, Ph.D.
jgollub at genome.stanford.edu
(W) 650/736-0075

On Fri, 1 Oct 2004, Jean Yee Hwa Yang wrote:

> Hi Jeremy,
> 
> That sounds very slow in my experience.  Which image analysis software
> did you get your data from?  If you send me an example file off-line, I
> will take a look at it for you.  I need to check whether maSub was set
> properly, as this makes a big difference in print-tip normalization.
>  
> Alternatively, try the latest version, 1.5.17, which is temporarily
> available at
> http://arrays.ucsf.edu/software/
> 
> maNorm was previously very slow for global lowess normalization with a
> large number of spots, but in the new version we have sped up the code
> with sampling.  However, I don't think this was your problem.
> 
> I would also suggest trying the swirl data within the marray package to
> see how long that takes on your computer:
> 
> data(swirl)
> norm <- maNorm(swirl)
> 
> If that takes a minute or so, then there is something wrong with your
> data setup.
> 
> Cheers
> 
> Jean
> 
> 
> On Thu, 30 Sep 2004, Jeremy Gollub wrote:
> 
> > Hi, all -
> > 
> > I'm experiencing very poor performance using the marray package (20
> > minutes to normalize a single <32,000 spot microarray).  Can someone
> > tell me whether this is normal, or what I'm doing wrong?
> > 
> > In the process of hunting down some errors, I also noticed some odd (to
> > me) behavior in the marrayLayout maSub slot assignment method, described
> > below.  An attempt to "correct" this results in a much faster
> > normalization (~1 minute), which looks good according to the MA plot
> > but produces different numbers in maM than the slower calculation.
> > 
> > It seems unlikely that either result is correct (I can choose between
> > suspiciously bad performance, or messing with the marrayLayout object's
> > internals).
> > 
> > Thanks for any suggestions - details follow.
> > 
> > I'm using R version 1.9.0 on a sparc system running Solaris 2.9.  My
> > marray version is 1.5.14.
> > 
> > I have a text file, "dat.txt," containing the data I want to normalize.
> > 10 columns, all numeric: in order,
> > 	FEATURE		spot number 1 - 31736
> > 	SECTOR		unnecessary and unused
> > 	ROW		"
> > 	COL		"
> > 	PLATE		ID of printing plate
> > 	Gf		green channel foreground
> > 	Rf		red channel foreground
> > 	Gb		green channel background
> > 	Rb		red channel background
> > 	W		spot weights, either 0 or 1
> > 
> > Array parameters are: Ngr = 8, Ngc = 4, Nsr = 31, Nsc = 32, Nspots =
> > 31744.  Not all spots are printed (ragged ends to each block).  Only
> > printed spots are included in the data file, so there are gaps in the
> > FEATURE column sequence but no blank lines in the file.
> > 
> > The session:
> > 
> > > library(marray)
> > >
> > > # Read file.
> > >
> > > dat <- read.table('dat.txt', header = TRUE)
> > > 
> > > # Construct maSub: 1 for each printed spot, 0 for absent spots.
> > >
> > > seq <- c(1:31744)
> > > int <- intersect(seq, as.numeric(dat[,1]))
> > > sub <- rep(0, 31744)
> > > sub[int] <- 1
> > > 
> > > # Note contents of sub around the end of the first block and beginning
> > > # of the second:
> > >
> > > print(sub[980:1000])
> > [1] 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
> > > # total of 31488 present spots
> > > sum(sub)
> > [1] 31488
> > > 
> > > # Construct marrayLayout object.
> > >
> > > ml <- new("marrayLayout", maNgr = 8, maNgc = 4, maNsr = 31, maNsc = 32,
> > +           maNspots = 31744)
> > > maSub(ml) <- sub
> > > maPlate(ml) <- as.factor(dat[,5])
> > > 
> > > # Note contents of maSub:
> > >
> > > sum(ml@maSub)
> > [1] 1
> > > length(ml@maSub)
> > [1] 31744
> > > print(sub[1:20])
> >  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> > > print(ml@maSub[1:20])
> >  [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> > [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> > > print(ml@maSub[980:1000])
> >  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> > [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> > > 
> > > # Now meddle with ml@maSub (set it back the way I think it should be).
> > > # Or don't - see comment on maNormMain step, below.
> > >
> > > maSub(ml)[int] <- TRUE
> > > 
> > > # construct marrayRaw object.
> > >
> > > mr <- new("marrayRaw",
> > +         maGf = matrix(dat[,6], ncol = 1),
> > +         maRf = matrix(dat[,7], ncol = 1),
> > +         maGb = matrix(dat[,8], ncol = 1),
> > +         maRb = matrix(dat[,9], ncol = 1),
> > +         maW =  matrix(dat[,10], ncol = 1),
> > +         maLayout = ml)
> > >
> > > # This step takes about one minute if I do maSub(ml)[int] <- TRUE
> > > # as indicated above.  If I don't, it takes about 20 minutes.
> > > # The results differ, although the MA plot looks normalized either way.
> > >
> > > mn <- maNormMain(mr, f.loc = list(maNormLoess(x="maA", y="maM",
> > +                         z="maPrintTip", w=NULL, subset=TRUE, span = 0.4)),
> > +                  f.scale = list(maNormMAD(x = "maPrintTip", y = "maM",
> > +                         geo = FALSE, subset = TRUE)),
> > +                 Mloc = TRUE, Mscale = TRUE)
> > 
> > --
> > Jeremy Gollub, Ph.D.
> > jgollub at genome.stanford.edu
> > (W) 650/736-0075