[BioC] marrayLayout difficulties
Jeremy Gollub
jgollub at genome.stanford.edu
Fri Oct 1 18:32:46 CEST 2004
Hi, Jean -
One of your suggestions solved the problem, apparently. I'll include
details here for the sake of the archive.
I had previously constructed my marrayLayout object as follows:
> dat <- read.table('13998GENEPIX13998.txt', header = TRUE)
> # Construct maSub (1 for printed spots, 0 for missing spots)
> seq <- c(1:31744)
> int <- intersect(seq, as.numeric(dat[,1]))
> sub <- rep(0, 31744)
> sub[int] <- 1
> # Construct marrayLayout object.
> ml <- new("marrayLayout", maNgr = 8, maNgc = 4, maNsr = 31, maNsc = 32,
+ maNspots = 31744)
> maSub(ml) <- sub
> maPlate(ml) <- as.factor(dat[,5])
After this, I get
> table(ml@maSub)
FALSE TRUE
31743 1
On Jean's advice, I instead tried
> maSub(ml) <- as.logical(sub)
> table(ml@maSub)
FALSE TRUE
256 31488
This preserves the correct maSub vector. Performance is now good (~1
minute to normalize), and the results are identical to my previous results
with post-setter modification of maSub. I think I'm comfortable assuming
that the normalization is correct, since the MA plot looks correct and
performance is within reasonable limits.
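The 0/1 vector and the later as.logical() conversion can be collapsed into
one step by building the logical mask directly. A minimal sketch in base R
on a made-up layout; n_spots and feature_ids are hypothetical stand-ins for
31744 and dat[,1]:

```r
# Hypothetical stand-ins for the real values in this thread:
# n_spots plays the role of 31744, feature_ids the role of dat[,1].
n_spots <- 12
feature_ids <- c(1, 2, 3, 4, 7, 8, 9, 10)

# TRUE where a feature number appears in the file, FALSE for unprinted spots.
printed <- seq_len(n_spots) %in% feature_ids

print(printed)
print(table(printed))
```

Because the result is already logical, it needs no as.logical() conversion
before being assigned to maSub.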
Does this indicate a problem with the numerical form of the maSub slot
assignment method? Or did I misuse it?
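One possible reading of the output above, stated as an assumption and not
confirmed anywhere in this thread: the numeric form of the setter may treat
its value as a vector of spot indices rather than as 0/1 flags. Base R
indexing alone reproduces the single-TRUE pattern, as this sketch with a
made-up five-spot vector shows. Nothing here calls marray.

```r
# Made-up 0/1 flags for five spots (1 = printed, 0 = missing).
flags <- c(1, 1, 0, 0, 1)

# Using the flags as *indices*: zeros are dropped and every 1 points at
# element 1, so only the first entry is set.
as_index <- rep(FALSE, length(flags))
as_index[flags] <- TRUE

# Using the flags as *values*: each 1 becomes TRUE, each 0 becomes FALSE.
as_value <- as.logical(flags)

# If a setter does expect indices, which() gives the positions to pass.
positions <- which(flags == 1)

print(as_index)
print(as_value)
print(positions)
```

Under that assumption, the single TRUE reported by table() would follow
directly from the 0/1 input, and either as.logical(sub) or the index vector
int would give the intended mask.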
Many thanks for your help,
--
Jeremy Gollub, Ph.D.
jgollub at genome.stanford.edu
(W) 650/736-0075
On Fri, 1 Oct 2004, Jean Yee Hwa Yang wrote:
> Hi Jeremy,
>
> That sounds very slow from my experience. Which image analysis software
> did you get your data from? If you send me an example file off-line, I
> will take a look at it for you. I need to check whether maSub was
> set properly, as this makes a big difference in print-tip
> normalization.
>
> Alternatively, try the latest version, 1.5.17, which is temporarily available at
> http://arrays.ucsf.edu/software/
>
> maNorm was previously very slow for global lowess normalization with a large
> number of spots, but in the new version we have sped up the code with
> sampling. However, I don't think this was your problem.
>
> I would also suggest trying the swirl data within the marray package to
> see how long that takes on your computer:
>
> data(swirl)
> norm <- maNorm(swirl)
>
> If that takes a minute or so, then there is something wrong with your data
> setup.
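The timing check suggested above could be scripted roughly as follows. This
is a sketch only: elapsed_seconds is a hypothetical helper that is not part
of marray, and the marray calls run only if the package is installed.

```r
# Hypothetical helper: elapsed wall-clock seconds taken to evaluate `expr`.
elapsed_seconds <- function(expr) {
  unname(system.time(expr)["elapsed"])
}

# Run the swirl benchmark only when marray is available.
if (requireNamespace("marray", quietly = TRUE)) {
  library(marray)
  data(swirl)
  secs <- elapsed_seconds(norm <- maNorm(swirl))
  cat("maNorm(swirl) took", secs, "seconds\n")
}
```

The same helper can wrap the maNormMain call on the real data, so the two
timings are measured the same way.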
>
> Cheers
>
> Jean
>
>
> On Thu, 30 Sep 2004, Jeremy Gollub wrote:
>
> > Hi, all -
> >
> > I'm experiencing very poor performance using the marray package (20
> > minutes to normalize a single <32,000 spot microarray). Can someone
> > tell me whether this is normal, or what I'm doing wrong?
> >
> > In the process of hunting down some errors, I also noticed some odd (to
> > me) behavior in the marrayLayout maSub slot assignment method, described
> > below. An attempt to "correct" this results in a much faster
> > normalization (~1 minute), which looks good according to the MA plot
> > but produces different numbers in maM than the slower calculation.
> >
> > It seems unlikely that either result is correct (I can choose between
> > suspiciously bad performance, or messing with the marrayLayout object's
> > internals).
> >
> > Thanks for any suggestions - details follow.
> >
> > I'm using R version 1.9.0 on a sparc system running Solaris 2.9. My
> > marray version is 1.5.14.
> >
> > I have a text file, "dat.txt," containing the data I want to normalize.
> > 10 columns, all numeric: in order,
> > FEATURE spot number 1 - 31736
> > SECTOR unnecessary and unused
> > ROW "
> > COL "
> > PLATE ID of printing plate
> > Gf green channel foreground
> > Rf red channel foreground
> > Gb green channel background
> > Rb red channel background
> > W spot weights, either 0 or 1
> >
> > Array parameters are: Ngr = 8, Ngc = 4, Nsr = 31, Nsc = 32, Nspots =
> > 31744. Not all spots are printed (ragged ends to each block). Only
> > printed spots are included in the data file, so there are gaps in the
> > FEATURE column sequence but no blank lines in the file.
> >
> > The session:
> >
> > > library(marray)
> > >
> > > # Read file.
> > >
> > > dat <- read.table('dat.txt', header = TRUE)
> > >
> > > # Construct maSub: 1 for each printed spot, 0 for absent spots.
> > >
> > > seq <- c(1:31744)
> > > int <- intersect(seq, as.numeric(dat[,1]))
> > > sub <- rep(0, 31744)
> > > sub[int] <- 1
> > >
> > > # Note contents of sub around the end of the first block and beginning
> > > # of the second:
> > >
> > > print(sub[980:1000])
> > [1] 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
> > > # total of 31488 present spots
> > > sum(sub)
> > [1] 31488
> > >
> > > # Construct marrayLayout object.
> > >
> > > ml <- new("marrayLayout", maNgr = 8, maNgc = 4, maNsr = 31, maNsc = 32,
> > + maNspots = 31744)
> > > maSub(ml) <- sub
> > > maPlate(ml) <- as.factor(dat[,5])
> > >
> > > # Note contents of maSub:
> > >
> > > sum(ml@maSub)
> > [1] 1
> > > length(ml@maSub)
> > [1] 31744
> > > print(sub[1:20])
> > [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> > > print(ml@maSub[1:20])
> > [1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> > [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> > > print(ml@maSub[980:1000])
> > [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> > [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> > >
> > > # Now meddle with ml@maSub (set it back the way I think it should be).
> > > # Or don't - see comment on maNormMain step, below.
> > >
> > > maSub(ml)[int] <- TRUE
> > >
> > > # construct marrayRaw object.
> > >
> > > mr <- new("marrayRaw",
> > + maGf = matrix(dat[,6], ncol = 1),
> > + maRf = matrix(dat[,7], ncol = 1),
> > + maGb = matrix(dat[,8], ncol = 1),
> > + maRb = matrix(dat[,9], ncol = 1),
> > + maW = matrix(dat[,10], ncol = 1),
> > + maLayout = ml)
> > >
> > > # This step takes about one minute if I do maSub(ml)[int] <- TRUE
> > > # as indicated above. If I don't, it takes about 20 minutes.
> > > # The results differ, although the MA plot looks normalized either way.
> > >
> > > mn <- maNormMain(mr, f.loc = list(maNormLoess(x="maA", y="maM",
> > + z="maPrintTip", w=NULL, subset=TRUE, span = 0.4)),
> > + f.scale = list(maNormMAD(x = "maPrintTip", y = "maM",
> > + geo = FALSE, subset = TRUE)),
> > + Mloc = TRUE, Mscale = TRUE)
> >
> > --
> > Jeremy Gollub, Ph.D.
> > jgollub at genome.stanford.edu
> > (W) 650/736-0075
> >