[BioC] dChip v li.wong() (Was: Re: warnings from li wong summary method in expresso)

Henrik Bengtsson hb at stat.berkeley.edu
Wed Jan 31 22:30:57 CET 2007


On 1/31/07, marco zucchelli <marco.bioc at gmail.com> wrote:
> Sorry, I posted the wrong dChip table. The correct one is the following:
>
>        T2-T1  T3-T2  T4-T3  T5-T4  T6-T5  counts
>  [1,]      0      0      0      0      0   47030
>  [2,]      0      0      0      0     -1    2120
>  [3,]      0      0      0      0      1    2096
>  [4,]      0      0     -1      0      0    1114
>  [5,]      0      0      1      0      0     721
>  [6,]      0      0      1      0     -1     381
>  [7,]      0      0     -1      0      1     301
>  [8,]      0      0      0      1      0     244
>  [9,]      0      0      0     -1      0     155
>
> There is a large difference in pattern 9 (double the counts in expresso)
> and a difference of about 20% in pattern 2 (pattern 3 in the expresso table).
>
> James,
>
>  I understand your point and I personally prefer open source software...
>
> I am using my own arrays and arrays from public databases which have been
> analyzed with dChip. Since I use R and I found different results, I am
> trying to understand where the differences come from and whether they can
> affect the biology.

Your point is most important.  It is actually not quite the case that
dChip is a black box.  Under "Source code and command line version" on

  http://biosun1.harvard.edu/complab/dchip/install.htm

it says "the latest source codes of dChip are available on request [by
sending an email]".  Also, the source code for a version of dChip for
"Linux/MPI", which I assume has something in common with the Windows
version, is available for download (see the link on the above page).

BTW, what kind of preprocessing do you apply in your comparison?  For
instance, both dChip and R/BioC provide quantile normalization, but
they use totally different algorithms (and model assumptions).  To the
best of my understanding (from browsing the Linux dChip code), dChip
uses splines, whereas R/BioC uses sorting for quantile normalization.
In practice this means that dChip fits a smoother function.  As a
result, the empirical density functions of the dChip-normalized probe
signals will not be identical across arrays, whereas the R/BioC
normalized ones will be.
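
To make the sorting idea concrete, here is a minimal base-R sketch of
sort-based quantile normalization.  It is only an illustration under
stated assumptions, not the actual R/BioC (or dChip) code: the function
name quantile_normalize() and the simulated matrix are hypothetical, and
the input is assumed to be a numeric matrix (probes in rows, arrays in
columns) without missing values.

```r
# Sort-based quantile normalization: an illustrative sketch only,
# not the Bioconductor implementation.
# Assumes a numeric matrix (probes x arrays) with no missing values.
quantile_normalize <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x), !anyNA(x), nrow(x) > 1)
  # Sort the signals of each array (column) separately
  sorted <- apply(x, 2, sort)
  # Target distribution: mean of the k-th smallest values across arrays
  target <- rowMeans(sorted)
  # Replace each signal by the target value of its rank within its array
  ranks <- apply(x, 2, rank, ties.method = "first")
  normalized <- apply(ranks, 2, function(r) target[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

set.seed(1)
# Three hypothetical arrays with different signal scales
x <- cbind(a = rexp(1000, rate = 1),
           b = rexp(1000, rate = 0.5),
           c = rexp(1000, rate = 2))
xn <- quantile_normalize(x)

# After normalization every array holds the same set of values, so the
# empirical distributions are identical across arrays
identical_dist <- all(apply(xn, 2, function(col) {
  isTRUE(all.equal(sort(col), sort(xn[, 1])))
}))
```

Because every array is mapped onto the same target values, the sorted
signals agree exactly across arrays.  A spline-based fit towards a
baseline array would only bring the distributions close to each other.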

/Henrik


>
> I thought someone else could be interested in sharing this information,
> since other people may have found themselves in the same situation.
>
> Best Regards and thanks for your time !
>
>
> Marco
>
>
>
>
> On 1/31/07, marco zucchelli <marco.bioc at gmail.com> wrote:
> >
> > Laurent,
> >
> >  According to the affy vignette and the MBEI paper of Li and Wong, >10
> > arrays should be enough.
> > Anyway, I tried to process my arrays with dChip and there I get no
> > warnings at all.
> >
> > I also exported the expression values from dChip to a text file and
> > loaded them into R.
> >
> > Even if the expression values cannot be compared directly, I applied the
> > same LIMMA analysis to the expression values from both expresso and
> > dChip, and the results are pretty different. For example, I tried 12
> > arrays as 6 pairs of duplicates (i.e. 6 tissues) and from LIMMA I got the
> > following up/down-regulated patterns (1=up, -1=down).
> >
> > dCHIP
> >
> >        T2-T1  T3-T2  T4-T3  T5-T4  T6-T5  counts
> >  [1,]      0      0      0      0      0   51742
> >  [2,]      0      0      0      0      1    1199
> >  [3,]      0      0      0      0     -1     616
> >  [4,]      0      0     -1      0      0     607
> >  [5,]      0      0      1      0      0     201
> >  [6,]      0      0      1      0     -1     101
> >  [7,]      0      0      0      1     -1      71
> >  [8,]      0      0     -1      0      1      27
> >  [9,]      0      0      0      1      0      20
> > ....
> > ....
> >
> > expresso
> >
> >        T2-T1  T3-T2  T4-T3  T5-T4  T6-T5  counts
> >  [1,]      0      0      0      0      0   46464
> >  [2,]      0      0      0      0      1    2299
> >  [3,]      0      0      0      0     -1    1715
> >  [4,]      0      0     -1      0      0    1164
> >  [5,]      0      0      1      0      0     691
> >  [6,]      0      0      0      1      0     409
> >  [7,]      0      0     -1      0      1     344
> >  [8,]      0      0      1      0     -1     340
> >  [9,]      0      0      0     -1      0     316
> > ....
> > ....
> > ....
> >
> >
> > I would have expected smaller differences, or am I out fishing?
> >
> > I wonder if you or someone else on the mailing list has any experience
> > of this ...
> >
> >
> > Marco
> >
> >
> >
> >
> >
> >
> > On 1/31/07, lgautier at altern.org <lgautier at altern.org> wrote:
> > >
> > > Marco,
> > >
> > >
> > > If I remember correctly, the dChip authors were talking more of having
> > > at least 25 chips. Obviously the number, whether 10, 16 or 25, is not
> > > a hard threshold, as convergence depends on the actual numerical
> > > values.
> > >
> > > It is possible to find out which probes fail. A quick-and-dirty
> > > way is to edit the function
> > > "generateExprVal.method.liwong".
> > >
> > > Try:
> > > generateExprVal.method.liwong2 <- edit(generateExprVal.method.liwong)
> > >
> > > and edit the code as:
> > >
> > > probes <- t(probes)
> > > if (ncol(probes) == 1) {
> > >    warning("method liwong unsuitable when only one probe pair")
> > >    list(exprs=as.vector(probes),se.exprs=rep(NA,length(probes)))
> > > }
> > > else {
> > >    tmp <- fit.li.wong(probes, ...)
> > >    # added: print the id when neither convergence flag is set
> > >    if (!tmp$convergence1 && !tmp$convergence2) {
> > >        id <- get("id", envir = parent.frame(3))
> > >        print(id)
> > >    }
> > >    list(exprs=tmp$theta,se.exprs=tmp$sigma.theta)
> > > }
> > >
> > >
> > > (the only change is near the end).
> > >
> > > Now you can use the summary method "liwong2" in place of "liwong".
> > >
> > > You can hack this to fit your specific needs (for example, if you want
> > > to store the 'id' in a variable in your global workspace).
> > >
> > >
> > >
> > > Hoping this helps,
> > >
> > >
> > > Laurent
> > >
> > >
> > >
> > >
> > >
> > >
> > > > James,
> > > >
> > > >  I am actually using 16 hgu133plus2 arrays, so I find it a little
> > > > strange.
> > > >
> > > > Is there any way to know which probes failed (and how many in total)?
> > > >
> > > > Does anybody know if dChip is still freely available? There seems to
> > > > be no link anymore on the home page... I would like to cross-check
> > > > whether the warnings come up there as well.
> > > >
> > > > Cheers
> > > >
> > > > Marco
> > > >
> > > >
> > > > On 1/25/07, James W. MacDonald < jmacdon at med.umich.edu> wrote:
> > > >>
> > > >> Hi Marco,
> > > >>
> > > >> marco zucchelli wrote:
> > > >> > Hi,
> > > >> >
> > > > >> >  I am using the dChip method in R to normalize and summarize my
> > > > >> > microarrays. I tried several times and I always get warnings.
> > > > >> > What does this mean? Are the expression levels returned
> > > > >> > reliable anyway?
> > > >>
> > > > >> If you don't have enough samples, the LiWong method won't converge
> > > > >> for some of your probesets. Lack of convergence is usually not a
> > > > >> good thing. I think the recommendation for using LiWong is to have
> > > > >> at least 10 or 15 samples.
> > > >>
> > > >> You might consider using a different method to summarize your data.
> > > >>
> > > >> Best,
> > > >>
> > > >> Jim
> > > >>
> > > >>
> > > >> >
> > > >> > I use R2.4.1 on linux redhat
> > > >> >
> > > >> > Marco
> > > >> >
> > > >> >
> > > > >> > eset <- expresso(hum.brain.embryo, bg.correct = FALSE,
> > > > >> >                  normalize.method = "invariantset",
> > > > >> >                  pmcorrect.method = "pmonly",
> > > > >> >                  summary.method = "liwong")
> > > >> > normalization: invariantset
> > > >> > PM/MM correction : pmonly
> > > >> > expression values: liwong
> > > >> > normalizing...done.
> > > >> > 54675 ids to be processed
> > > >> > |                    |
> > > >> > |####################|
> > > >> > There were 50 or more warnings (use warnings() to see the first 50)
> > > >> >
> > > >> > Warning messages:
> > > >> > 1: No convergence achieved in outlier loop
> > > >> >  in: fit.li.wong(probes, ...)
> > > >> > 2: No convergence achieved in outlier loop
> > > >> >  in: fit.li.wong(probes, ...)
> > > >> > 3: No convergence achieved in outlier loop
> > > >> >  in: fit.li.wong(probes, ...)
> > > >> > 4: No convergence achieved in outlier loop
> > > >> >  in: fit.li.wong(probes, ...)
> > > >> >
> > > >> >       [[alternative HTML version deleted]]
> > > >> >
> > > >> > _______________________________________________
> > > >> > Bioconductor mailing list
> > > >> > Bioconductor at stat.math.ethz.ch
> > > >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > >> > Search the archives:
> > > >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> > > >>
> > > >>
> > > >> --
> > > >> James W. MacDonald, M.S.
> > > >> Biostatistician
> > > >> Affymetrix and cDNA Microarray Core
> > > >> University of Michigan Cancer Center
> > > >> 1500 E. Medical Center Drive
> > > >> 7410 CCGC
> > > >> Ann Arbor MI 48109
> > > >> 734-647-5623
> > > >>
> > > >>
> > > >> **********************************************************
> > > >> Electronic Mail is not secure, may not be read every day, and should
> > > not
> > > >> be used for urgent or sensitive issues.
> > > >>


