[BioC] dChip v li.wong() (Was: Re: warnings from li wong summary method in expresso)

lgautier at altern.org lgautier at altern.org
Fri Feb 2 02:13:22 CET 2007


The implementation of the dChip-like pre-processing, split
into the "invariantset" (normalization) and "liwong" (probe
summary) methods, is indeed not so recent (more than 4 years old,
if I remember correctly), and further developments in dChip
were not ported back to the affy package (AFAIK).

One note I would like to add is that although the authors
of dChip did not choose to release it as open source
software, they have been very helpful in answering questions
regarding their algorithms (my earliest memory of that
predates Bioconductor). They can probably answer better
than we can about what has changed in dChip.

Regards,


Laurent


> Dear Henrik,
>
> I am using
> eset <- expresso(myaffy, bg.correct = FALSE,
>                  normalize.method = "invariantset",
>                  pmcorrect.method = "pmonly", summary.method = "liwong")
>
> I have downloaded the latest dChip (I think it was built in Jan 2007).
>
> I haven't been looking at the source code, just at the affy vignette,
> where it is claimed that this choice of the expresso options
> should mimic the MBEI method.
>
> But as Ben has posted, it might be that since the MBEI algorithm was
> first coded in R, the dChip software has been updated several times,
> which could explain the difference in results.
>
>
> Regards
>
> Marco
>
> On 1/31/07, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
>> On 1/31/07, marco zucchelli <marco.bioc at gmail.com> wrote:
>> > Sorry, I posted the wrong dChip table. The correct one is the following:
>> >
>> >        T2-T1  T3-T2  T4-T3  T5-T4  T6-T5  counts
>> >  [1,]      0      0      0      0      0   47030
>> >  [2,]      0      0      0      0     -1    2120
>> >  [3,]      0      0      0      0      1    2096
>> >  [4,]      0      0     -1      0      0    1114
>> >  [5,]      0      0      1      0      0     721
>> >  [6,]      0      0      1      0     -1     381
>> >  [7,]      0      0     -1      0      1     301
>> >  [8,]      0      0      0      1      0     244
>> >  [9,]      0      0      0     -1      0     155
>> >
>> > There is a large difference in pattern 9 (double the counts in expresso)
>> > and in pattern 2 (pattern 3 in the expresso table), of about 20%.
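The comparison described here can be reproduced by matching the two tables on the pattern itself rather than on the row number. The following is only a sketch in base R: the counts are copied from the corrected dChip table in this message and from the expresso table in the earlier message quoted further down, the variable names are made up for illustration, and only the nine patterns that were posted are covered.

```r
# Counts per up/down pattern (columns T2-T1 ... T6-T5), copied from the thread.
dchip <- data.frame(
  pattern = c("0 0 0 0 0", "0 0 0 0 -1", "0 0 0 0 1", "0 0 -1 0 0",
              "0 0 1 0 0", "0 0 1 0 -1", "0 0 -1 0 1", "0 0 0 1 0",
              "0 0 0 -1 0"),
  dchip = c(47030, 2120, 2096, 1114, 721, 381, 301, 244, 155),
  stringsAsFactors = FALSE
)
expresso <- data.frame(
  pattern = c("0 0 0 0 0", "0 0 0 0 1", "0 0 0 0 -1", "0 0 -1 0 0",
              "0 0 1 0 0", "0 0 0 1 0", "0 0 -1 0 1", "0 0 1 0 -1",
              "0 0 0 -1 0"),
  expresso = c(46464, 2299, 1715, 1164, 691, 409, 344, 340, 316),
  stringsAsFactors = FALSE
)

# Join on the pattern string so that rows are compared like for like,
# regardless of the rank each pattern has in its own table.
cmp <- merge(dchip, expresso, by = "pattern")
cmp$ratio <- cmp$expresso / cmp$dchip
cmp <- cmp[order(cmp$dchip, decreasing = TRUE), ]
print(cmp, row.names = FALSE, digits = 3)
```

With these numbers the expresso/dChip ratio is about 2.0 for the pattern (0, 0, 0, -1, 0) and about 0.81 for (0, 0, 0, 0, -1), which matches the "double" and "about 20%" figures quoted above.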
>> >
>> > James,
>> >
>> >  I understand your point, and I personally prefer open source software...
>> >
>> > I am using my own arrays and arrays from public databases that have been
>> > analyzed with dChip. Since I use R and found different results, I am
>> > trying to understand where the differences come from and whether they
>> > can affect the biology.
>> Your point is most important.  It is actually not quite the case that
>> dChip is a black box.  Under "Source code and command line version" on
>> http://biosun1.harvard.edu/complab/dchip/install.htm
>> it says "the latest source codes of dChip are available on request [by
>> sending an email]".  Also, the source code for a version of dChip for
>> "Linux/MPI", which I assume has something in common with the Windows
>> version, is available for download (see the link on the above page).
>>
>> BTW, what kind of preprocessing do you apply in your comparison?  For
>> instance, both dChip and R/BioC provide quantile normalization, but they
>> use totally different algorithms (and model assumptions).  To the best
>> of my understanding (from browsing the Linux dChip code), dChip uses
>> splines, whereas R/BioC uses sorting for quantile normalization.  In
>> practice this means that dChip fits a smoother function, so the
>> empirical density functions of the normalized probe signals will not be
>> identical across arrays, whereas the R/BioC normalized ones will be.
>>
>> /Henrik
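To illustrate the sorting-based flavour of quantile normalization mentioned above, here is a minimal base R sketch. It is not the affy/Bioconductor implementation and says nothing about dChip's spline-based code: the function name quantile_normalize is made up, ties are broken arbitrarily by position, and missing values are not handled.

```r
# Sorting-based quantile normalization (illustrative sketch only).
# x: numeric matrix with probes in rows and arrays in columns, no NAs.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  sorted <- apply(x, 2, sort)                         # each array sorted
  target <- rowMeans(sorted)                          # mean of k-th smallest values
  out <- apply(ranks, 2, function(r) target[r])       # map ranks to the target
  dimnames(out) <- dimnames(x)
  out
}

set.seed(42)
x <- cbind(a1 = rexp(1000, rate = 1),
           a2 = rexp(1000, rate = 0.5) + 2,
           a3 = rnorm(1000, mean = 5))
xn <- quantile_normalize(x)

# Every normalized array now holds exactly the same set of values,
# so the empirical distributions coincide across arrays.
same_distribution <- all(apply(xn, 2, function(col) {
  identical(sort(col), sort(xn[, 1]))
}))
```

Because each column of the result is a permutation of the same target vector, the empirical distributions are identical by construction. A smoother fitted per array would only approximate a common distribution, which is the difference described above.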
>> >
>> > I thought someone else could be interested in sharing this information,
>> > since other people may have found themselves in the same situation.
>> >
>> > Best Regards and thanks for your time !
>> >
>> >
>> > Marco
>> >
>> >
>> >
>> >
>> > On 1/31/07, marco zucchelli <marco.bioc at gmail.com> wrote:
>> > >
>> > > Laurent,
>> > >
>> > > According to the affy vignette and the MBEI paper of Li and Wong, >10
>> > > arrays should be enough.
>> > > Anyway, I tried to process my arrays with dChip, and there I get no
>> > > warnings at all.
>> > >
>> > > I also exported the expression values from dChip to a text file and
>> > > loaded them into R.
>> > >
>> > > Even if the expression values cannot be compared, I applied the same
>> > > LIMMA analysis to the expression values from both expresso and dChip,
>> > > and the results are pretty different. For example, I tried 12 arrays
>> > > as 6 pairs of duplicates (i.e. 6 tissues), and from LIMMA I got the
>> > > following up/down-regulation patterns (1 = up, -1 = down).
>> > >
>> > > dCHIP
>> > >
>> > >        T2-T1  T3-T2  T4-T3  T5-T4  T6-T5  counts
>> > >  [1,]      0      0      0      0      0   51742
>> > >  [2,]      0      0      0      0      1    1199
>> > >  [3,]      0      0      0      0     -1     616
>> > >  [4,]      0      0     -1      0      0     607
>> > >  [5,]      0      0      1      0      0     201
>> > >  [6,]      0      0      1      0     -1     101
>> > >  [7,]      0      0      0      1     -1      71
>> > >  [8,]      0      0     -1      0      1      27
>> > >  [9,]      0      0      0      1      0      20
>> > > ....
>> > > ....
>> > >
>> > > expresso
>> > >
>> > >        T2-T1  T3-T2  T4-T3  T5-T4  T6-T5  counts
>> > >  [1,]      0      0      0      0      0   46464
>> > >  [2,]      0      0      0      0      1    2299
>> > >  [3,]      0      0      0      0     -1    1715
>> > >  [4,]      0      0     -1      0      0    1164
>> > >  [5,]      0      0      1      0      0     691
>> > >  [6,]      0      0      0      1      0     409
>> > >  [7,]      0      0     -1      0      1     344
>> > >  [8,]      0      0      1      0     -1     340
>> > >  [9,]      0      0      0     -1      0     316
>> > > ....
>> > > ....
>> > > ....
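Tables like the two above can be built from a matrix of per-contrast calls (-1, 0, 1), which is the kind of matrix limma's decideTests() returns. The sketch below uses base R only, and the small calls matrix is invented toy data, not the data from this thread.

```r
# Toy matrix of calls: rows are probesets, columns are contrasts.
# Values are invented for illustration (1 = up, -1 = down, 0 = no change).
calls <- matrix(c(0, 0, 0, 0,  0,
                  0, 0, 0, 0,  1,
                  0, 0, 0, 0,  0,
                  0, 0, 1, 0, -1,
                  0, 0, 0, 0,  1,
                  0, 0, 0, 0,  0),
                ncol = 5, byrow = TRUE,
                dimnames = list(NULL,
                                c("T2-T1", "T3-T2", "T4-T3", "T5-T4", "T6-T5")))

# Collapse each row into one string so identical patterns can be counted.
key <- apply(calls, 1, paste, collapse = " ")
tab <- sort(table(key), decreasing = TRUE)

pattern_counts <- data.frame(pattern = names(tab),
                             counts  = as.vector(tab),
                             stringsAsFactors = FALSE)
print(pattern_counts, row.names = FALSE)
```

Keying on the pattern string rather than on the row index makes it straightforward to line up the counts from two pipelines afterwards, since the same pattern can rank differently in each table.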
>> > >
>> > >
>> > > I would have expected smaller differences, or am I out fishing?
>> > >
>> > > I wonder whether you or someone else on the mailing list has any
>> > > experience of this...
>> > >
>> > >
>> > > Marco
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On 1/31/07, lgautier at altern.org <lgautier at altern.org> wrote:
>> > > >
>> > > > Marco,
>> > > >
>> > > >
>> > > > If I remember correctly, the dChip authors were talking more of having
>> > > > at least 25 chips. Obviously the number, whether 10, 16 or 25, is not
>> > > > a hard threshold, as the convergence depends on the actual numerical
>> > > > values.
>> > > >
>> > > > Finding out which probes are failing is possible. A quick-but-dirty
>> > > > way is simply to edit the function
>> > > > "generateExprVal.method.liwong".
>> > > >
>> > > > Try:
>> > > > generateExprVal.method.liwong2 <- edit(generateExprVal.method.liwong)
>> > > >
>> > > > and edit the code as:
>> > > >
>> > > > probes <- t(probes)
>> > > > if (ncol(probes) == 1) {
>> > > >    warning("method liwong unsuitable when only one probe pair")
>> > > >    list(exprs=as.vector(probes),se.exprs=rep(NA,length(probes)))
>> > > > }
>> > > > else {
>> > > >    tmp <- fit.li.wong(probes, ...)
>> > > >    # print the id when neither convergence flag is set
>> > > >    if ( !tmp$convergence1 && !tmp$convergence2) {
>> > > >        id <- get("id", envir= parent.frame(3))
>> > > >        print(id)
>> > > >    }
>> > > >    list(exprs=tmp$theta,se.exprs=tmp$sigma.theta)
>> > > > }
>> > > >
>> > > >
>> > > > (the only change is near the end).
>> > > >
>> > > > Now you can use the summary method "liwong2" in place of "liwong".
>> > > >
>> > > > You can hack this to your specific needs (you may want to store the
>> > > > 'id' in a variable in your global workspace, for example).
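One way to store the ids instead of printing them is to append them to a collector that outlives the function call. The sketch below is self-contained base R and does not call affy: toy_fit() is a hypothetical stand-in for fit.li.wong() that only mimics the convergence1/convergence2 fields used above, and the id is passed in explicitly rather than fetched from a parent frame.

```r
# Environment used as a mutable collector for the ids that fail to converge.
failed <- new.env()
failed$ids <- character(0)

# Hypothetical stand-in for fit.li.wong(): it "fails" whenever fewer than
# 3 arrays are supplied. Only the names of the returned fields follow the
# code in the message above; the numbers are not a Li-Wong fit.
toy_fit <- function(probes) {
  ok <- nrow(probes) >= 3
  list(theta = rowMeans(probes),
       sigma.theta = rep(NA_real_, nrow(probes)),
       convergence1 = ok, convergence2 = ok)
}

# Same structure as the edited summary function, but recording the id.
summarize_one <- function(id, probes) {
  tmp <- toy_fit(probes)
  if (!tmp$convergence1 && !tmp$convergence2) {
    failed$ids <- c(failed$ids, id)   # store instead of print(id)
  }
  list(exprs = tmp$theta, se.exprs = tmp$sigma.theta)
}

set.seed(1)
probesets <- list(
  ps_a = matrix(runif(4 * 11), nrow = 4),  # 4 arrays x 11 probes
  ps_b = matrix(runif(2 * 11), nrow = 2)   # 2 arrays: the toy fit "fails"
)
res <- Map(summarize_one, names(probesets), probesets)
failed$ids
```

In the edited "liwong2" function, the same idea would presumably amount to replacing print(id) with an append to such a collector. After the run, length(failed$ids) gives the total number of failures.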
>> > > >
>> > > >
>> > > >
>> > > > Hoping this helps,
>> > > >
>> > > >
>> > > > Laurent
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > > James,
>> > > > >
>> > > > >  I am actually using 16 hgu133plus2 arrays, so I find it a little
>> > > > > strange.
>> > > > >
>> > > > > Is there any way to know which probes failed (and how many in
>> > > > > total)?
>> > > > >
>> > > > > Does anybody know if dChip is still freely available? It seems like
>> > > > > there is no link anymore on the home page... I would like to
>> > > > > cross-check whether the warnings come up there as well.
>> > > > >
>> > > > > Cheers
>> > > > >
>> > > > > Marco
>> > > > >
>> > > > >
>> > > > > On 1/25/07, James W. MacDonald < jmacdon at med.umich.edu> wrote:
>> > > > >>
>> > > > >> Hi Marco,
>> > > > >>
>> > > > >> marco zucchelli wrote:
>> > > > >> > Hi,
>> > > > >> >
>> > > > >> >  I am using the dChip method in R to normalize and summarize my
>> > > > >> > microarrays.
>> > > > >> > I tried several times and I always get warnings. What does this
>> > > > >> > mean? Are the expression levels returned reliable anyway?
>> > > > >>
>> > > > >> If you don't have enough samples, the LiWong method won't converge
>> > > > >> for some of your probesets. Lack of convergence is usually not a
>> > > > >> good thing. I think the recommendation for using LiWong is to have
>> > > > >> at least 10 or 15 samples.
>> > > > >>
>> > > > >> You might consider using a different method to summarize your data.
>> > > > >>
>> > > > >> Best,
>> > > > >>
>> > > > >> Jim
>> > > > >>
>> > > > >>
>> > > > >> >
>> > > > >> > I use R 2.4.1 on Red Hat Linux.
>> > > > >> >
>> > > > >> > Marco
>> > > > >> >
>> > > > >> >
>> > > > >> > eset <- expresso(hum.brain.embryo, bg.correct = FALSE,
>> > > > >> >                  normalize.method = "invariantset",
>> > > > >> >                  pmcorrect.method = "pmonly", summary.method = "liwong")
>> > > > >> > normalization: invariantset
>> > > > >> > PM/MM correction : pmonly
>> > > > >> > expression values: liwong
>> > > > >> > normalizing...done.
>> > > > >> > 54675 ids to be processed
>> > > > >> > |                    |
>> > > > >> > |####################|
>> > > > >> > There were 50 or more warnings (use warnings() to see the first 50)
>> > > > >> >
>> > > > >> > Warning messages:
>> > > > >> > 1: No convergence achieved in outlier loop
>> > > > >> >  in: fit.li.wong(probes, ...)
>> > > > >> > 2: No convergence achieved in outlier loop
>> > > > >> >  in: fit.li.wong(probes, ...)
>> > > > >> > 3: No convergence achieved in outlier loop
>> > > > >> >  in: fit.li.wong(probes, ...)
>> > > > >> > 4: No convergence achieved in outlier loop
>> > > > >> >  in: fit.li.wong(probes, ...)
>> > > > >> >
>> > > > >> >       [[alternative HTML version deleted]]
>> > > > >> >
>> > > > >> > _______________________________________________
>> > > > >> > Bioconductor mailing list
>> > > > >> > Bioconductor at stat.math.ethz.ch
>> > > > >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > > > >> > Search the archives:
>> > > > >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> > > > >>
>> > > > >>
>> > > > >> --
>> > > > >> James W. MacDonald, M.S.
>> > > > >> Biostatistician
>> > > > >> Affymetrix and cDNA Microarray Core
>> > > > >> University of Michigan Cancer Center
>> > > > >> 1500 E. Medical Center Drive
>> > > > >> 7410 CCGC
>> > > > >> Ann Arbor MI 48109
>> > > > >> 734-647-5623
>> > > > >>
>> > > > >>
>> > > > >> **********************************************************
>> > > > >> Electronic Mail is not secure, may not be read every day, and
>> > > > >> should not be used for urgent or sensitive issues.
>> > > > >>
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> >
>
>
>
>


