[BioC] replicability of *PLGEM*

Pavelka, Norman NXP at stowers.org
Sat Oct 9 17:23:17 CEST 2010


Dear Guangchuang,

Sorry for the late reply; I was abroad on a long trip. I saw that you also
posted this question to the bioc-devel mailing list, but I think it is more
appropriate for the Bioconductor users mailing list (CC'ed here).

I looked at your code and could not find any significant errors. I think the
problem lies in the dataset itself. Here are the issues I can see:

1) First and most importantly, you have only 2 replicates per condition.
Although PLGEM is capable of dealing with such a dataset, it is far from an
optimal case. You should try to have at least 3 or 4 replicates for at least
one of your experimental conditions (e.g. the baseline condition).

2) Second, there are only 802 proteins in your dataset. Combined with the fact
that you have only 2 replicates per condition, this leaves few combinations
for the package to resample from. To improve the replicability between PLGEM
runs, I suggest increasing the number of iterations (the iterations argument
of plgem.resampledStn, which you left at "automatic" and which used only 16
iterations in your runs) until the results are more stable. In your case,
however, you should get much better results by increasing the number of
replicates (see point 1).

3) Third, the PLGEM fitting step is returning several warning messages (slope
higher than 1, adjusted r^2 lower than 0.95, Pearson correlation coefficient
lower than 0.85). Although I don't have your data, I can imagine that, as in a
typical proteomics dataset, there are a large number of missing values, which
cause problems in the PLGEM fitting. I strongly recommend using the option
trimAllZeroRows=TRUE. This should make the warnings disappear and improve your
fitting, and thus all downstream analyses.

Please try out my suggestions and let me know how they work for you. I realize
these are proteomics-specific problems that are not discussed in detail in the
vignette; I will expand the discussion of such cases in future versions of the
vignette.

Thanks and good luck!
Norman

> From: guangchuang yu [guangchuangyu at gmail.com]
> Sent: Wednesday, September 29, 2010 2:59 AM
> To: Pavelka, Norman
> Subject: replicability of *PLGEM*
> 
> 
> Hi, Dr. Norman,
> 
> I am using *PLGEM* to detect differentially expressed proteins in my
> proteomic dataset, which contains four cell cycle phases with two replicates
> each.
> 
> > CCeSet
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 802 features, 8 samples
>   element names: exprs
> protocolData: none
> phenoData
>   sampleNames: S1, G21, ..., G1E2  (8 total)
>   varLabels and varMetadata description:
>     condictionName: conditionName
> featureData: none
> experimentData: use 'experimentData(object)'
> Annotation:
> 
> I followed the guidelines in your package reference and ran the code several
> times. Curiously, each time *PLGEM* detected a different set of differentially
> expressed proteins (for G1_vs_S, the three runs below found 71, 793 and 115
> DEG). Can you explain this?
> 
> > CCfit <- plgem.fit(data=CCeSet, covariate=1, fitCondition="S", p=10, q=0.5, plot.file = FALSE, fittingEval = TRUE, verbose = TRUE)
> Fitting PLGEM...
> samples extracted for fitting:
>    condictionName
> S1              S
> S2              S
> determining modelling points...
> fitting data and modelling points...
> done with fitting PLGEM.
> 
> Warning messages:
> 1: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10,  :
>   PLGEM slope is higher than 1
> 2: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10,  :
>   Adjusted r^2 is lower than 0.95
> 3: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10,  :
>   Pearson correlation coefficient is lower than 0.85
> > ### computation of observed signal-to-noise ratios
> > CCobsStn <- plgem.obsStn(data = CCeSet, covariate = 1, baselineCondition = 1, plgemFit = CCfit, verbose = TRUE)
> calculating observed PLGEM-STN statistics:found 3 condition(s) to compare to
> the baseline.
> working on baseline S ...
> S1 S2
> working on condition G2 ...
> G21 G22
> working on condition M ...
> M1 M2
> working on condition G1 ...
> G1E1 G1E2
> done with calculating PLGEM-STN statistics.
> 
> > ## Computation of resampled signal-to-noise ratios
> > CCresampledStn <- plgem.resampledStn(data = CCeSet, plgemFit = CCfit, iterations = "automatic", verbose = TRUE)
> calculating resampled PLGEM-STN statistics:found 3 condition(s) to compare to
> the baseline.
> baseline samples:
> S1 S2
> resampling on samples:
> S1 S2
> Using 16 iterations...
> working on cases with 2 replicates...
>      Iterations:
> done with calculating resampled PLGEM-STN statistics.
> 
> > ## computation of p-value
> > CCpValues <- plgem.pValue(observedStn = CCobsStn, plgemResampledStn = CCresampledStn, verbose = TRUE)
> calculating PLGEM p-values... done.
> 
> > ## Detection of differentially expressed proteins (DEP)
> > CCdegList <- plgem.deg(observedStn = CCobsStn, plgemPval = CCpValues, delta = 0.001, verbose = TRUE)
> selecting significant DEG:found 3 condition(s) compared to the baseline.
> Delta =  0.001
>         Condition =  G2_vs_S
> delta: 0.001 condition: G2_vs_S found 12 DEG
>         Condition =  M_vs_S
> delta: 0.001 condition: M_vs_S found 34 DEG
>         Condition =  G1_vs_S
> delta: 0.001 condition: G1_vs_S found 71 DEG
> done with selecting significant DEG.
> 
> >
> 
> > CCfit <- plgem.fit(data=CCeSet, covariate=1, fitCondition="S", p=10, q=0.5, plot.file = FALSE, fittingEval = TRUE, verbose = TRUE)
> Fitting PLGEM...
> samples extracted for fitting:
>    condictionName
> S1              S
> S2              S
> determining modelling points...
> fitting data and modelling points...
> done with fitting PLGEM.
> 
> Warning messages:
> 1: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10,  :
>   PLGEM slope is higher than 1
> 2: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10,  :
>   Adjusted r^2 is lower than 0.95
> 3: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10,  :
>   Pearson correlation coefficient is lower than 0.85
> > ### computation of observed signal-to-noise ratios
> > CCobsStn <- plgem.obsStn(data = CCeSet, covariate = 1, baselineCondition = 1, plgemFit = CCfit, verbose = TRUE)
> calculating observed PLGEM-STN statistics:found 3 condition(s) to compare to
> the baseline.
> working on baseline S ...
> S1 S2
> working on condition G2 ...
> G21 G22
> working on condition M ...
> M1 M2
> working on condition G1 ...
> G1E1 G1E2
> done with calculating PLGEM-STN statistics.
> 
> > ## Computation of resampled signal-to-noise ratios
> > CCresampledStn <- plgem.resampledStn(data = CCeSet, plgemFit = CCfit, iterations = "automatic", verbose = TRUE)
> calculating resampled PLGEM-STN statistics:found 3 condition(s) to compare to
> the baseline.
> baseline samples:
> S1 S2
> resampling on samples:
> S1 S2
> Using 16 iterations...
> working on cases with 2 replicates...
>      Iterations:
> done with calculating resampled PLGEM-STN statistics.
> 
> > ## computation of p-value
> > CCpValues <- plgem.pValue(observedStn = CCobsStn, plgemResampledStn = CCresampledStn, verbose = TRUE)
> calculating PLGEM p-values... done.
> 
> > ## Detection of differentially expressed proteins (DEP)
> > CCdegList <- plgem.deg(observedStn = CCobsStn, plgemPval = CCpValues, delta = 0.001, verbose = TRUE)
> selecting significant DEG:found 3 condition(s) compared to the baseline.
> Delta =  0.001
>         Condition =  G2_vs_S
> delta: 0.001 condition: G2_vs_S found 778 DEG
>         Condition =  M_vs_S
> delta: 0.001 condition: M_vs_S found 790 DEG
>         Condition =  G1_vs_S
> delta: 0.001 condition: G1_vs_S found 793 DEG
> done with selecting significant DEG.
> 
> >
> > CCfit <- plgem.fit(data=CCeSet, covariate=1, fitCondition="S", p=10, q=0.5, plot.file = FALSE, fittingEval = TRUE, verbose = TRUE)
> Fitting PLGEM...
> samples extracted for fitting:
>    condictionName
> S1              S
> S2              S
> determining modelling points...
> fitting data and modelling points...
> done with fitting PLGEM.
> 
> Warning messages:
> 1: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10,  :
>   PLGEM slope is higher than 1
> 2: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10,  :
>   Adjusted r^2 is lower than 0.95
> 3: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10,  :
>   Pearson correlation coefficient is lower than 0.85
> > ### computation of observed signal-to-noise ratios
> > CCobsStn <- plgem.obsStn(data = CCeSet, covariate = 1, baselineCondition = 1, plgemFit = CCfit, verbose = TRUE)
> calculating observed PLGEM-STN statistics:found 3 condition(s) to compare to
> the baseline.
> working on baseline S ...
> S1 S2
> working on condition G2 ...
> G21 G22
> working on condition M ...
> M1 M2
> working on condition G1 ...
> G1E1 G1E2
> done with calculating PLGEM-STN statistics.
> 
> > ## Computation of resampled signal-to-noise ratios
> > CCresampledStn <- plgem.resampledStn(data = CCeSet, plgemFit = CCfit, iterations = "automatic", verbose = TRUE)
> calculating resampled PLGEM-STN statistics:found 3 condition(s) to compare to
> the baseline.
> baseline samples:
> S1 S2
> resampling on samples:
> S1 S2
> Using 16 iterations...
> working on cases with 2 replicates...
>      Iterations:
> done with calculating resampled PLGEM-STN statistics.
> 
> > ## computation of p-value
> > CCpValues <- plgem.pValue(observedStn = CCobsStn, plgemResampledStn = CCresampledStn, verbose = TRUE)
> calculating PLGEM p-values... done.
> 
> > ## Detection of differentially expressed proteins (DEP)
> > CCdegList <- plgem.deg(observedStn = CCobsStn, plgemPval = CCpValues, delta = 0.001, verbose = TRUE)
> selecting significant DEG:found 3 condition(s) compared to the baseline.
> Delta =  0.001
>         Condition =  G2_vs_S
> delta: 0.001 condition: G2_vs_S found 19 DEG
>         Condition =  M_vs_S
> delta: 0.001 condition: M_vs_S found 66 DEG
>         Condition =  G1_vs_S
> delta: 0.001 condition: G1_vs_S found 115 DEG
> done with selecting significant DEG.
> 
> 
> 
> 
> Guangchuang Yu
> --~--~---------~--~----~------------~-------~--~----~
> Institutes of Life & Health Engineering
> Jinan University, 601 Huangpu Ave. W.
> Guangzhou 510632,  P.R. China
> Tel: +86-20-85222677
> Email: guangchuangyu at gmail.com
> -~----------~----~----~----~------~----~------~--~---


More information about the Bioconductor mailing list