[BioC] Help on PLGEM R Package Usage

Tue Sep 27 04:02:19 CEST 2011

Dear Norman,

You are right: the raw spectral counts (SC) are rather small. The attachment
contains the SC data from one run.
Also, I'm sorry for giving you the impression that I'm comparing PLGEM
with the t-test; I never meant to do so. I'm trying to compare different
protocols for handling NSAF datasets, using PLGEM as a benchmark.
I hope you have a good day, and thanks very much for your help.

Regards,
Qi Wu

-----Original Message-----
From: Norman Pavelka [mailto:normanpavelka at gmail.com] 
Sent: Monday, September 26, 2011 2:36 PM
To: Wu Qi
Cc: bioconductor at r-project.org
Subject: Re: Help on PLGEM R Package Usage

Hi Qi,

If the model does not fit the data, there is no justification for using the
model, and hence the results cannot be trusted. I wonder why this is happening,
though, as this is the first time I have seen it. Could you please look at the raw
spectral count data of this dataset? I suspect that the runs only returned a
few spectra per protein. This would explain the low dynamic range of the
NSAF values and the bad fit of the PLGEM.
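To illustrate why only a few spectra per protein compress the NSAF range, here is a minimal base-R sketch. It assumes the commonly used NSAF definition, NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), where SpC is the spectral count and L the protein length. The helper name nsaf() is hypothetical, and the counts and lengths are made-up toy values, not data from this thread.

```r
# Hypothetical helper: NSAF from spectral counts (spc) and protein lengths (len),
# using NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j).
nsaf <- function(spc, len) {
  saf <- spc / len   # spectral abundance factor per protein
  saf / sum(saf)     # normalize so the values sum to 1 within the run
}

# Toy data (made up): the same five proteins, low vs. high spectral counts.
len      <- c(300, 450, 800, 1200, 2000)
spc_low  <- c(1, 2, 3, 2, 5)           # only a few spectra per protein
spc_high <- c(2, 40, 900, 150, 4000)   # counts spanning several orders

ns_low  <- nsaf(spc_low, len)
ns_high <- nsaf(spc_high, len)

# Dynamic range (max / min) of the resulting NSAF values.
max(ns_low) / min(ns_low)     # about 2.67: less than one order of magnitude
max(ns_high) / min(ns_high)   # about 300: between two and three orders
```

With low counts, the ratio between the largest and smallest NSAF value is bounded mostly by the protein-length ratios, so the values cannot span many orders of magnitude.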

On a separate note, I'm not sure I agree with your strategy "to illustrate one
method outperforms another because of its larger DEG list". Are you
referring to DEG identification methods (e.g. t-test vs. plgem)? In that
case, a larger number of identified DEGs does not necessarily mean a better
method: the DEG selection method could simply be selecting more false
positives. A better way to compare two methods is to run both on a benchmark
dataset for which the true positives are known and compare their false
positive and false negative rates, e.g. by means of ROC curves.
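A minimal base-R sketch of such a benchmark comparison, assuming you have p-values from two methods and a logical vector marking the known true positives. The helper names (error_rates, roc_points, auc) are hypothetical, and the p-values are simulated with arbitrary Beta distributions purely for illustration.

```r
# Hypothetical helper: false positive and false negative rates of a DEG call
# at p-value threshold alpha, against known truth.
error_rates <- function(pvals, truth, alpha) {
  called <- pvals <= alpha
  c(FPR = sum(called & !truth) / sum(!truth),  # FP / all truly non-differential
    FNR = sum(!called & truth) / sum(truth))   # missed DEGs / all true DEGs
}

# ROC points: true positive rate vs. false positive rate over all thresholds.
roc_points <- function(pvals, truth) {
  thr <- sort(unique(c(0, pvals)))
  t(sapply(thr, function(a) {
    r <- error_rates(pvals, truth, a)
    c(FPR = unname(r["FPR"]), TPR = 1 - unname(r["FNR"]))
  }))
}

# Area under the ROC curve by the trapezoidal rule.
auc <- function(roc) {
  o <- order(roc[, "FPR"], roc[, "TPR"])
  x <- roc[o, "FPR"]
  y <- roc[o, "TPR"]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

set.seed(1)
truth <- rep(c(TRUE, FALSE), c(50, 450))               # 50 known true positives
p_a <- ifelse(truth, rbeta(500, 0.2, 1), runif(500))   # method A: stronger signal
p_b <- ifelse(truth, rbeta(500, 0.6, 1), runif(500))   # method B: weaker signal

error_rates(p_a, truth, alpha = 0.01)
error_rates(p_b, truth, alpha = 0.01)
auc(roc_points(p_a, truth))
auc(roc_points(p_b, truth))
```

Comparing both error rates (or the whole ROC curve) rather than the length of the DEG list penalizes a method that gains hits only by accepting more false positives.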

HTH,
Norman

On Sun, Sep 25, 2011 at 7:48 PM, Wu Qi <qwu at dicp.ac.cn> wrote:
> Dear Norman,
>
> If the parameters (slope, r^2 and Pearson correlation coefficient)
> look terrible, does this mean the DEG list I got cannot be trusted?
> And can I compare two DEG lists with very different parameters? My
> point is to illustrate one method outperforms another because of its
> larger DEG list, but the parameters of these two datasets vary a lot.
> Thanks for your help.
>
> Regards,
> Qi Wu
>
> -----Original Message-----
> From: Norman Pavelka [mailto:normanpavelka at gmail.com]
> Sent: Saturday, September 24, 2011 11:39 PM
> To: Wu Qi
> Cc: bioconductor at r-project.org
> Subject: Re: Help on PLGEM R Package Usage
>
> You will have to set plotFile=FALSE if you want the plots to go to the
> graphics device you opened (e.g. with pdf()) instead of the default PNG file.
>
> Also, given the relatively small dataset you are using (~500
> proteins), I recommend increasing the number of iterations of the
> permutation step. The default Iterations="automatic" only uses 500
> iterations in your case. However, I would suggest setting it to at
> least 1000 or even more. This will make p-values more stable from run
> to run. I don't know if you noticed, but each time you run PLGEM you
> get slightly different p-values. This is because the permutation step
> is based on random resampling of your data and can differ from run to
> run. Using a larger number of iterations stabilizes the empirical
> distribution of resampled STN ratios and makes p-values more stable.
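The effect of the number of iterations on p-value stability can be seen with a generic permutation test. This is a minimal base-R sketch, not PLGEM's own resampling scheme: it uses a per-protein relabeling and a signal-to-noise ratio computed from raw sample SDs, whereas PLGEM uses model-derived SDs. The helper names and the toy values for the two conditions are hypothetical.

```r
# Signal-to-noise ratio between two groups, using raw sample SDs
# (a simplification; PLGEM derives the SDs from its fitted model).
stn <- function(a, b) (mean(a) - mean(b)) / (sd(a) + sd(b))

# Empirical two-sided p-value from B random relabelings of the pooled samples.
perm_pvalue <- function(a, b, B) {
  obs <- stn(a, b)
  pooled <- c(a, b)
  n_a <- length(a)
  null <- replicate(B, {
    idx <- sample(length(pooled), n_a)   # random group assignment
    stn(pooled[idx], pooled[-idx])
  })
  (sum(abs(null) >= abs(obs)) + 1) / (B + 1)   # add-one avoids p = 0
}

a <- c(5.1, 5.4, 4.9, 5.6)   # toy values, condition A
b <- c(4.2, 4.6, 4.4, 4.0)   # toy values, condition B

# Rerun the test 20 times and measure how much the p-value moves between runs.
spread <- function(B) sd(replicate(20, perm_pvalue(a, b, B)))

set.seed(42)
spread(100)    # run-to-run variability with few iterations
spread(2000)   # noticeably smaller with many iterations
```

The run-to-run spread shrinks roughly with the square root of the number of iterations, which is why raising Iterations makes results more reproducible.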
>
> That said, if your data do not fit well to the PLGEM, then there is
> little chance you can improve the results by tweaking these other
> parameters.
>
> Hope this helps!
> Norman
>
> On Sat, Sep 24, 2011 at 4:19 PM, Wu Qi <qwu at dicp.ac.cn> wrote:
>> Dear Norman,
>>
>> The dataset was downloaded from the Tranche website
>> https://proteomecommons.org/dataset.jsp?!=73694 . I haven't gone
>> through the experimental details yet.
>> When I try to produce high-quality figures following your
>> instructions with the commands below, I get a plot whose parameters
>> are quite different; I guess this plot was generated with the default
>> arguments:
>>
>> NSAFSet<-readExpressionSet("exprs_NSAF.txt","phenoDataFile.txt")
>> pdf()
>> NSAFdegList<-run.plgem(NSAFSet, signLev=0.01, rank=100, covariate=1, 
>> baselineCondition="E", Iterations="automatic", trimAllZeroRows=TRUE, 
>> zeroMeanOrSD="trim", fitting.eval=TRUE, plotFile=TRUE, 
>> writeFiles=FALSE,
>> Verbose=TRUE)
>> dev.off()
>>
>> With these commands I still only get a fittingEval.png, which is
>> very small. How can I write the fittingEval plot generated with my own
>> arguments to other file formats?
>>
>>
>> -----Original Message-----
>> From: Norman Pavelka [mailto:normanpavelka at gmail.com]
>> Sent: Saturday, September 24, 2011 1:23 AM
>> To: Wu Qi
>> Cc: bioconductor at r-project.org
>> Subject: Re: Help on PLGEM R Package Usage
>>
>> Dear Qi,
>>
>> Thank you for the data and the plots. I think the problem might
>> reside in your data. If you do a boxplot of your data you will notice
>> that they do not span many orders of magnitude. Here's how you can
>> see for yourself:
>>
>> test <- log10(exprs(NSAFSet))  # log-transform your data
>> test[test == -Inf] <- NA       # remove -Inf values coming from log10(0)
>> boxplot(test)
>>
>> PLGEM fits best when data span several orders of magnitude, whereas 
>> in your case the NSAF values only span two orders of magnitude. May I 
>> ask you which proteomics technology you used to generate these data? 
>> Is this a whole-cell extract or a subproteome?
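The same check can be done numerically as well as visually. A minimal base-R sketch, assuming a toy matrix stands in for exprs(NSAFSet) (rows are proteins, columns are runs); the values are simulated to lie between 1e-4 and 1e-2, i.e. within two orders of magnitude.

```r
# Toy matrix standing in for exprs(NSAFSet): rows = proteins, columns = runs.
set.seed(3)
m <- matrix(10^runif(40, -4, -2), nrow = 10,
            dimnames = list(NULL, paste0("run", 1:4)))
m[2, 3] <- 0                     # a protein not detected in one run

test <- log10(m)                 # log-transform
test[is.infinite(test)] <- NA    # log10(0) gives -Inf; treat it as missing

# Orders of magnitude spanned by the detected values in each run.
span <- apply(test, 2, function(x) diff(range(x, na.rm = TRUE)))
span                             # each value is at most 2 for this toy matrix
```

Calling boxplot(test) on this matrix would show the same narrow range graphically.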
>>
>> Cheers,
>> Norman
>>
>> On Sat, Sep 24, 2011 at 12:02 AM, Wu Qi <qwu at dicp.ac.cn> wrote:
>>> Dear Norman,
>>>
>>> Thanks for your quick response; please find my files and plot attached.
>>> I really don't understand how to optimize the arguments for every
>>> step, and I have more than one dataset that also needs evaluation.
>>> Could you possibly give me some advice on choosing arguments?
>>> The commands for generating this plot are as follows:
>>>
>>> library(plgem)
>>>
>>> NSAFSet<-readExpressionSet("exprs_NSAF.txt","phenoDataFile.txt")
>>>
>>> NSAFdegList<-run.plgem(NSAFSet, signLev=0.01, rank=100, covariate=1, 
>>> baselineCondition="E", Iterations="automatic", trimAllZeroRows=TRUE, 
>>> zeroMeanOrSD="trim", fitting.eval=TRUE, plotFile=TRUE, 
>>> writeFiles=FALSE,
>>> Verbose=TRUE)
>>>
>>> plgem.write.summary(NSAFdegList, prefix="NSAF", verbose=TRUE)
>>>
>>> Kind Regards,
>>> Qi Wu
>>>
>>> -----Original Message-----
>>> From: Norman Pavelka [mailto:normanpavelka at gmail.com]
>>> Sent: Friday, September 23, 2011 11:38 PM
>>> To: Wu Qi
>>> Cc: bioconductor at r-project.org
>>> Subject: Re: Help on PLGEM R Package Usage
>>>
>>> Hi Qi,
>>>
>>> These fitting values look far outside the optimal range. Do you
>>> actually get a straight line in the ln(sd) vs. ln(mean) plot? If
>>> not, something might be wrong with how the data were normalized.
>>> You may e-mail me your data and/or the fitting evaluation plots
>>> offline, and I might be able to diagnose the problem.
>>>
>>> The slope is one of the most important parameters to look at, and it 
>>> usually should be between 0.5 and 1. The r^2 and Pearson correlation 
>>> coefficients should be as close to 1 as possible.
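To make the ln(sd) vs. ln(mean) check concrete, here is a simplified base-R sketch: it simulates replicated data whose SD follows a power law of the mean (sd = k * mean^b, so ln(sd) = ln(k) + b * ln(mean)), fits a straight line on the log scale, and applies the rule of thumb above. This is plain least squares on all proteins, not PLGEM's own fitting procedure; the simulated constants and the 0.9 r^2 cutoff are arbitrary assumptions, and fit_looks_ok() is a hypothetical helper.

```r
# Simulate replicated data whose SD follows a power law of the mean.
set.seed(10)
n_prot <- 500; n_rep <- 6
k <- 0.5; b <- 0.8                      # assumed "true" power-law constants
mu <- 10^runif(n_prot, 1, 5)            # means spanning four orders of magnitude
x <- matrix(rnorm(n_prot * n_rep, mean = mu, sd = k * mu^b), nrow = n_prot)

m <- rowMeans(x)
s <- apply(x, 1, sd)
keep <- m > 0 & s > 0                   # logs need positive values

# Straight-line fit of ln(sd) on ln(mean); the slope estimates b.
fit <- lm(log(s[keep]) ~ log(m[keep]))
slope <- unname(coef(fit)[2])
r2 <- summary(fit)$adj.r.squared

# Rule of thumb from this thread: slope roughly between 0.5 and 1 and
# r^2 close to 1 (the 0.9 cutoff is an arbitrary choice).
fit_looks_ok <- function(slope, r2, min_r2 = 0.9) {
  slope >= 0.5 && slope <= 1 && r2 >= min_r2
}

c(slope = slope, adj.r2 = r2)
fit_looks_ok(slope, r2)        # TRUE for these simulated data
fit_looks_ok(0.291, 0.636)     # FALSE: the values reported for the NSAF dataset
```

Data spanning several orders of magnitude give the regression a long lever arm, which is why a narrow NSAF range tends to produce a poorly determined slope.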
>>>
>>> In order to capture the plots in another file format you can call
>>> pdf() prior to run.plgem() to generate a high-quality 
>>> vector-graphics PDF file. Example:
>>>
>>> library(plgem)
>>> data(LPSeset)
>>> pdf()      # this will open a new PDF file called 'Rplots.pdf'
>>>            # in your current working directory
>>> plgemOutput <- run.plgem(LPSeset)
>>> dev.off()  # this will close the PDF file
>>>
>>> Instead of pdf() above you can try bmp(), jpeg(), tiff() or the
>>> device for virtually any other major image file format. Under Windows
>>> there is also win.metafile(), which generates files in EMF format.
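Building on the pdf() approach above, a minimal base-R sketch of a reusable helper that opens a file device, runs the plotting code, and always closes the device. plot_to_file() is a hypothetical name, not a plgem function; plot() stands in for run.plgem(), and the commented run.plgem() call is untested here and assumes, as advised in this thread, that plotFile=FALSE sends the plots to the currently open device.

```r
# Hypothetical helper (not part of plgem): open a file-based graphics device,
# evaluate the plotting code, and always close the device afterwards.
plot_to_file <- function(file, expr, device = grDevices::pdf, ...) {
  device(file, ...)               # e.g. pdf(file, width = 8, height = 6)
  on.exit(grDevices::dev.off())   # closes the file even if expr fails
  force(expr)                     # expr is evaluated only now, with the device open;
                                  # its value is returned to the caller
}

out <- file.path(tempdir(), "fittingEval.pdf")
plot_to_file(out, plot(1:10, main = "stand-in for the PLGEM plots"),
             width = 8, height = 6)
file.exists(out)

# With plgem installed, something like (untested here):
# res <- plot_to_file("fittingEval.pdf",
#                     run.plgem(NSAFSet, fitting.eval = TRUE, plotFile = FALSE))
```

Passing a different device function (for example device = grDevices::tiff) should work the same way, since these devices also take the file name as their first argument.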
>>>
>>> Hope this helps!
>>> Norman
>>>
>>> On Fri, Sep 23, 2011 at 11:06 PM, Wu Qi <qwu at dicp.ac.cn> wrote:
>>>> Dear Norman,
>>>>
>>>>
>>>>
>>>> Thanks for your further advice.
>>>>
>>>> After applying the arguments you recommended, the parameters for my
>>>> NSAF dataset are: slope=0.291, intercept=-5.35, adj.r2=0.636,
>>>> Pearson=0.464. Are they horrible?
>>>>
>>>> Could you tell me which is the most important parameter for assessing
>>>> the quality of my dataset?
>>>>
>>>> And how can I export a high-quality figure (EMF format) with these
>>>> parameters? I could only find it in the simplest wrapper mode. When I
>>>> add "plotFile=TRUE" to the run.plgem() call, I only get a PNG
>>>> figure whose resolution is really poor.
>>>>
>>>>
>>>>
>>>> Best Regards,
>>>>
>>>> Qi Wu
>>>
>>
>>
>
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sample SC raw data.txt
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20110927/0aeb403b/attachment.txt>