[BioC] About subsampling of VST in lumi

Pan Du dupan at northwestern.edu
Fri Dec 14 23:25:09 CET 2007


Thanks, Ligia!
I will make the change. We will probably just turn off the sub-sampling step
by default.

Have a nice weekend,


Pan 
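
For readers following the thread, the step discussed in the quoted messages can be sketched in base R as follows. This is only a sketch under stated assumptions: the function name fit_var_mean, the maxFit argument, and the simulated data are hypothetical and are not part of lumi. The body mirrors the snippet quoted further down, with the hard-coded 5000 exposed as an argument.

```r
# Hypothetical helper mirroring the quoted fitting step; NOT lumi's actual API.
# u:      per-probe means
# std:    per-probe standard deviations
# c3:     background variance term
# maxFit: hypothetical cap on the points passed to lm(); Inf disables the cap
fit_var_mean <- function(u, std, c3, maxFit = 5000) {
  selInd <- std^2 > c3
  dd <- data.frame(y = sqrt(std[selInd]^2 - c3), x1 = u[selInd])
  if (nrow(dd) > maxFit) {
    # Random subsampling: this is the source of run-to-run differences.
    dd <- dd[sample(seq_len(nrow(dd)), maxFit), ]
  }
  lmm <- lm(y ~ x1, data = dd)
  c(c1 = unname(coef(lmm)[2]), c2 = unname(coef(lmm)[1]))
}

# Simulated data, made up for illustration: 5500 probes, sd roughly linear in mean.
set.seed(1)
u   <- runif(5500, 100, 10000)
std <- abs(0.1 * u + 20 + rnorm(5500, sd = 5))
c3  <- 25

# Cap active, no seed control: the two fits use different random subsets.
fit_a <- fit_var_mean(u, std, c3)
fit_b <- fit_var_mean(u, std, c3)

# Option 1: disable the cap so every run uses all selected points.
full_a <- fit_var_mean(u, std, c3, maxFit = Inf)
full_b <- fit_var_mean(u, std, c3, maxFit = Inf)

# Option 2: keep the cap but fix the random seed before each call.
set.seed(42); seeded_a <- fit_var_mean(u, std, c3)
set.seed(42); seeded_b <- fit_var_mean(u, std, c3)
```

Comparing full_a with full_b, or seeded_a with seeded_b, using identical() shows that either option removes the run-to-run variation in this sketch. With lumi itself, calling set.seed() immediately before lumiT() should have a similar effect, assuming sample() is the only source of randomness in the transformation.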


On 12/14/07 3:56 PM, "ligia at ebi.ac.uk" <ligia at ebi.ac.uk> wrote:

> Hi Pan,
> 
> Thanks for your email.
> The problem I reported is not due to the downsampling step controlled via
> the "nSupport" parameter, but to a subsequent step in "vst": if the
> number of selected probes with high variance (selInd) is above 5000, then
> only a random subset (5000) of these probes is used (the steps I mentioned
> in my last email) to fit the linear model between the variance and mean of
> probe beads. Couldn't this value (5000) just be another parameter to
> "vst"?
> 
> Thanks for your help,
> Ligia
> 
> 
> 
>> Hi Ligia,
>> 
>> Thanks for your report.
>> Yes, we use down-sampling to speed up the parameter estimation. If you want
>> to use all the data points, you can set the "nSupport" parameter of the vst
>> function to the length of the vector. I will add this to the vignette or
>> help file. Thanks!
>> 
>> 
>> Pan
>> 
>> 
>> On 12/14/07 5:18 AM, "ligia at ebi.ac.uk" <ligia at ebi.ac.uk> wrote:
>> 
>>> Dear Pan Du,
>>> 
>>> From what I understand when looking at "vst", the random subsampling that
>>> affects my data occurs at step 4 below:
>>> 
>>> 1       if (c3 != 0) {
>>> 2            selInd <- selInd & (std^2 > c3)
>>> 3            dd <- data.frame(y = sqrt(std[selInd]^2 - c3), x1 = u[selInd])
>>> 4            if (nrow(dd) > 5000) dd <- dd[sample(1:nrow(dd), 5000), ]
>>> 5            lmm <- lm(y ~ x1, dd)
>>> 6            c1 <- lmm$coef[2]
>>> 7            c2 <- lmm$coef[1]
>>> 8        }
>>> 
>>> because my "dd" data frame has around 5500 rows. It would be nice to
>>> have an option to turn this off, or an option to set the maximum
>>> value allowed for nrow(dd)...
>>> 
>>> Cheers,
>>> Lígia
>>> 
>>> 
>>>> Dear Ligia
>>>> 
>>>> I believe this is because they randomly subsample the data to "speed
>>>> processing"; see the man page and the nSupport parameter.
>>>> 
>>>> I cc Pan Du with the suggestion to make the explanation of this in the
>>>> man page clearer. Is there an option to switch off the random
>>>> subsampling?
>>>> 
>>>>   Best wishes
>>>> Wolfgang
>>>> 
>>>> 
>>>> 
>>>> ligia at ebi.ac.uk wrote:
>>>>> Hi Wolfgang,
>>>>> 
>>>>> I noticed a peculiar behaviour in the lumi package: when I apply the
>>>>> variance stabilizing transformation, it gives slightly different results
>>>>> each time I run the method. See below for a subset of the data:
>>>>> 
>>>>> 
>>>>>> load("dat.rda")
>>>>>> library("lumi")
>>>>> 
>>>>>> x1 <- lumiT(dat, method="vst", ifPlot=!TRUE)
>>>>> 2007-12-13 10:56:35 , processing array  1
>>>>> 2007-12-13 10:56:35 , processing array  2
>>>>> 2007-12-13 10:56:35 , processing array  3
>>>>> 2007-12-13 10:56:35 , processing array  4
>>>>> 
>>>>>> x2 <- lumiT(dat, method="vst", ifPlot=!TRUE)
>>>>> 2007-12-13 10:56:36 , processing array  1
>>>>> 2007-12-13 10:56:36 , processing array  2
>>>>> 2007-12-13 10:56:36 , processing array  3
>>>>> 2007-12-13 10:56:37 , processing array  4
>>>>> 
>>>>> 
>>>>>> table(exprs(x1)==exprs(x2))
>>>>> 
>>>>> FALSE  TRUE
>>>>> 88705     3
>>>>> 
>>>>>> range(exprs(x1)-exprs(x2))
>>>>> [1] -0.05682931  0.03592777
>>>>> 
>>>>>> sessionInfo()
>>>>> R version 2.7.0 Under development (unstable) (2007-11-29 r43558)
>>>>> i686-pc-linux-gnu
>>>>> 
>>>>> locale:
>>>>> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>>>>> 
>>>>> attached base packages:
>>>>> [1] tools     stats     graphics  grDevices utils     datasets  methods
>>>>> [8] base
>>>>> 
>>>>> other attached packages:
>>>>>  [1] lumi_1.5.10            annotate_1.15.6        AnnotationDbi_1.1.6
>>>>>  [4] RSQLite_0.6-0          DBI_0.2-3              mgcv_1.3-29
>>>>>  [7] affy_1.15.7            preprocessCore_0.99.12 affyio_1.5.7
>>>>> [10] Biobase_1.17.6
>>>>> 
>>>>> Cheers,
>>>>> Ligia
>>>> 
>>>> 
>>>> --
>>>> 
>>>> Best wishes
>>>>    Wolfgang
>>>> 
>>>> ------------------------------------------------------------------
>>>> Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
>>>> 
>>> 
>>> 
>> 
>> 
>> ---------------------------------------------------
>> Pan Du, PhD
>> Research Assistant Professor
>> Robert H. Lurie Comprehensive Cancer Center
>> Northwestern University
>> 676 ST Clair St., #1200
>> Chicago, IL 60611
>> Office (312)695-4781
>> dupan at northwestern.edu
>> ---------------------------------------------------
>> 
>> 
>> 
>> 
>> 
> 
> 


