# [BioC] About subsampling of VST in lumi

Pan Du dupan at northwestern.edu
Fri Dec 14 23:25:09 CET 2007

```
Thanks! Ligia.
I will make the change. Probably, we will just remove the sub-sampling step
by default.

Have a nice weekend,

Pan

On 12/14/07 3:56 PM, "ligia at ebi.ac.uk" <ligia at ebi.ac.uk> wrote:

> Hi Pan,
>
> The problem I reported is not due to the downsampling step controlled via
> the "nSupport" parameter, but to a subsequent step in "vst": if the
> number of selected probes with high variance (selInd) is above 5000, then
> only a random subset (5000) of these probes is used (the steps I mentioned
> in my last email) to fit the linear model between variance and mean of
> probe beads. Couldn't this value (5000) be just another parameter to
> "vst"?
>
> Ligia
>
>
>
>> Hi Ligia,
>>
>> Yes, we use down-sampling to speed up the parameter estimation. If you
>> want to use all the data points, you can set the "nSupport" parameter of
>> the vst function to the length of the vector. I will add this to the
>> vignette or help file. Thanks!
>>
>>
>> Pan
>>
>>
>> On 12/14/07 5:18 AM, "ligia at ebi.ac.uk" <ligia at ebi.ac.uk> wrote:
>>
>>> Dear Pan Du,
>>>
>>> From what I understand when looking at "vst", the random subsampling
>>> that affects my data occurs at step 4 below:
>>>
>>> 1       if (c3 != 0) {
>>> 2            selInd <- selInd & (std^2 > c3)
>>> 3            dd <- data.frame(y = sqrt(std[selInd]^2 - c3), x1 = u[selInd])
>>> 4            if (nrow(dd) > 5000) dd <- dd[sample(1:nrow(dd), 5000), ]
>>> 5            lmm <- lm(y ~ x1, dd)
>>> 6            c1 <- lmm$coef[2]
>>> 7            c2 <- lmm$coef[1]
>>> 8        }
>>>
>>> because my "dd" matrix has around 5500 rows. Maybe it would be nice to
>>> have the option to turn this off, or add the option to provide the max
>>> value allowed for nrow(dd)...
>>>
>>> Cheers,
>>> Lígia
>>>
>>>
>>>> Dear Ligia
>>>>
>>>> I believe this is because they randomly subsample the data to "speed
>>>> processing"; see the man page and the nSupport parameter.
>>>>
>>>> I cc Pan Du with the suggestion to make the explanation of this in the
>>>> man page more clear. Is there an option to switch off the random
>>>> subsampling?
>>>>
>>>>   Best wishes
>>>> Wolfgang
>>>>
>>>>
>>>>
>>>> ligia at ebi.ac.uk wrote:
>>>>> Hi Wolfgang,
>>>>>
>>>>> I noticed a peculiar behaviour in the lumi package: when I apply the
>>>>> variance stabilizing transformation, it gives slightly different
>>>>> results each time I run the method. See below for a subset of the data:
>>>>>
>>>>>
>>>>>> library("lumi")
>>>>>
>>>>>> x1 <- lumiT(dat, method="vst", ifPlot=!TRUE)
>>>>> 2007-12-13 10:56:35 , processing array  1
>>>>> 2007-12-13 10:56:35 , processing array  2
>>>>> 2007-12-13 10:56:35 , processing array  3
>>>>> 2007-12-13 10:56:35 , processing array  4
>>>>>
>>>>>> x2 <- lumiT(dat, method="vst", ifPlot=!TRUE)
>>>>> 2007-12-13 10:56:36 , processing array  1
>>>>> 2007-12-13 10:56:36 , processing array  2
>>>>> 2007-12-13 10:56:36 , processing array  3
>>>>> 2007-12-13 10:56:37 , processing array  4
>>>>>
>>>>>
>>>>>> table(exprs(x1)==exprs(x2))
>>>>>
>>>>> FALSE  TRUE
>>>>> 88705     3
>>>>>
>>>>>> range(exprs(x1)-exprs(x2))
>>>>> [1] -0.05682931  0.03592777
>>>>>
>>>>>> sessionInfo()
>>>>> R version 2.7.0 Under development (unstable) (2007-11-29 r43558)
>>>>> i686-pc-linux-gnu
>>>>>
>>>>> locale:
>>>>> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF
>>>>> -8
>>>>> ;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_N
>>>>> AM
>>>>> ON
>>>>> =C
>>>>>
>>>>> attached base packages:
>>>>> [1] tools     stats     graphics  grDevices utils     datasets  methods
>>>>> [8] base
>>>>>
>>>>> other attached packages:
>>>>>  [1] lumi_1.5.10            annotate_1.15.6        AnnotationDbi_1.1.6
>>>>>  [4] RSQLite_0.6-0          DBI_0.2-3              mgcv_1.3-29
>>>>>  [7] affy_1.15.7            preprocessCore_0.99.12 affyio_1.5.7
>>>>> [10] Biobase_1.17.6
>>>>>
>>>>> Cheers,
>>>>> Ligia
>>>>
>>>>
>>>> --
>>>>
>>>> Best wishes
>>>>    Wolfgang
>>>>
>>>> ------------------------------------------------------------------
>>>> Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
>>>>
>>>
>>>
>>
>>
>> ---------------------------------------------------
>> Pan Du, PhD
>> Research Assistant Professor
>> Robert H. Lurie Comprehensive Cancer Center
>> Northwestern University
>> 676 ST Clair St., #1200
>> Chicago, IL 60611
>> Office (312)695-4781
>> dupan at northwestern.edu
>> ---------------------------------------------------
>>
>>
>>
>>
>>
>
>

```
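
Ligia's suggestion of turning the hard-coded 5000 into a parameter can be sketched in base R as below. This is only an illustration under stated assumptions, not lumi's code or API: the function `fitVarianceMean` and its arguments `maxFit` and `seed` are hypothetical names, the data are simulated, and the pre-existing `selInd` filter from the quoted snippet is reduced to the `std^2 > c3` condition.

```r
# Hypothetical helper, NOT part of lumi. It mirrors the quoted fitting step
# (y = sqrt(std^2 - c3) regressed on the mean u) but exposes the row cap
# as `maxFit` and an optional `seed` for a reproducible subsample.
fitVarianceMean <- function(u, std, c3, maxFit = 5000, seed = NULL) {
  selInd <- std^2 > c3
  dd <- data.frame(y = sqrt(std[selInd]^2 - c3), x1 = u[selInd])
  if (is.finite(maxFit) && nrow(dd) > maxFit) {
    # Note: set.seed() changes the global RNG state of the session.
    if (!is.null(seed)) set.seed(seed)
    dd <- dd[sample(seq_len(nrow(dd)), maxFit), ]
  }
  lmm <- lm(y ~ x1, data = dd)
  list(c1 = unname(coef(lmm)[2]),  # slope
       c2 = unname(coef(lmm)[1]),  # intercept
       n  = nrow(dd))              # number of rows used in the fit
}

# Simulated data for illustration only (5500 points, as in Ligia's case).
set.seed(1)
u   <- runif(5500, 100, 10000)
std <- sqrt((0.1 * u + 5)^2 + 50) * exp(rnorm(5500, sd = 0.05))
c3  <- 50

# maxFit = Inf disables the subsampling, so repeated fits agree.
fitA <- fitVarianceMean(u, std, c3, maxFit = Inf)
fitB <- fitVarianceMean(u, std, c3, maxFit = Inf)

# With the default cap, a fixed seed makes the random subset repeatable.
fitC <- fitVarianceMean(u, std, c3, seed = 42)
fitD <- fitVarianceMean(u, std, c3, seed = 42)
```

Since the quoted step draws its subset with `sample()`, calling `set.seed()` with a fixed value before each `lumiT(dat, method="vst")` call would presumably also give repeatable results in the meantime, although that is not confirmed anywhere in this thread.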