[BioC] About subsampling of VST in lumi
Pan Du
dupan at northwestern.edu
Fri Dec 14 23:25:09 CET 2007
Thanks! Ligia.
I will make the change. Probably, we will just remove the sub-sampling step
by default.
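
Until that change is released, fixing the random seed before each call
should make the results repeatable, assuming the sample() call in the fit
step is the only source of randomness in vst. A minimal sketch on simulated
data (the helper fitSubset is hypothetical and not part of lumi):

```r
# Stand-in for the fit step of one array: draw a random subset of 5000
# of the 5500 points and fit a line through it, as in the quoted code.
fitSubset <- function(dd) {
  dd <- dd[sample(1:nrow(dd), 5000), ]
  coef(lm(y ~ x1, data = dd))
}

# Simulated data; these are not real lumi values
set.seed(42)
x1 <- runif(5500, 100, 10000)
dd <- data.frame(x1 = x1, y = 5 + 0.1 * x1 + rnorm(5500, sd = 20))

# Same seed before each call: the same subset is drawn both times
set.seed(123); a <- fitSubset(dd)
set.seed(123); b <- fitSubset(dd)
identical(a, b)

# No reseeding: a different subset is drawn, so the fit can differ
d <- fitSubset(dd)
all.equal(a, d, tolerance = 0)
```

With lumi itself, the same idea would be to call set.seed() with a fixed
value immediately before each lumiT(dat, method="vst") call.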
Have a nice weekend,
Pan
On 12/14/07 3:56 PM, "ligia at ebi.ac.uk" <ligia at ebi.ac.uk> wrote:
> Hi Pan,
>
> Thanks for your email.
> The problem I reported is not due to the downsampling step controlled via
> the "nSupport" parameter, but to a subsequent step in "vst": if the
> number of selected probes with high variance (selInd) is above 5000, then
> only a random subset (5000) of these probes is used (the steps I mentioned
> in my last email) to fit the linear model between the variance and mean of
> probe beads. Couldn't this value (5000) be just another parameter to
> "vst"?
>
> Thanks for your help,
> Ligia
>
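
The suggestion above, making the cap a parameter, could look roughly like
the following. This is only a sketch under assumptions: fitMeanSd and its
maxFit argument are hypothetical names, not part of lumi, and the data are
simulated.

```r
# Hypothetical helper (not part of lumi): the same fit as in the quoted
# vst code, but the cap on the number of points is an argument instead
# of a fixed 5000. The surrounding "if (c3 != 0)" guard is left out.
fitMeanSd <- function(u, std, c3, selInd = rep(TRUE, length(u)),
                      maxFit = 5000) {
  selInd <- selInd & (std^2 > c3)
  dd <- data.frame(y = sqrt(std[selInd]^2 - c3), x1 = u[selInd])
  # Subsample only when the cap is exceeded; maxFit = Inf never subsamples
  if (nrow(dd) > maxFit) dd <- dd[sample(seq_len(nrow(dd)), maxFit), ]
  lmm <- lm(y ~ x1, data = dd)
  c(c1 = unname(coef(lmm)[2]), c2 = unname(coef(lmm)[1]))
}

# Simulated stand-ins for probe means (u) and bead standard deviations (std)
set.seed(1)
u <- runif(5500, 100, 10000)
std <- sqrt(25 + (0.1 * u + 5 + rnorm(5500))^2)

# With the cap disabled there is no random step, so repeated fits agree
fitAll1 <- fitMeanSd(u, std, c3 = 25, maxFit = Inf)
fitAll2 <- fitMeanSd(u, std, c3 = 25, maxFit = Inf)
identical(fitAll1, fitAll2)
```

Keeping 5000 as the default would preserve the current behaviour, while
maxFit = Inf would use every selected probe.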
>
>
>> Hi Ligia,
>>
>> Thanks for your report.
>> Yes, we use down-sampling to speed up the parameter estimation. If you
>> want to use all the data points, you can set the "nSupport" parameter of
>> the vst function to the length of the vector. I will add this to the
>> vignette or help file. Thanks!
>>
>>
>> Pan
>>
>>
>> On 12/14/07 5:18 AM, "ligia at ebi.ac.uk" <ligia at ebi.ac.uk> wrote:
>>
>>> Dear Pan Du,
>>>
>>> From what I understand when looking at "vst", the random subsampling
>>> that affects my data occurs at step 4 below:
>>>
>>> 1 if (c3 != 0) {
>>> 2   selInd <- selInd & (std^2 > c3)
>>> 3   dd <- data.frame(y = sqrt(std[selInd]^2 - c3), x1 = u[selInd])
>>> 4   if (nrow(dd) > 5000) dd <- dd[sample(1:nrow(dd), 5000), ]
>>> 5   lmm <- lm(y ~ x1, dd)
>>> 6   c1 <- lmm$coef[2]
>>> 7   c2 <- lmm$coef[1]
>>> 8 }
>>>
>>> because my "dd" matrix has around 5500 rows. Maybe it would be nice to
>>> have the option to turn this off, or add the option to provide the max
>>> value allowed for nrow(dd)...
>>>
>>> Cheers,
>>> Lígia
>>>
>>>
>>>> Dear Ligia
>>>>
>>>> I believe this is because they randomly subsample the data to "speed
>>>> processing"; see the man page and the nSupport parameter.
>>>>
>>>> I cc Pan Du with the suggestion to make the explanation of this in the
>>>> man page more clear. Is there an option to switch off the random
>>>> subsampling?
>>>>
>>>> Best wishes
>>>> Wolfgang
>>>>
>>>>
>>>>
>>>> ligia at ebi.ac.uk wrote:
>>>>> Hi Wolfgang,
>>>>>
>>>>> I noticed a peculiar behaviour in the lumi package: when I apply the
>>>>> variance stabilizing transformation, it gives slightly different
>>>>> results each time I run the method. See below for a subset of the data:
>>>>>
>>>>>
>>>>>> load("dat.rda")
>>>>>> library("lumi")
>>>>>
>>>>>> x1 <- lumiT(dat, method="vst", ifPlot=!TRUE)
>>>>> 2007-12-13 10:56:35 , processing array 1
>>>>> 2007-12-13 10:56:35 , processing array 2
>>>>> 2007-12-13 10:56:35 , processing array 3
>>>>> 2007-12-13 10:56:35 , processing array 4
>>>>>
>>>>>> x2 <- lumiT(dat, method="vst", ifPlot=!TRUE)
>>>>> 2007-12-13 10:56:36 , processing array 1
>>>>> 2007-12-13 10:56:36 , processing array 2
>>>>> 2007-12-13 10:56:36 , processing array 3
>>>>> 2007-12-13 10:56:37 , processing array 4
>>>>>
>>>>>
>>>>>> table(exprs(x1)==exprs(x2))
>>>>>
>>>>> FALSE TRUE
>>>>> 88705 3
>>>>>
>>>>>> range(exprs(x1)-exprs(x2))
>>>>> [1] -0.05682931 0.03592777
>>>>>
>>>>>> sessionInfo()
>>>>> R version 2.7.0 Under development (unstable) (2007-11-29 r43558)
>>>>> i686-pc-linux-gnu
>>>>>
>>>>> locale:
>>>>> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] tools stats graphics grDevices utils datasets methods
>>>>> [8] base
>>>>>
>>>>> other attached packages:
>>>>> [1] lumi_1.5.10 annotate_1.15.6 AnnotationDbi_1.1.6
>>>>> [4] RSQLite_0.6-0 DBI_0.2-3 mgcv_1.3-29
>>>>> [7] affy_1.15.7 preprocessCore_0.99.12 affyio_1.5.7
>>>>> [10] Biobase_1.17.6
>>>>>
>>>>> Cheers,
>>>>> Ligia
>>>>
>>>>
>>>> --
>>>>
>>>> Best wishes
>>>> Wolfgang
>>>>
>>>> ------------------------------------------------------------------
>>>> Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
>>>>
>>>
>>>
>>
>>
>> ---------------------------------------------------
>> Pan Du, PhD
>> Research Assistant Professor
>> Robert H. Lurie Comprehensive Cancer Center
>> Northwestern University
>> 676 ST Clair St., #1200
>> Chicago, IL 60611
>> Office (312)695-4781
>> dupan at northwestern.edu
>> ---------------------------------------------------
>>
>>
>>
>>
>>
>
>