[BioC] Loading quantiles normalized root data in XPS

Tue Feb 7 21:55:18 CET 2012

Since I did not implement operators [] for function intensity() you need 
to use a dataframe to set values to zero and then use intensity()<-value 
to replace the object. However, please read the help file for 
intensity<- carfully, since this operation is dangerous (and I have not 
tested it for exon arrays).
I think it is best to work with the dataframe itself, since you want to 
fit your own model, anyhow.

Christian

On 2/7/12 2:17 PM, Paul Geeleher wrote:
> Actually doing what I'm talking about relies on being able to modify
> values in "intensity(data_qn)", which I was assuming one could do but
> testing it I don't think I can!?
>
>> head(intensity(data_qn))
>    X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
> 1 0 0             0.0000              0.000
> 2 1 0             0.0000              0.000
> 3 2 0             0.0000              0.000
> 4 3 0             0.0000              0.000
> 5 4 0             0.0000              0.000
> 6 5 0            85.3523            121.733
>
>> intensity(data_qn)[6,3]
> [1] 85.3523
>> intensity(data_qn)[6,3]<- 0
> Error in dimnames(x) : 'x' is missing
>
>> head(intensity(data_qn))
>    X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
> 1 0 0             0.0000              0.000
> 2 1 0             0.0000              0.000
> 3 2 0             0.0000              0.000
> 4 3 0             0.0000              0.000
> 5 4 0             0.0000              0.000
> 6 5 0            85.3523            121.733
>
> This means a different approach may be necessary.
>
> What I can do is extract the probe level estimates for each gene,
> which I think you outlined in a previous email to the list
> (https://mailman.stat.ethz.ch/pipermail/bioconductor/2011-September/040963.html)
> and from there set those probes which aren't detected above background
> to zero (once I've figured out how to map probesets to probes). I can
> the do what I need to do with this data (which is basically to fit a
> model based on individual probe intensities for each gene) and do this
> for every gene, which won't involve modifying the intensities in
> data_qn and as I'm only "attaching" probe level data for one gene at a
> time, shouldn't use that much memory.
>
> Paul
>
> On Tue, Feb 7, 2012 at 12:52 PM, Paul Geeleher<paulgeeleher at gmail.com>  wrote:
>> Hi Christian, apologies for the lack of clarity in my previous email,
>> I'll try to clear this up, see replies below:
>>
>> On Mon, Feb 6, 2012 at 7:51 PM, cstrato<cstrato at aon.at>  wrote:
>>> Dear Paul,
>>>
>>> I am afraid that it is not quite clear to me what you want to do.
>>>
>>> What do you mean with "obtain the expression levels of the probes"? Do you
>>> mean "probes" (i.e. oligos) or do you mean "probesets" as in your question
>>> about DABG?
>>
>> I've read in and quantiles normalized probe level data for 176 arrays
>> and what I'm trying to do now is set every probe (individual probe,
>> not probeset) that is not detected above background to zero. But from
>> what I've read dabg.call() can only create p-values for probeset
>> level? So as compromise I'm now trying to set the expression of all
>> *individual probes* which belong to a *probeset* not detected above
>> background to zero. I hope that makes sense.
>>
>>>
>>> How did you create "data_hapMap"? Maybe you could send me your complete code
>>> otherwise I am only able to guess.
>>
>> "data_hapMap" was a mistake, sorry I copied the wrong object name it
>> should have been "data_qn" which is quantiles normalized probe level
>> data that I created like this:
>>
>> data_hapMap<- root.data(scheme.huex10stv2,
>> "/data2/paul/normalization_project/root_data/hapMap_root_data_cel.root")
>> #read raw data
>> data_qn<- normalize.quantiles(data_hapMap, "exon_quantiles",
>> filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at *probe*
>> level
>>
>> So "data_qn" is the object I want to work with, based on your advice
>> I've figured out that to access the expression levels in "data_qn" I
>> need to use "attachInten()", then I can use "intensities()" to access
>> the expression levels:
>>
>> treenames<- unlist(treeNames(data_qn))
>> data_qn<- attachInten(data_qn, treenames=treenames[1:2]) #attach the
>> first two samples
>>
>>> head(intensity(data_qn))
>>   X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
>> 1 0 0             0.0000              0.000
>> 2 1 0             0.0000              0.000
>> 3 2 0             0.0000              0.000
>> 4 3 0             0.0000              0.000
>> 5 4 0             0.0000              0.000
>> 6 5 0            85.3523            121.733
>>
>> Does the X column in the output above represent the probe ID? If so
>> and I have a mapping of probeset IDs to corresponding probe IDs, it
>> should be fairly straightforward to set probes that are not detected
>> above background to zero? Perhaps there is a more straightforward way
>> of doing this though?
>>
>>
>>>
>>> I guess that "data_hapMap" contains the raw data. For these the slot "data"
>>> is empty to save memory. So you need to use either attachData() or
>>> attachInten(). However since you are using exon arrays you may not have
>>> enough RAM, so it would be better to use function export() or export.data(),
>>> or attach only a subset, see help ?attachData. See also vignette xps.pdf
>>> (chapter 2.3).
>>
>> I think RAM shouldn't be an issue if I attach the samples one at a
>> time? (I actually have access to a machine with 32gigs RAM but ideally
>> would like to get what I'm doing to run on a standard desktop, which
>> is actually why I'm using XPS!).
>>
>>>
>>> When you talk about "expression matrix", how did you create it? Maybe you
>>> could use function validExpr(), but w/o seeing your code it is hard to tell.
>>> For DABG there are functions pvalData() and presCall(), see the examples in
>>> help ?dabg.call.
>>
>> Yes I've managed to use dabg.call() at probeset level and access the
>> p-values using pvalData() alright.
>>
>> Thanks again for all of your help and patience!
>>
>> Kind Regards,
>>
>> Paul.
>>
>>
>>>
>>> Best regards
>>> Christian
>>>
>>>
>>>
>>> On 2/6/12 4:33 PM, Paul Geeleher wrote:
>>>>
>>>> Hi Christian,
>>>>
>>>> Thanks for your quick and informative reply.
>>>>
>>>> I have re-run the analysis and saved the R objects as you suggest. The
>>>> next thing I'm trying to do is to obtain the expression levels of the
>>>> probes, but this doesn't seem to be working for me:
>>>>
>>>>> a<- validData(data_hapMap)
>>>>
>>>> Error in .local(object, ...) : slot "data" has no data
>>>>
>>>> Based on the documentation I think validData() is the correct function.
>>>>
>>>> I've also performed probeset level DABG and I'm trying to set
>>>> individual probes which belong to probesets with DABG<    .05 to 0 in
>>>> the expression matrix.
>>>>
>>>> But it seems I can't see the expression matrix using validData().
>>>> Perhaps there is another function. Any ideas?
>>>>
>>>> Thank you again for your help with this, I'm very grateful!
>>>>
>>>> Paul.
>>>>
>>>> On 2/2/12, cstrato<cstrato at aon.at>    wrote:
>>>>>
>>>>> Dear Paul,
>>>>>
>>>>> The functions root.data(), root.call() and root.expr() were created to
>>>>> allow you access to the corresponding root files just in case that you
>>>>> did not save your R session.
>>>>>
>>>>> In the cases where you compute expression levels stepwise, or only part
>>>>> of them such as normalize.quantiles(), as seems to be the matter in your
>>>>> case, there is no corresponding root.xxx() function to access the root
>>>>> file directly. In these cases you need to save your R session to have
>>>>> continued access to the resulting root file.
>>>>>
>>>>> Please note that saving the R session is the usual case to have access
>>>>> to the root files.
>>>>>
>>>>> Best regards
>>>>> Christian
>>>>>
>>>>>
>>>>> On 2/2/12 1:12 PM, Paul Geeleher wrote:
>>>>>>
>>>>>> Hi Christian,
>>>>>>
>>>>>> Thanks for your quick reply. I check what kind of trees I have using
>>>>>> "getTreeNames()" as you'd suggested, it seems they are of type "cqu"
>>>>>> rather than "int", this is presumably because my analysis required no
>>>>>> background correction step?
>>>>>>
>>>>>> So I then tried:
>>>>>>>
>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "cqu")
>>>>>>
>>>>>>
>>>>>> but that gives me a huge number of errors that look like this:
>>>>>>
>>>>>> Error in<TFile::cd>: Unknown directory PreprocesSet
>>>>>> Error: Could not get directory<PreprocesSet>.
>>>>>> Error in<TFile::cd>: Unknown directory PreprocesSet
>>>>>> Error: Could not get directory<PreprocesSet>.
>>>>>> Error in<TFile::cd>: Unknown directory PreprocesSet
>>>>>> Error: Could not get directory<PreprocesSet>.
>>>>>> Error: Could not get tree<ExportSet>.
>>>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root",  :
>>>>>>     error in function ‘ExportData’
>>>>>>
>>>>>>
>>>>>> This file "exon_quantiles.root" definitely exists in the current
>>>>>> working directory though... Thanks again for your help!
>>>>>>
>>>>>> Paul.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at aon.at>     wrote:
>>>>>>>
>>>>>>> Dear Paul,
>>>>>>>
>>>>>>> Please have a look at the help ?root.expr.
>>>>>>>
>>>>>>> If I understand you correctly, you did only do quantile normalization?
>>>>>>>
>>>>>>> To see the tree names in your file you should do:
>>>>>>>>
>>>>>>>> getTreeNames("exon_quantiles.root")
>>>>>>>
>>>>>>>
>>>>>>> You will probably see trees with extension "int", see help
>>>>>>> ?validTreetype.
>>>>>>>
>>>>>>> To load these trees you need to do:
>>>>>>>>
>>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "int")
>>>>>>>
>>>>>>>
>>>>>>> Please let me know if this did solve your problem.
>>>>>>>
>>>>>>> Best regards
>>>>>>> Christian
>>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>> C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
>>>>>>> V.i.e.n.n.a           A.u.s.t.r.i.a
>>>>>>> e.m.a.i.l:        cstrato at aon.at
>>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've used xps to quantiles normalize (at probe level) some Affy Exon
>>>>>>>> Array data. I now have a "root" file called "exon_quantiles.root", but
>>>>>>>> if I try to load it the same was I'd load my raw data (using the
>>>>>>>> scheme file I created for Affy exon arrays) I get the error below? I
>>>>>>>> can load my raw data just fine though. Any ideas? Do I perhaps need a
>>>>>>>> different "root scheme" file for this normalized data? Unfortunately,
>>>>>>>> I haven't been able to find an answer.
>>>>>>>>
>>>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>>>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root")
>>>>>>>>
>>>>>>>>
>>>>>>>> Error in if (chipname != treetitle) { : argument is of length zero
>>>>>>>>
>>>>>>>> Hope someone can help,
>>>>>>>>
>>>>>>>> Paul.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> sessionInfo()
>>>>>>>>
>>>>>>>>
>>>>>>>> R version 2.11.0 (2010-04-22)
>>>>>>>> x86_64-redhat-linux-gnu
>>>>>>>>
>>>>>>>> locale:
>>>>>>>>    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>>>>    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>>>>    [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>>>>    [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>>>>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>>
>>>>>>>> attached base packages:
>>>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> Paul Geeleher (PhD Student)
>> School of Mathematics, Statistics and Applied Mathematics
>> National University of Ireland
>> Galway
>> Ireland
>> --
>> www.bioinformaticstutorials.com
>
>
>