[BioC] subset in XPS

Thu Jul 3 00:16:23 CEST 2008

Dear Zhibin

It is good to know that both methods worked for you.

Regarding your problem with MacOS 10.5 I assume that you are using R.app?

Please note that I do all my development on a MacBook Pro using MacOS 
10.4.8 and currently R-2.7.1, and command "data<-import.data()" is as 
fast as on Linux w/o any output problems. However, I never use R.app but 
always start R from an xterm!

I have just tested R.app and do not see a slowdown, however, I get some 
strange error messages.  Maybe there are even more problems with R.app 
on MacOS 10.5, which I currently do not have.

Since most of the output is from the C++ code, which can be used 
independently of R, I am not able to use Rprintf. I have tested my 
package on MacOS X, Linux and Windows XP, and if you use the command 
line, everything works fine on all three platforms.

I would appreciate it if you could try to run your data on your Mac using 
either Apple's Terminal or xterm (for xterm you need to install X11 
from the system CD first) and let me know if you still experience a 
slowdown.

Regarding your second question: Since you can use save.image(), I have not 
yet implemented loading an ExprTreeSet; however, this is on my to-do list.
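
Until loading an ExprTreeSet is supported directly, the save/load workaround mentioned above might look like the following minimal sketch. It uses base R only: "data.rma" here is a stand-in list, not a real ExprTreeSet, and the file name is hypothetical. It uses save() on a single named object rather than save.image(), so that only the RMA result is stored instead of the whole workspace.

```r
# Stand-in for the ExprTreeSet returned by RMA; a real object would
# come from xps, this list is for illustration only.
data.rma <- list(setname = "DataRMA", numtrees = 12L)

# Hypothetical file name; a temporary directory keeps the sketch
# self-contained.
rdafile <- file.path(tempdir(), "data_rma.RData")

# Save only the normalized object rather than the whole workspace.
save(data.rma, file = rdafile)

# In a later session: restore the object under its original name.
rm(data.rma)
load(rdafile)
str(data.rma)
```
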

Best regards
Christian

Zhibin Lu wrote:
> Dear Christian,
>
> I tried both methods and both of them worked well!
>
> You may already know about this problem. When I loaded CEL files under Mac OS 10.5/R 2.7.1/BioC 2.2 with the command
> Data=import.data(scheme, "Data", celdir=".", celfiles=files)
> it was very very slow and I also got a warning message "(WARNING: partial output only, ask package author to use Rprintf instead!)".
> But it was fine when I ran the same command under linux.
>
> RMA normalization takes a lot of time. I know I can save the result using save.image() and use load() to continue the work next time, but just out of curiosity, is there a way to load an ExprTreeSet from a ROOT file, just as one can load a SchemeTreeSet and DataTreeSet?
>
> Thanks so much for your help,
>
> Zhibin
>
>   
>> Date: Mon, 30 Jun 2008 21:29:23 +0200
>> From: cstrato at aon.at
>> To: zhbluweb at hotmail.com
>> CC: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] subset in XPS
>>
>> Dear Zhibin
>>
>> Meanwhile, I have uploaded a new version to BioC devel:
>> http://bioconductor.org/packages/2.3/bioc/html/xps.html
>> which simplifies your request as follows:
>>
>> 1. get expression values
>>     
>>> value <- exprs(data.rma)
>>>       
>> 2. select treenames of choice (no extension necessary)
>>     
>>> treenames <- c("TestA2", "TestB1")
>>>       
>> 3. make a copy of your object if you do not want to replace it
>>     
>>> sub.rma <- data.rma
>>>       
>> 4. replace slot data with subset
>>     
>>> exprs(sub.rma, treenames) <- value
>>>       
>> 5. check if the new ExprTreeSet is correct:
>>     
>>> str(sub.rma)
>>>       
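
To make the effect of selecting by tree name concrete, here is a minimal base-R sketch of the kind of column selection involved. It is not the xps implementation: the mock data frame, its column names, and the ".mdp_LEVEL" suffix are assumptions made for illustration only.

```r
# Mock of what exprs(data.rma) might return: two annotation columns
# followed by one expression column per tree. Column names and the
# ".mdp_LEVEL" suffix are assumptions made for this illustration.
value <- data.frame(
  UNIT_ID          = 1:3,
  UnitName         = c("gene1", "gene2", "gene3"),
  TestA1.mdp_LEVEL = c(7.1, 8.2, 6.3),
  TestA2.mdp_LEVEL = c(7.0, 8.4, 6.1),
  TestB1.mdp_LEVEL = c(9.5, 8.1, 6.6),
  TestB2.mdp_LEVEL = c(9.7, 8.0, 6.4)
)

# Tree names of choice, given without extension.
treenames <- c("TestA2", "TestB1")

# Find the columns whose names start with "<treename>.".
keep <- unlist(lapply(treenames, function(tn) {
  which(startsWith(colnames(value), paste0(tn, ".")))
}))

# Keep the two annotation columns plus the selected samples.
subvalue <- value[, c(1:2, keep)]
colnames(subvalue)
```
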
>> Best regards
>> Christian
>>
>>
>> Zhibin Lu wrote:
>>     
>>> Dear Christian,
>>>
>>> Thanks so much for such a detailed explanation. I will try this when I
>>> get to work next week, and I do not see why I would not be able to
>>> follow the directions.
>>>
>>> Thanks again and have a nice weekend,
>>>
>>> Zhibin
>>>
>>>       
>>>> Date: Sat, 28 Jun 2008 15:46:26 +0200
>>>> From: cstrato at aon.at
>>>> To: zhbluweb at hotmail.com
>>>> CC: bioconductor at stat.math.ethz.ch
>>>> Subject: Re: [BioC] subset in XPS
>>>>
>>>> Dear Zhibin
>>>>
>>>> Since you have already done RMA you have now an ExprTreeSet,
>>>> called e.g. "data.rma". You can see the structure with:
>>>>         
>>>>> str(data.rma)
>>>>>           
>>>> Since there is currently no direct way to work with a
>>>> subset of an ExprTreeSet, you can create a new object of
>>>> class ExprTreeSet in the following way:
>>>>
>>>> 1. Make a subset of slot "data" which is a dataframe
>>>> (assuming that you want to use samples 1,2,3,7,8,9):
>>>>         
>>>>> subdata <- exprs(data.rma)
>>>>> subdata <- subdata[,c(1:2,3:5, 9:11)]
>>>>>           
>>>> Please note that it is important to keep the first
>>>> two columns.
>>>>
>>>> 2. Create a copy "sub.rma" of object "data.rma"
>>>>         
>>>>> sub.rma <- data.rma
>>>>>           
>>>> 3. Replace slot "data" with "subdata":
>>>>         
>>>>> exprs(sub.rma) <- subdata
>>>>>           
>>>> For the moment you need to replace slots "treenames" and
>>>> "numtrees", too, which I will change in the future to be
>>>> done automatically.
>>>>
>>>> 4. Replace slot "treenames" with the names of your subset:
>>>> a, create list containing the sub samples
>>>>         
>>>>> subtrees <- unlist(treeNames(data.rma))
>>>>> subtrees <- as.list(subtrees[c(1:3,7:9)])
>>>>>           
>>>> b, check if the names are correct:
>>>>         
>>>>> subtrees
>>>>>           
>>>> c, replace slot "treenames":
>>>>         
>>>>> sub.rma@treenames <- subtrees
>>>>>           
>>>> 5. Replace slot "numtrees" with the number of subsamples
>>>>         
>>>>> sub.rma@numtrees <- length(subtrees)
>>>>>           
>>>> 6. Check if the new ExprTreeSet is correct:
>>>>         
>>>>> str(sub.rma)
>>>>>           
>>>> Now you can use the new ExprTreeSet "sub.rma" as input for
>>>> method unifilter:
>>>>         
>>>>> rma.ufr <- unifilter(sub.rma, .......)
>>>>>           
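
The index bookkeeping in the steps above (sample numbers versus data frame columns, and keeping the tree names in sync) can be illustrated with a small base-R sketch. The mock data frame and tree names are assumptions for illustration only; with xps the data frame would come from exprs(data.rma) and the names from treeNames(data.rma).

```r
# Samples of interest, numbered as in the experiment (1..12).
samples <- c(1:3, 7:9)

# The data frame starts with two annotation columns that must be
# kept, so sample i sits in column i + 2.
n.annot <- 2L
cols <- c(seq_len(n.annot), samples + n.annot)
cols   # same columns as c(1:2, 3:5, 9:11)

# Mock data: 12 hypothetical tree names and a matching data frame.
treenames <- as.list(paste0("Test", 1:12, ".mdp"))
mock <- data.frame(UNIT_ID = 1:4, UnitName = paste0("gene", 1:4))
for (tn in unlist(treenames)) mock[[tn]] <- seq_len(4)

# Subset the data columns and the tree names with the same sample set.
subdata  <- mock[, cols]
subtrees <- as.list(unlist(treenames)[samples])
numtrees <- length(subtrees)

# Consistency check: one data column per remaining tree name.
stopifnot(ncol(subdata) == n.annot + numtrees)
```
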
>>>> If you want to take advantage of the advanced capabilities
>>>> of package "limma", then you can create a Biobase class
>>>> "ExpressionSet" containing only your 6 samples as described
>>>> in Appendix A.3 of the vignette xps.pdf:
>>>>
>>>> 1. extract the normalized expression data:
>>>>         
>>>>> subdata <- validData(data.rma)
>>>>>           
>>>> 2. Since "subdata" is a dataframe, simply create a subframe:
>>>>         
>>>>> subdata <- subdata[,c(1:3,7:9)]
>>>>>           
>>>> 3. Create a Biobase class "ExpressionSet", called "subset"
>>>>         
>>>>> subset <- new("ExpressionSet", exprs = as.matrix(subdata))
>>>>>           
>>>> Now you have an ExpressionSet ready for use with "limma".
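
As a rough sketch of the ExpressionSet route, the base-R part of the conversion could look like the code below. The mock data frame stands in for the result of validData(data.rma), the group labels "A" and "B" are hypothetical, and the final step runs only if the Biobase package happens to be installed.

```r
# Mock of validData(data.rma): 12 sample columns, rows named by
# probeset (all names here are made up for illustration).
subdata <- as.data.frame(matrix(seq_len(5 * 12) / 10, nrow = 5))
colnames(subdata) <- paste0("Test", 1:12)
rownames(subdata) <- paste0("probeset", 1:5)

# Keep samples 1-3 and 7-9, then convert to a numeric matrix.
subdata <- subdata[, c(1:3, 7:9)]
m <- as.matrix(subdata)

# Hypothetical grouping of the six samples into two groups of three.
group <- factor(rep(c("A", "B"), each = 3))

# Build the ExpressionSet only when Biobase is available.
if (requireNamespace("Biobase", quietly = TRUE)) {
  eset <- Biobase::ExpressionSet(assayData = m)
}
```
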
>>>>
>>>> Please let me know if you succeeded with this info.
>>>>
>>>> Best regards
>>>> Christian
>>>> _._._._._._._._._._._._._._._._
>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>>>> V.i.e.n.n.a A.u.s.t.r.i.a
>>>> e.m.a.i.l: cstrato at aon.at
>>>> _._._._._._._._._._._._._._._._
>>>>
>>>> Zhibin Lu wrote:
>>>>         
>>>>> Hi,
>>>>>
>>>>> I am new to R/Bioconductor. I am using the xps package to analyze
>>>>> Affymetrix Gene ST 1.0 data. After I load CEL files into the
>>>>> DataTreeSet and compute the expression levels with RMA, can I work on
>>>>> a subset of the data? Say I have 12 samples. After RMA, can I work on
>>>>> just 6 of them, divide them into two groups, and apply UniFilter to
>>>>> only these 6?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Zhibin
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor