[BioC] Combining GEOquery with samr or siggenes
Sean Davis
sdavis2 at mail.nih.gov
Thu Sep 1 13:54:54 CEST 2011
On Thu, Sep 1, 2011 at 7:40 AM, Voke AO <ovokeraye at gmail.com> wrote:
> Hi,
>
> This would work if I had the exact sample sizes and knew what columns
> are cases or controls. I guess my question is more like...if I knew
> 100 GDS files had data related to a disease of interest, it would be a
> tedious process to go through each one and create a cl or cls file to
> correspond to the data. In such a situation, would it be possible to
> have code that will
> 1. Go through the specified number of GDS files
> 2. Detect the different classes of samples and assign numbers
> accordingly for SAM/Siggenes for each GDS file, that will eventually
> be called back in the loop process for the SAM/Siggenes analysis of
> each file.
>
> I'm very much a beginner in programming, so I'm hoping I don't sound too naive.
Unfortunately, there will be some programming involved here. As for
the class info, the phenoData slot of an ExpressionSet resulting from
a call to GDS2eSet will be fully populated. Often, there will be a
standard column "disease.state" that you could use to create the cls
object.
> library(GEOquery)
> gds = getGEO("GDS10")
> eset = GDS2eSet(gds)
> pData(eset)$disease.state
[1] diabetic diabetic diabetic-resistant diabetic-resistant
[5] diabetic-resistant diabetic-resistant diabetic-resistant diabetic-resistant
[9] diabetic-resistant diabetic-resistant nondiabetic nondiabetic
[13] nondiabetic nondiabetic diabetic diabetic
[17] diabetic-resistant diabetic-resistant diabetic-resistant diabetic-resistant
[21] diabetic-resistant diabetic-resistant diabetic-resistant diabetic-resistant
[25] nondiabetic nondiabetic nondiabetic nondiabetic
Levels: diabetic diabetic-resistant nondiabetic
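As a minimal sketch of turning such a column into the numeric class labels that SAM-style functions take: the helper name make_class_labels and the toy phenotype table below are made up for illustration, and the column name "disease.state" is an assumption that will not hold for every GDS. The coding here is simply 1..k in alphabetical order of the group names, so check the samr or siggenes documentation for the exact coding each analysis type expects.

```r
# Hypothetical helper (not part of GEOquery, samr, or siggenes):
# convert one phenotype column into integer class labels 1..k.
make_class_labels <- function(pheno, column = "disease.state") {
  if (!column %in% colnames(pheno)) {
    stop("column '", column, "' not found; available columns: ",
         paste(colnames(pheno), collapse = ", "))
  }
  groups <- factor(pheno[[column]])
  cl <- as.integer(groups)       # 1..k, following the order of levels(groups)
  names(cl) <- rownames(pheno)   # keep the sample names attached
  cl
}

# Toy phenotype table standing in for pData(eset); the values are made up.
pheno <- data.frame(
  disease.state = c("diabetic", "diabetic", "nondiabetic", "nondiabetic"),
  row.names = c("S1", "S2", "S3", "S4")
)
cl <- make_class_labels(pheno)
print(cl)
```

With a real ExpressionSet the call would be make_class_labels(pData(eset)). Note that GDS10 has three groups, so a two-class analysis would first need the samples subset to the two groups of interest.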
Hope that helps.
Sean
> Thanks again.
>
> ~V
>
> On Thu, Sep 1, 2011 at 12:50 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> On Thu, Sep 1, 2011 at 5:57 AM, Voke AO <ovokeraye at gmail.com> wrote:
>>> Hi,
>>>
>>> Is it possible to have code that pulls out data for several GDS
>>> records (like a batch process) using GEOquery, so that they can
>>> subsequently be analyzed with SAM or siggenes in a loop?
>>
>> Yes. Here is a simple example. You will need to supply the code to
>> do any actual analysis and return the actual result, but I hope you
>> get the idea. I used sapply as the loop structure, but you could use
>> any loop structure that you like.
>>
>> Hope that helps.
>>
>> Sean
>>
>> > library(GEOquery)
>> > gdslist = c('GDS3717','GDS3718','GDS3719')
>> > analysisfunc = function(gdsid) {
>>     gdsdat = getGEO(gdsid,destdir=".")
>>     gdseset = GDS2eSet(gdsdat)
>>     message("DO SIGGENES STUFF HERE")
>>     return(sprintf("Results from %s would be here",gdsid))
>>   }
>> > resultlist = sapply(gdslist,analysisfunc)
>> File stored at:
>> ./GDS3717.soft.gz
>> File stored at:
>> /var/folders/23/234W5ZnqHPih-U4YppHVCU+++TI/-Tmp-//RtmpgeksBL/GPL570.annot.gz
>> DO SIGGENES STUFF HERE
>> File stored at:
>> ./GDS3718.soft.gz
>> File stored at:
>> /var/folders/23/234W5ZnqHPih-U4YppHVCU+++TI/-Tmp-//RtmpgeksBL/GPL1261.annot.gz
>> DO SIGGENES STUFF HERE
>> File stored at:
>> ./GDS3719.soft.gz
>> File stored at:
>> /var/folders/23/234W5ZnqHPih-U4YppHVCU+++TI/-Tmp-//RtmpgeksBL/GPL1319.annot.gz
>> DO SIGGENES STUFF HERE
>> There were 50 or more warnings (use warnings() to see the first 50)
>> > resultlist
>> GDS3717 GDS3718
>> "Results from GDS3717 would be here" "Results from GDS3718 would be here"
>> GDS3719
>> "Results from GDS3719 would be here"
>>
>
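When the batch grows to many GDS files, some of them will probably fail (no usable class column, a download problem, and so on). The sketch below shows one way to keep the loop going in that case. The names run_batch and fake_analysis and the id "GDS_BAD" are hypothetical; fake_analysis only stands in for a real function that would call getGEO() and GDS2eSet(), build the class labels from pData(), and run the SAM or siggenes analysis.

```r
# Hypothetical wrapper: apply `analyze` to each id and catch errors per id,
# so that one bad data set does not stop the whole batch.
run_batch <- function(ids, analyze) {
  results <- lapply(ids, function(id) {
    tryCatch(
      analyze(id),
      error = function(e) {
        message("Skipping ", id, ": ", conditionMessage(e))
        NULL   # failed ids are kept in the result as NULL
      }
    )
  })
  names(results) <- ids
  results
}

# Stand-in analysis function so the sketch runs without network access.
# "GDS_BAD" is a made-up id used only to trigger the error path.
fake_analysis <- function(gdsid) {
  if (gdsid == "GDS_BAD") stop("no usable class column")
  sprintf("Results from %s would be here", gdsid)
}

resultlist <- run_batch(c("GDS3717", "GDS_BAD", "GDS3719"), fake_analysis)
str(resultlist)
```

lapply is used here instead of sapply so that the result is always a list, whatever each analysis returns; the NULL entries show which files need a manual look.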