[BioC] how to get number of samples from GSE with GEOquery

Fri Dec 18 16:08:59 CET 2009

Hi Richard,

Thanks for your suggestion. More examples are good, even for me. I
will try to add some to the vignette, time permitting.
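
The session below assumes an open database connection `con` to the GEOmetadb SQLite file. It uses `sqliteQuickSQL`, which as far as I know has since been dropped from RSQLite in favour of DBI's `dbGetQuery`. As a minimal sketch of the same query pattern, here is a toy in-memory table standing in for the real `gpl` table (the two rows are illustrative, not a dump of GEOmetadb):

```r
library(DBI)

# Toy stand-in for the GEOmetadb database: an in-memory SQLite file with a
# two-row 'gpl' table. The real table has many more columns and rows.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "gpl",
             data.frame(gpl = c("GPL570", "GPL96"),
                        bioc_package = c("hgu133plus2", "hgu133a"),
                        stringsAsFactors = FALSE))

# Same query as in the session, issued through dbGetQuery()
res <- dbGetQuery(con,
  "SELECT DISTINCT gpl FROM gpl WHERE bioc_package = 'hgu133plus2'")
res$gpl

dbDisconnect(con)
```

Against the real database, `con` would typically come from `dbConnect(SQLite(), "GEOmetadb.sqlite")` after fetching the file with `getSQLiteFile()`.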

For the examples you mentioned, I tried something like this:

###########

## 'hgu133plus2' is in the 'bioc_package' field of table 'gpl'
> dbListFields(con, "gpl")
 [1] "ID"                   "title"                "gpl"
 [4] "status"               "submission_date"      "last_update_date"
 [7] "technology"           "distribution"         "organism"
[10] "manufacturer"         "manufacture_protocol" "coating"
[13] "catalog_number"       "support"              "description"
[16] "web_link"             "contact"              "data_row_count"
[19] "supplementary_file"   "bioc_package"

> sqliteQuickSQL(con,"SELECT DISTINCT bioc_package FROM gpl")
...
12   hgu133plus2
...

> gpl_hgu133plus2 <- sqliteQuickSQL(con,"SELECT DISTINCT gpl FROM gpl WHERE bioc_package = 'hgu133plus2'")
> gpl_hgu133plus2
    gpl
1 GPL570

## convert to gse
gse_conversion1 <-  geoConvert(gpl_hgu133plus2[[1]], 'gse')
gse_hgu133plus2 <- unique(gse_conversion1$gse$to_acc)

## It seems that the best field to find all 'cell lines' is the
## 'characteristics_ch1' field of the table 'gsm':
gsm_cell_line <- sqliteQuickSQL(con,"SELECT DISTINCT gsm FROM gsm
WHERE characteristics_ch1 LIKE '%cell%' AND characteristics_ch1 LIKE '%line%'")
## Convert to GSE
gse_conversion2 <-  geoConvert(gsm_cell_line[[1]], 'gse')
gse_cell_line <- unique(gse_conversion2$gse$to_acc)

## It seems that the best field in GSE to find all 'colon cancer' or
## 'colorectal cancer' is the 'summary' field of the table 'gse':
gse_colon <- sqliteQuickSQL(con,"SELECT DISTINCT gse FROM gse
WHERE summary LIKE '%colon cancer%' OR summary LIKE '%colorectal cancer%'")
gse_colon <- gse_colon$gse

## 'all the colon cancer GSE objects that are primary cell lines that
## use hgu133plus2 arrays':
## intersect all three gse vectors: gse_hgu133plus2, gse_cell_line
## and gse_colon

################
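
The final intersection step is only described in the comments above, so here is a minimal sketch of it in base R. The three vectors are hypothetical stand-ins with made-up accession values; in a real session they would be the `gse_hgu133plus2`, `gse_cell_line` and `gse_colon` vectors built above.

```r
# Hypothetical stand-ins for the three GSE vectors; the accession values
# are made up for illustration only.
gse_hgu133plus2 <- c("GSE0001", "GSE0002", "GSE0003")
gse_cell_line   <- c("GSE0002", "GSE0003", "GSE0004")
gse_colon       <- c("GSE0003", "GSE0005")

# Reduce() applies intersect() pairwise across the list, keeping only the
# accessions present in all three vectors.
gse_candidates <- Reduce(intersect,
                         list(gse_hgu133plus2, gse_cell_line, gse_colon))
gse_candidates
```

Using `Reduce` rather than nested `intersect()` calls keeps the step unchanged if a fourth filter (for example, organism) is added to the list later.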

Hope this helps.  Please let me know if you have any questions or
suggestions.  Thanks.

Jack

On Thu, Dec 17, 2009 at 11:18 AM, Dick Beyer <dbeyer at u.washington.edu> wrote:
> Hi Sean and Jack,
>
> That's good to know.  My only immediate suggestion follows my questions.  It is always nice to have lots of worked out examples.
>
> I'm using GEOmetadb to find candidate GSE datasets I can then do further processing on.  If I can set up my sql queries correctly, I should be able to do that.  But I need to get the right fields from the GSE objects, etc.  Maybe your examples could be cast in that question/answer sort of way.  Such as, how do you find all the colon cancer GSE objects that are primary cell lines, or how do you find all the colorectal cancer GSE objects that use hgu133plus2 arrays.
>
> Thanks again,
> Dick
>
> *******************************************************************************
> Richard P. Beyer, Ph.D. University of Washington
> Tel.:(206) 616 7378     Env. & Occ. Health Sci. , Box 354695
> Fax: (206) 685 4696     4225 Roosevelt Way NE, # 100
>                        Seattle, WA 98105-6099
> http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
> http://staff.washington.edu/~dbeyer
> *******************************************************************************
>
> On Thu, 17 Dec 2009, Sean Davis wrote:
>
>> On Thu, Dec 17, 2009 at 10:42 AM, Dick Beyer <dbeyer at u.washington.edu> wrote:
>>> Hi Sean,
>>>
>>> Well, I'm totally grateful to you for your work on this.  As usual,
>>> Bioconductor is making my life easier and more fun!
>>
>> Thanks.  The work is all Jack's, though.  If you have any comments or
>> suggestions on the software, let us know.
>>
>> Sean
>>
>
>
>