[BioC] getGEO and wilcox.test

Ovokeraye Achinike-Oduaran ovokeraye at gmail.com
Tue Mar 20 14:03:28 CET 2012


Thanks Sean.

I already used limma for my analyses. I was just trying to repeat the
data analysis used in the original paper (GSE121). But I have an idea
on how to proceed now.

Thanks again.

-Avoks

On Tue, Mar 20, 2012 at 2:46 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
>
> On Tue, Mar 20, 2012 at 8:37 AM, Ovokeraye Achinike-Oduaran
> <ovokeraye at gmail.com> wrote:
>>
>> Hi,
>>
>> Sorry about the vagueness.
>>
>> This is how I have retrieved my data from GEO. I'm trying to test for
>> differential expression (DE) of the genes between the two conditions
>> (IR and IS), but I couldn't figure out how to pass this information to
>> wilcox.test().
>>
>> # Download GDS157 and convert it to an ExpressionSet (log2 scale)
>> gds157dat = getGEO('GDS157', destdir=".")
>> gds157eset = GDS2eSet(gds157dat, do.log2=TRUE)
>> # Recode the metabolism phenotype as short group labels
>> groups = as.character(pData(gds157eset)$metabolism)
>> groups[groups=="insulin sensitive"] = "IS"
>> groups[groups=="insulin resistant"] = "IR"
>>
>
> First, wilcox.test works on a gene/probe at a time, so you'll need some type
> of looping structure (apply, for example).  Second, you'll need to split
> your data into two vectors corresponding to the IR and IS subsets; these two
> vectors will be the x and y variables in wilcox.test.
>
> You might also look at the multtest package and consider using limma.
> With only 10 samples in your data, rank-based methods are going to be
> of limited use.
>
> Sean
>
>
>
>
>>
>> sessionInfo()
>> R version 2.14.1 (2011-12-22)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_.1252  LC_CTYPE=English_.1252
>> [3] LC_MONETARY=English_.1252 LC_NUMERIC=C
>> [5] LC_TIME=English_.1252
>>
>> attached base packages:
>> [1] stats4    splines   stats     graphics  grDevices utils     datasets
>> [8] methods   base
>>
>> other attached packages:
>>  [1] coin_1.0-21         modeltools_0.2-19   mvtnorm_0.9-9992
>>  [4] survival_2.36-12    XML_3.9-4.1         RCurl_1.91-1.1
>>  [7] bitops_1.0-4.1      puma_2.6.0          mclust_3.4.11
>> [10] limma_3.10.2        ArrayExpress_1.14.0 affy_1.32.1
>> [13] GEOquery_2.20.8     Biobase_2.14.0
>>
>> loaded via a namespace (and not attached):
>> [1] affyio_1.22.0         BiocInstaller_1.2.1   preprocessCore_1.16.0
>> [4] zlibbioc_1.0.0
>> >
>>
>> Regards,
>>
>> Avoks
>>
>> On Tue, Mar 20, 2012 at 2:15 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> >
>> >
>> > On Tue, Mar 20, 2012 at 7:56 AM, Vincent Carey
>> > <stvjc at channing.harvard.edu>
>> > wrote:
>> >>
>> >> Please read the posting guide
>> >> http://www.bioconductor.org/help/mailing-list/posting-guide/ before
>> >> querying this list.
>> >>
>> >> You have not given any information on how you have used getGEO.  To
>> >> help you, I issued
>> >>
>> >> > library(GEOquery)
>> >> Setting options('download.file.method.GEOquery'='auto')
>> >> > gg = getGEO("GDS157")
>> >> File stored at:
>> >> /var/folders/4D/4DI98FkjGzq0K2niUTEHSE+++TM/-Tmp-//RtmpGnz9Cf/GDS157.soft.gz
>> >> > gg
>> >> An object of class "GDS"
>> >
>> >
>> > At this point, if you would like to work with an ExpressionSet instead
>> > of a GDS object, try:
>> >
>> > expset = GDS2eSet(gg)
>> >
>> > Sean
>> >
>> >>
>> >> channel_count
>> >> [1] "1"
>> >> dataset_id
>> >> [1] "GDS157" "GDS157"
>> >> description
>> >> [1] "Analysis of gene expression in pooled vastus lateralis muscle samples from insulin-sensitive and insulin-resistant equally obese, non-diabetic Pima Indians. A search for susceptibility genes for type 2 diabetes.  "
>> >> ...
>> >>
>> >> > getClass("GDS")
>> >> Class "GDS" [package "GEOquery"]
>> >>
>> >> Slots:
>> >>
>> >> Name:           gpl    dataTable       header
>> >> Class:          GPL GEODataTable         list
>> >>
>> >> Extends: "GEOData"
>> >> > getClass("GEODataTable")
>> >> Class "GEODataTable" [package "GEOquery"]
>> >>
>> >> Slots:
>> >>
>> >> Name:     columns      table
>> >> Class: data.frame data.frame
>> >>
>> >> Here I am using R's self-describing capacities to learn about what the
>> >> query returned.
>> >>
>> >> > gg@dataTable@columns
>> >>    sample        metabolism
>> >> 1  GSM2289 insulin resistant
>> >> 2  GSM2294 insulin resistant
>> >> 3  GSM2299 insulin resistant
>> >> 4  GSM2304 insulin resistant
>> >> 5  GSM2309 insulin resistant
>> >> 6  GSM2313 insulin sensitive
>> >> 7  GSM2318 insulin sensitive
>> >> 8  GSM2323 insulin sensitive
>> >> 9  GSM2328 insulin sensitive
>> >> 10 GSM2333 insulin sensitive
>> >>
>> >>                                                                       description
>> >> 1  Value for GSM2289: insulin resistant sample pool 1 muscle on HuFL; src: muscle
>> >> 2  Value for GSM2294: insulin resistant sample pool 2 muscle on HuFL; src: muscle
>> >> 3  Value for GSM2299: insulin resistant sample pool 3 muscle on HuFL; src: muscle
>> >> 4  Value for GSM2304: insulin resistant sample pool 4 muscle on HuFL; src: muscle
>> >> 5  Value for GSM2309: insulin resistant sample pool 5 muscle on HuFL; src: muscle
>> >> 6  Value for GSM2313: insulin sensitive sample pool 1 muscle on HuFL; src: muscle
>> >> 7  Value for GSM2318: insulin sensitive sample pool 2 muscle on HuFL; src: muscle
>> >> 8  Value for GSM2323: insulin sensitive sample pool 3 muscle on HuFL; src: muscle
>> >> 9  Value for GSM2328: insulin sensitive sample pool 4 muscle on HuFL; src: muscle
>> >> 10 Value for GSM2333: insulin sensitive sample pool 5 muscle on HuFL; src: muscle
>> >>
>> >> Now I start to see that the collection of samples may be viewed as
>> >> falling into two classes.  If you want to use wilcox.test to address
>> >> a two-sample problem arising from this experiment, you will have to
>> >> use the information shown above to separate the gene expression
>> >> values into the two classes.  There is more than enough information
>> >> in the above to begin this process; for biological interpretation you
>> >> need to know a little more: you will need to know that GPL80 is
>> >> documented in the package hu6800.db.
>> >>
>> >> On Tue, Mar 20, 2012 at 7:24 AM, Ovokeraye Achinike-Oduaran <
>> >> ovokeraye at gmail.com> wrote:
>> >>
>> >> > Hi all,
>> >> >
>> >> > I am not quite sure how to use the expression set I get from
>> >> > getGEO(), say gds157, in wilcox.test().
>> >> >
>> >> > Please help.
>> >> >
>> >> > Thanks.
>> >> >
>> >> > Avoks
>> >> >
>> >> > _______________________________________________
>> >> > Bioconductor mailing list
>> >> > Bioconductor at r-project.org
>> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> > Search the archives:
>> >> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >> >
>> >>
>> >
>> >
>>
>
>
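
Sean's suggestion earlier in the thread (loop over probes with apply,
and split each probe's values into an IR vector and an IS vector) could
be sketched roughly as below. This is only an illustration under stated
assumptions, not the analysis from the thread or the original paper: the
matrix expr is simulated here as a stand-in for exprs(gds157eset), and
the names expr, pvals, min_p and padj are made up for the example. The
groups vector mirrors the 5 IR / 5 IS layout shown above.

```r
# Simulated stand-in (assumption) for exprs(gds157eset):
# 100 probes in rows, 10 samples in columns.
set.seed(1)
expr <- matrix(rnorm(100 * 10), nrow = 100,
               dimnames = list(paste0("probe", 1:100),
                               paste0("sample", 1:10)))

# Group labels in the same order as the columns (5 IR, then 5 IS).
groups <- rep(c("IR", "IS"), each = 5)

# Shift probe 1 in the IS group so the two groups are fully separated.
expr[1, groups == "IS"] <- expr[1, groups == "IS"] + 10

# One two-sample Wilcoxon rank-sum test per probe (row):
# x is the IR subset, y is the IS subset.
pvals <- apply(expr, 1, function(v) {
  wilcox.test(v[groups == "IR"], v[groups == "IS"])$p.value
})

# Smallest two-sided exact p-value attainable with 5 vs 5 samples.
min_p <- 2 / choose(10, 5)

# Benjamini-Hochberg adjustment across all probes.
padj <- p.adjust(pvals, method = "BH")
```

With real data, exprs(gds157eset) would take the place of expr. Ties or
missing values in the measurements can make wilcox.test() fall back to a
normal approximation or fail for some probes, so this may need guarding.
The min_p line illustrates the "limited use" remark: with five samples
per group, even a perfectly separated probe cannot reach a p-value below
2/252, which leaves little room once the p-values are adjusted for
thousands of probes.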


