[Rd] encountering difficulty asking R to manipulate the correct columns in Expression Set class (object 4). (PR#13464)

Thu Jan 22 15:37:34 CET 2009

Hi Guy --

As noted, this should be sent to the Bioconductor mailing list, see
http://bioconductor.org/docs/mailList.html.

Some more comments below...

guy.tillinghast at rivhs.com writes:

> Full_Name: Guy W. Tillinghast
> Version: 2.8.0
> OS: Windows XP professional
> Submission from: (NULL) (24.248.24.3)
>
>
> I am encountering difficulty asking R to manipulate the correct columns in
> Expression Set class (object 4).
>
> I download the ALL data with:
> library(golubEsets)
> data(Golub_Merge)
>
> Note, the data has the samples not in order.  This is not R's fault (at least
> not that I can tell):
>> Golub_Merge$Samples
>  [1] 39 40 42 47 48 49 41 43 44 45 46 70 71 72 68 69 67 55 56 59 52 53 51 50 54
> [26] 57 58 60 61 65 66 63 64 62  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
> [51] 17 18 19 20 21 22 23 24 25 26 27 34 35 36 37 38 28 29 30 31 32 33

'Samples' is a covariate, not an index into the ExpressionSet. It is
like any of the other 11 covariates in phenoData (try
pData(Golub_Merge) to get a data frame of all covariates).
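
To make the distinction concrete, here is a small sketch in base R. The
'pheno' data frame is a made-up stand-in for pData(Golub_Merge), not the
real data; only its first few Samples values are copied from your output,
and the Gender values are invented for illustration.

```r
# Hypothetical stand-in for pData(Golub_Merge): one row per column of the
# ExpressionSet, with 'Samples' as just another covariate.
pheno <- data.frame(Samples = c(39, 40, 42, 1, 2),
                    Gender  = c("M", "F", "M", "F", "M"))

# Position 1 holds the sample labelled 39, not the sample labelled 1.
first_label <- pheno$Samples[1]

# The sample labelled 1 sits at position 4 in this toy example.
position_of_1 <- which(pheno$Samples == 1)

print(first_label)
print(position_of_1)
```

So an integer index picks a position, and the Samples value stored at that
position can be anything.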

> I want a subset:
>> learning.set<-c(1,2,3,6,8,10,11,12,13,15,16,17,18,19,20,21,23,24,26,30,31,32,34,35,36,37,39,43,44,45,46,47,48,49,50,51,52,53,55,56,57,58,59,60,62,63,64,65,66,67,68,69,70,72)
>> learningEset<-Golub_Merge[,learning.set]
>> learningEset$Samples
>  [1] 39 40 42 49 43 45 46 70 71 68 69 67 55 56 59 52 51 50 57 65 66 63 62  1  2
> [26]  3  5  9 10 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 34 35 36 37 38 28
> [51] 29 30 31 33

This selects columns 1, 2, 3, etc. of the ExpressionSet by position,
along with the corresponding rows of phenoData. Column 1 holds Sample
39, so that is the sample you get first, and so on. If you want the
individuals whose Samples value in phenoData appears in learning.set,
you might say something like

Golub_Merge[, Golub_Merge$Samples %in% learning.set]

just as you might select all the male samples with

Golub_Merge[, Golub_Merge$Gender == "M"]

Subsetting with a logical vector built from a covariate is a powerful
idiom.
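
A minimal base R sketch of the difference, using a plain matrix as a
stand-in for the ExpressionSet (the names 'samples', 'expr' and 'wanted'
are invented for illustration). Note that %in% keeps the columns in their
stored order; if you need them in the order of your request, match() is
one way to get integer positions in that order. I would expect the same
index vectors to work on Golub_Merge itself, but treat that as an
assumption and check the result.

```r
# Hypothetical stand-ins: 'samples' for Golub_Merge$Samples, 'expr' for the
# expression matrix (one column per sample), 'wanted' for learning.set.
samples <- c(39, 40, 42, 1, 2, 3)
expr <- matrix(seq_len(12), nrow = 2,
               dimnames = list(c("geneA", "geneB"), paste0("s", samples)))
wanted <- c(3, 1, 39)

# Integer index: columns 1 and 2 by position, whatever their labels are.
by_position <- expr[, c(1, 2), drop = FALSE]

# Logical index: columns whose label is in 'wanted', in stored order.
by_value <- expr[, samples %in% wanted, drop = FALSE]

# match() returns positions in the order of 'wanted' (NA if a label is absent).
in_order <- expr[, match(wanted, samples), drop = FALSE]

print(colnames(by_position))
print(colnames(by_value))
print(colnames(in_order))
```

This also explains both of the observations quoted next: the order follows
the object rather than the request, and the labels returned are whatever
sits at the requested positions.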

> Note what happened: 
> 1)	the order is different from learning.set
> 2)	samples have been switched: example: sample 72 out, sample 71 in. 
>
> Okay, I troubleshoot: maybe it matters what order I request samples:
>
>> learning.set<-c(39,47,48,49,43,44,45,46,70,72,68,69,67,55,56,59,52,53,51,50,57,58,60,65,66,63,64,62,1,2,3,6,8,10,11,12,13,15,16,17,18,19,20,21,23,24,26,34,35,36,37,30,31,32)
>> learningEset<-Golub_Merge[,learning.set]
>> learningEset$Samples
>  [1]  5 13 14 15  9 10 11 12 31 33 29 30 28 21 22 25 18 19 17 16 23 24 26 37 38
> [26] 35 36 34 39 40 42 49 43 45 46 70 71 68 69 67 55 56 59 52 51 50 57 62  1  2
> [51]  3 65 66 63
> Frankly, this is troubling that R did not do what it was told.

All Bioconductor packages have vignettes, which are a good place to
start to understand a package. An ExpressionSet is defined in Biobase,
so visit (from the front page of the Bioconductor site, following the
software link)
http://bioconductor.org/packages/2.3/bioc/html/Biobase.html

and read the ExpressionSetIntroduction.pdf as a starting point. There
is also extensive training material (under the 'workshops' tab at the
top of the page) and some excellent books.

The Bioconductor mailing list and its archives are also very helpful
places to look.

Martin

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793