[R] A problem subsetting a data frame
David Winsemius
dwinsemius at comcast.net
Tue Nov 27 07:50:51 CET 2012
On Nov 26, 2012, at 11:10 PM, David Winsemius wrote:
>
> On Nov 26, 2012, at 3:05 PM, Aki Hoji wrote:
>
>> Hi all,
>>
>> I have this large microarray data set (ALL) from which
>> I would like to subset or extract a set of data based on a factor
>> ($mol.biol). I looked up some example of subsetting in, picked
>> up two commands and tried both but I got error messages as follows
>>
>>> testset <- subset(ALL, ALL$mol.biol %in% c("BCR/ABL","ALL1/AF4"))
>>
>>>> Error in c("BCR/ABL", "ALL1/AF4") : unused argument(s) ("ALL1/AF4")
>>
>>
>>> testset <- ALL[ALL$mol.biol %in% c("BCR/ABL,NEG"), ]
>>>> Error in ALL[ALL$mol.biol %in% c(BCR/ABL, NEG), ] :
>
> Looking down below you see mostly "@" signs, not the "$" signs that
> you would expect to see if you were hoping to use the "$" function.
> You need to learn to deal with S4 objects. Does an ExpressionSet-
> object have extractor functions? Is `subset` one of those?
>
> I'm guessing there might be a more approved way of extracting the
> 'mol.biol' component of the 'data' dataframe of the 'phenoData'
> component, but this would be the hackish way of approaching it:
>
> ALLpdat <- ALL@phenoData@data
>
> ALLpdat[ ALLpdat$mol.biol %in% c("BCR/ABL"), ]
>
> I picked a factor level that I could tell should work based on the
> str output below. If you wanted to see the entire set of legal
> factor levels you would type:
>
> levels( ALLpdat$mol.biol)
>
> I really think you need to do quite a bit more self-study since you
> seem to not understand some fairly basic issues about Bioconductor
> sorts of objects, which are often S4 Formal Classes. You should
> probably get your hands on some vignettes that use these sorts of
> data structures.
>
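To make the "@" versus "$" distinction concrete, here is a minimal sketch in base R. The classes ToyAnnotated and ToySet and the accessor toyPData are hypothetical names invented for illustration (they are not the Biobase classes), and the four rows of data are made up to resemble the str output.

```r
library(methods)

# Hypothetical toy classes that mimic the nesting seen in str(ALL);
# they are NOT the real Biobase classes.
setClass("ToyAnnotated", representation(data = "data.frame"))
setClass("ToySet", representation(phenoData = "ToyAnnotated"))

# An accessor generic, so callers need not know the slot layout.
setGeneric("toyPData", function(object) standardGeneric("toyPData"))
setMethod("toyPData", "ToySet", function(object) object@phenoData@data)

# Made-up sample annotations, for illustration only.
pheno <- data.frame(
  cod = c("1005", "1010", "3002", "4006"),
  mol.biol = factor(c("BCR/ABL", "NEG", "BCR/ABL", "ALL1/AF4")),
  stringsAsFactors = FALSE
)
toy <- new("ToySet", phenoData = new("ToyAnnotated", data = pheno))

# Slots are reached with "@"; columns of the data frame inside with "$".
via_slots <- toy@phenoData@data
via_accessor <- toyPData(toy)

# Each level is its own string inside c(); "BCR/ABL,NEG" would be a
# single string that matches no level.
wanted <- c("BCR/ABL", "ALL1/AF4")
toy_subset <- via_accessor[via_accessor$mol.biol %in% wanted, ]
```

The accessor and the direct slot access return the same data frame here; the accessor is usually preferred because it keeps working if the class author rearranges the slots.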
Following that advice one finds many such tutorials with the Google
search: { ALL expressionset } and there _is_ an extractor function for
that component of an ExpressionSet object.
See: http://bcb.dfci.harvard.edu/~aedin/courses/BiocDec2011/Slides2.ppt
There is (or was) a tutorial at your own institution:
www.biostat.pitt.edu/biost2055/11/110202_W5_Lab2.doc
(But it will not load with my browser.)
Notice the similarity of the output to the 'data' portion (after
one installs and loads the ALL package, which it would have been
courteous of you to have mentioned).
str(phenoData(ALL))
Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
..@ varMetadata :'data.frame': 21 obs. of 1 variable:
.. ..$ labelDescription: chr [1:21] " Patient ID" " Date of
diagnosis" " Gender of the patient" " Age of the patient at entry" ...
..@ data :'data.frame': 128 obs. of 21 variables:
.. ..$ cod : chr [1:128] "1005" "1010" "3002" "4006" ...
.. ..$ diagnosis : chr [1:128] "5/21/1997" "3/29/2000"
"6/24/1998" "7/17/1997" ...
.. ..$ sex : Factor w/ 2 levels "F","M": 2 2 1 2 2 2 1 2
2 2 ...
.. ..$ age : int [1:128] 53 19 52 38 57 17 18 16 15 40 ...
.. ..$ BT : Factor w/ 10 levels "B","B1","B2",..: 3 3 5
2 3 2 2 2 3 3 ...
.. ..$ remission : Factor w/ 2 levels "CR","REF": 1 1 1 1 1 1 1
1 1 1 ...
.. ..$ CR : chr [1:128] "CR" "CR" "CR" "CR" ...
.. ..$ date.cr : chr [1:128] "8/6/1997" "6/27/2000"
"8/17/1998" "9/8/1997" ...
.. ..$ t(4;11) : logi [1:128] FALSE FALSE NA TRUE FALSE
FALSE ...
.. ..$ t(9;22) : logi [1:128] TRUE FALSE NA FALSE FALSE
FALSE ...
.. ..$ cyto.normal : logi [1:128] FALSE FALSE NA FALSE FALSE
FALSE ...
--------snipped further output-----------
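For the subsetting originally asked about, a sketch along these lines should work, assuming the Bioconductor packages Biobase and ALL are installed. It uses the pData() accessor rather than reaching into slots, and relies on ExpressionSet objects being indexed as [features, samples]. The variable names pdat, keep, pdat_sub and ALLsub are only illustrative.

```r
# Assumption: Biobase and ALL (both from Bioconductor) are installed.
if (requireNamespace("Biobase", quietly = TRUE) &&
    requireNamespace("ALL", quietly = TRUE)) {
  library(Biobase)
  data(ALL, package = "ALL")

  # Accessor for the sample annotation data frame.
  pdat <- pData(ALL)

  # Inspect the legal factor levels before choosing any.
  print(levels(pdat$mol.biol))

  # One logical value per sample; each level is a separate string.
  keep <- pdat$mol.biol %in% c("BCR/ABL", "ALL1/AF4")

  # Subset only the annotation data frame (rows are samples) ...
  pdat_sub <- pdat[keep, ]

  # ... or subset the whole ExpressionSet (samples are columns).
  ALLsub <- ALL[, keep]
}
```

Subsetting the ExpressionSet itself keeps the expression values and the annotations in step, which is usually what one wants for downstream analysis.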
So do some searching and self-study.
> --
> David.
>
>>>
>>>> error in evaluating the argument 'i' in selecting a method for
>>>> function '[': Error in c(BCR/ABL, NEG) : unused argument(s) (NEG)
>>
>> At this point, I really appreciate any inputs to move forward. ….
>>
>>> str(ALL)
>>> Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
>>> ..@ experimentData :Formal class 'MIAME' [package "Biobase"]
>>> with 13 slots
>>> .. .. ..@ name : chr "Chiaretti et al."
>>> .. .. ..@ lab : chr "Department of Medical Oncology,
>>> Dana-Farber Cancer Institute, Department of Medicine, Brigham and
>>> Women's Hospital, Harvard Med"| __truncated__
>>> .. .. ..@ contact : chr ""
>>> .. .. ..@ title : chr "Gene expression profile of adult
>>> T-cell acute lymphocytic leukemia identifies distinct subsets of
>>> patients with different respo"| __truncated__
>>> .. .. ..@ abstract : chr "Gene expression profiles were
>>> examined in 33 adult patients with T-cell acute lymphocytic
>>> leukemia (T-ALL). Nonspecific filteri"| __truncated__
>>> .. .. ..@ url : chr ""
>>> .. .. ..@ pubMedIds : chr [1:2] "14684422" "16243790"
>>> .. .. ..@ samples : list()
>>> .. .. ..@ hybridizations : list()
>>> .. .. ..@ normControls : list()
>>> .. .. ..@ preprocessing : list()
>>> .. .. ..@ other : list()
>>> .. .. ..@ .__classVersion__:Formal class 'Versions' [package
>>> "Biobase"] with 1 slots
>>> .. .. .. .. ..@ .Data:List of 1
>>> .. .. .. .. .. ..$ : int [1:3] 1 0 0
>>> ..@ assayData :<environment: 0x1078636e8>
>>> ..@ phenoData :Formal class 'AnnotatedDataFrame' [package
>>> "Biobase"] with 4 slots
>>> .. .. ..@ varMetadata :'data.frame': 21 obs. of 1 variable:
>>> .. .. .. ..$ labelDescription: chr [1:21] " Patient ID" " Date of
>>> diagnosis" " Gender of the patient" " Age of the patient at
>>> entry" ...
>>> .. .. ..@ data :'data.frame': 128 obs. of 21 variables:
>>> .. .. .. ..$ cod : chr [1:128] "1005" "1010" "3002"
>>> "4006" ...
>>> .. .. .. ..$ diagnosis : chr [1:128] "5/21/1997" "3/29/2000"
>>> "6/24/1998" "7/17/1997" ...
>>> .. .. .. ..$ sex : Factor w/ 2 levels "F","M": 2 2 1 2 2
>>> 2 1 2 2 2 ...
>>> .. .. .. ..$ age : int [1:128] 53 19 52 38 57 17 18 16
>>> 15 40 ...
>>> .. .. .. ..$ BT : Factor w/ 10 levels "B","B1","B2",..:
>>> 3 3 5 2 3 2 2 2 3 3 ...
>>> .. .. .. ..$ remission : Factor w/ 2 levels "CR","REF": 1 1 1
>>> 1 1 1 1 1 1 1 ...
>>> .. .. .. ..$ CR : chr [1:128] "CR" "CR" "CR" "CR" ...
>>> .. .. .. ..$ date.cr : chr [1:128] "8/6/1997" "6/27/2000"
>>> "8/17/1998" "9/8/1997" ...
>>> .. .. .. ..$ t(4;11) : logi [1:128] FALSE FALSE NA TRUE
>>> FALSE FALSE ...
>>> .. .. .. ..$ t(9;22) : logi [1:128] TRUE FALSE NA FALSE
>>> FALSE FALSE ...
>>> .. .. .. ..$ cyto.normal : logi [1:128] FALSE FALSE NA FALSE
>>> FALSE FALSE ...
>>> .. .. .. ..$ citog : chr [1:128] "t(9;22)" "simple alt."
>>> NA "t(4;11)" ...
>>> .. .. .. ..$ mol.biol : Factor w/ 6 levels "ALL1/AF4","BCR/ABL",..:
>>> 2 4 2 1 4 4 4 4 4 2 ...
>> snipped
>>
>> Aki Hoji, Ph.D
>> Dept. Infectious Diseases & Microbiology
>> University of Pittsburgh
>> Rm427 Parran Hall, GSPH-IDM
>> 130 Desoto St., Pittsburgh, PA 15261
>>
>
David Winsemius, MD
Alameda, CA, USA