[R] subset with non logical rules

Fri Jun 7 15:21:42 CEST 2013

Hi,
Try:
?split

source("http://www.openintro.org/stat/data/cdc.R")
str(cdc)
#'data.frame':    20000 obs. of  9 variables:
# $ genhlth : Factor w/ 5 levels "excellent","very good",..: 3 3 3 3 2 2 2 2 3 3 ...
# $ exerany : num  0 0 1 1 0 1 1 0 0 1 ...
# $ hlthplan: num  1 1 1 1 1 1 1 1 1 1 ...
# $ smoke100: num  0 1 1 0 0 0 0 0 1 0 ...
# $ height  : num  70 64 60 66 61 64 71 67 65 70 ...
# $ weight  : int  175 125 105 132 150 114 194 170 150 180 ...
# $ wtdesire: int  175 115 105 124 130 114 185 160 130 170 ...
# $ age     : int  77 33 49 42 55 55 31 45 27 44 ...
# $ gender  : Factor w/ 2 levels "m","f": 1 2 2 2 2 2 1 1 2 1 ...
cdc$genhlth<- as.character(cdc$genhlth)
cdclst1<- split(cdc,cdc$genhlth)
lapply(cdclst1,head,2)
#$excellent
#     genhlth exerany hlthplan smoke100 height weight wtdesire age gender
#11 excellent       1        1        1     69    186      175  46      m
#13 excellent       1        0        1     66    185      220  21      m
#
#$fair
#   genhlth exerany hlthplan smoke100 height weight wtdesire age gender
#12    fair       1        1        1     69    168      148  62      m
#15    fair       1        0        0     69    170      170  23      m
#
#$good
#  genhlth exerany hlthplan smoke100 height weight wtdesire age gender
#1    good       0        1        0     70    175      175  77      m
#2    good       0        1        1     64    125      115  33      f
#
#$poor
#   genhlth exerany hlthplan smoke100 height weight wtdesire age gender
#53    poor       1        1        1     62    140      130  64      f
#79    poor       1        1        0     63    142      120  52      f

#$`very good`
#    genhlth exerany hlthplan smoke100 height weight wtdesire age gender
#5 very good       0        1        0     61    150      130  55      f
#6 very good       1        1        0     64    114      114  55      f


sapply(cdclst1,nrow)
#excellent      fair      good      poor very good 
#     4657      2019      5675       677      6972 

cdcGood<-cdclst1[["good"]]
str(cdcGood)
#'data.frame':    5675 obs. of  9 variables:
# $ genhlth : chr  "good" "good" "good" "good" ...
# $ exerany : num  0 0 1 1 0 1 1 0 1 1 ...
# $ hlthplan: num  1 1 1 1 1 1 1 0 1 1 ...
# $ smoke100: num  0 1 1 0 1 0 1 1 1 1 ...
# $ height  : num  70 64 60 66 65 70 73 67 75 65 ...
# $ weight  : int  175 125 105 132 150 180 185 156 200 160 ...
# $ wtdesire: int  175 115 105 124 130 170 175 150 190 140 ...
# $ age     : int  77 33 49 42 27 44 79 47 43 54 ...
# $ gender  : Factor w/ 2 levels "m","f": 1 2 2 2 2 1 1 1 1 2 ...
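
If the download above is not available, the same idea can be tried on a small made-up data frame. This is only a sketch: `toy`, its values, `toyLst`, `toyPoor` and `env` are hypothetical names for illustration, not part of the cdc data. The last step, turning the list into separate variables with list2env(), is an optional extra that assumes you really want one object per group; keeping the list is usually easier to work with.

```r
# Toy data standing in for cdc (made-up values, not the real data set)
toy <- data.frame(
  genhlth = c("good", "poor", "good", "very good", "poor", "good"),
  age     = c(77, 33, 49, 42, 55, 31),
  stringsAsFactors = FALSE
)

# One data frame per distinct value of genhlth, whatever the values are
toyLst <- split(toy, toy$genhlth)

names(toyLst)          # the group labels found in the data
sapply(toyLst, nrow)   # number of rows in each group

# Pull out a single group by its label
toyPoor <- toyLst[["poor"]]

# Optional: one variable per group, placed in a separate environment.
# make.names() turns labels such as "very good" into valid names ("very.good").
env <- new.env()
list2env(setNames(toyLst, make.names(names(toyLst))), envir = env)
ls(env)
```

Because split() takes the groups from the data itself, nothing in the code has to change when the labels or the number of groups change.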
 

A.K.


>Hi, I am trying to figure out how to subset a bunch of data. As an example I am using the cdc data from openintro.org.
>
>In the first column, named "genhlth", there are various options that respondents could choose, for example "good", "very good" and "poor". What I would like to do is to separate the data so that everyone who answered "good" is stored in one variable and everyone who answered "poor" is in another variable.
>
>Now I know I could just do subset(cdc, cdc$genhlth == "poor") to get the poor, but I would really like code that separates the data into each group, regardless of what the text or the number of groups is.
>
>Can anyone give me a hint?