[R] subset with non logical rules
arun
smartpink111 at yahoo.com
Fri Jun 7 15:21:42 CEST 2013
Hi,
Try:
?split
source("http://www.openintro.org/stat/data/cdc.R")
str(cdc)
#'data.frame': 20000 obs. of 9 variables:
# $ genhlth : Factor w/ 5 levels "excellent","very good",..: 3 3 3 3 2 2 2 2 3 3 ...
# $ exerany : num 0 0 1 1 0 1 1 0 0 1 ...
# $ hlthplan: num 1 1 1 1 1 1 1 1 1 1 ...
# $ smoke100: num 0 1 1 0 0 0 0 0 1 0 ...
# $ height : num 70 64 60 66 61 64 71 67 65 70 ...
# $ weight : int 175 125 105 132 150 114 194 170 150 180 ...
# $ wtdesire: int 175 115 105 124 130 114 185 160 130 170 ...
# $ age : int 77 33 49 42 55 55 31 45 27 44 ...
# $ gender : Factor w/ 2 levels "m","f": 1 2 2 2 2 2 1 1 2 1 ...
cdc$genhlth<- as.character(cdc$genhlth)
cdclst1<- split(cdc,cdc$genhlth)
lapply(cdclst1,head,2)
#$excellent
# genhlth exerany hlthplan smoke100 height weight wtdesire age gender
#11 excellent 1 1 1 69 186 175 46 m
#13 excellent 1 0 1 66 185 220 21 m
#
#$fair
# genhlth exerany hlthplan smoke100 height weight wtdesire age gender
#12 fair 1 1 1 69 168 148 62 m
#15 fair 1 0 0 69 170 170 23 m
#
#$good
# genhlth exerany hlthplan smoke100 height weight wtdesire age gender
#1 good 0 1 0 70 175 175 77 m
#2 good 0 1 1 64 125 115 33 f
#
#$poor
# genhlth exerany hlthplan smoke100 height weight wtdesire age gender
#53 poor 1 1 1 62 140 130 64 f
#79 poor 1 1 0 63 142 120 52 f
#
#$`very good`
# genhlth exerany hlthplan smoke100 height weight wtdesire age gender
#5 very good 0 1 0 61 150 130 55 f
#6 very good 1 1 0 64 114 114 55 f
sapply(cdclst1,nrow)
#excellent fair good poor very good
# 4657 2019 5675 677 6972
cdcGood<-cdclst1[["good"]]
str(cdcGood)
#'data.frame': 5675 obs. of 9 variables:
# $ genhlth : chr "good" "good" "good" "good" ...
# $ exerany : num 0 0 1 1 0 1 1 0 1 1 ...
# $ hlthplan: num 1 1 1 1 1 1 1 0 1 1 ...
# $ smoke100: num 0 1 1 0 1 0 1 1 1 1 ...
# $ height : num 70 64 60 66 65 70 73 67 75 65 ...
# $ weight : int 175 125 105 132 150 180 185 156 200 160 ...
# $ wtdesire: int 175 115 105 124 130 170 175 150 190 140 ...
# $ age : int 77 33 49 42 27 44 79 47 43 54 ...
# $ gender : Factor w/ 2 levels "m","f": 1 2 2 2 2 1 1 1 1 2 ...
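In case the download is not available, here is a minimal sketch of the same idea on a small made-up data frame. The names toy, toy_groups, toy_poor and grp_env are only for illustration and the values are invented, not taken from cdc. The last step, list2env(), is an optional extra and assumes you really want each group as its own variable rather than as a list element.

```r
# Toy data standing in for cdc (made-up values, not from the real data set)
toy <- data.frame(
  genhlth = c("good", "poor", "good", "very good", "poor", "good"),
  age     = c(77, 33, 49, 42, 55, 31),
  stringsAsFactors = FALSE
)

# One data frame per distinct value of genhlth, whatever the values are
toy_groups <- split(toy, toy$genhlth)

names(toy_groups)          # group labels
sapply(toy_groups, nrow)   # number of rows in each group

# Pull out a single group by name
toy_poor <- toy_groups[["poor"]]

# Optional: copy every group into its own variable in a new environment.
# make.names() turns "very good" into the syntactic name "very.good".
grp_env <- list2env(
  setNames(toy_groups, make.names(names(toy_groups))),
  envir = new.env()
)
ls(grp_env)
```

Keeping the groups in the list is usually easier to work with than separate variables, since lapply() and sapply() can then be applied to all groups at once.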
A.K.
>Hi, I am trying to figure out how to subset a bunch of data. As an example I am using the cdc data from openintro.org.
>
>In the first column, named "genhlth", there are various options that the respondents could choose, for example "good", "very good" and "poor". What I would like to do is to separate the data so that everyone who answered "good" is stored in one variable and everyone who answered "poor" is in another variable.
>
>Now I know I could just do subset(cdc, cdc$genhlth == "poor") to get the poor, but I would really like code that separates the data into each group, regardless of what the text or the number of groups is.
>
>Can anyone give me a hint?