[R] Subsampling data

Eik Vettorazzi E.Vettorazzi at uke.uni-hamburg.de
Thu Aug 11 19:06:37 CEST 2011


Hi Stefán
you may be missing the wood for the trees, but ?subset is an R function
as well.
MalesData <- subset(Datatemp,Datatemp$sex==1)

btw. your selection
> MalesData <- Datatemp[Datatemp $sex==1]

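went wrong for the following reasons:
(a) the extra space before $ (R's parser accepts it, so this is only a
matter of style)
(b) incorrect indexing. A data.frame has 2 dimensions (and to my
surprise indexing on one dimension only returns the respective
columns, which is different from matrix indexing), so
MalesData <- Datatemp[Datatemp$sex==1,]

should work as well. Both versions assume Datatemp is a data.frame;
read.spss() returns a plain list unless to.data.frame = TRUE is given.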
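As a small illustration of the indexing difference, here is a sketch on a made-up data frame. The object `toy` and its values are invented for the example and are not taken from the SPSS file; the conversion step assumes the imported object is a plain list of equal-length columns.

```r
# Toy stand-in for the SPSS data; names and values are invented.
toy <- data.frame(sex = c(1, 2, 1, 2, 1),
                  age = c(34, 51, 27, 45, 62))

# Row selection: logical vector in the first index, empty second index.
males   <- toy[toy$sex == 1, ]
females <- subset(toy, sex == 2)   # same idea via subset()

nrow(males)    # 3
nrow(females)  # 2

# A single index on a data.frame selects columns, not rows.
first_col <- toy[1]
names(first_col)  # "sex"

# If the import produced a plain list (read.spss() does so unless
# to.data.frame = TRUE is given), convert it before row indexing.
as_list <- as.list(toy)
toy2    <- as.data.frame(as_list)
males2  <- toy2[toy2$sex == 1, ]
```

With the real file, passing to.data.frame = TRUE to read.spss() should make the conversion step unnecessary.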


Am 11.08.2011 16:16, schrieb Stefán Hrafn Jónsson:
> Dear R community
>
> I have two questions on data subsample manipulation. I am starting to use R
> again after a long break and feel a bit rusty.
>
> I want to select a subsample of data for males and females separately.
>
> library(foreign)
> 
> Datatemp  <- read.spss("H:/Skjol/Data/HL/t1and2b.sav", use.value.labels = F)
>
>> table(Datatemp$sex)
>
>    1    2
> 3049 3702
>
>> attributes(Datatemp)
> $names
>  [1] "nomiss"        "Bin"           "rad09"         "year"          "sex"           "income"        "adults"
>  [8] "children"      "student"       "retired"       "disabled"      "homemaker"     "unemployed"    "employed"
> [15] "occupation"    "residencysize" "educ"          "agemean"       "age"           "marital"
>
> $codepage
> [1] 1252
>
>> MalesData <- Datatemp[Datatemp $sex==1]
>> MalesData
> named list()
>> attributes(MalesData)
> $names
> character(0)
>
> Females.Data <- Datatemp[Datatemp $sex==2]
>
> This subset extraction is not working. Can anyone tell me what
> I did wrong?
>
> A different but related question concerns the function paste(), or whether I
> need another function, to do the following.
>
> Rather than this:
>
>> m2 <- gee( Bin ~  agemean +  year,  id = rad09 , data = datause ,
>            subset=kyn== 1 ,
>            family =  binomial, corstr ="exchangeable" )
>
> I want to do this (modified in a loop):
>
>> subsampl <- "kyn== 1 "
>
>> m2 <- gee( Bin ~  agemean +  year,  id = rad09 , data = datause ,
>            subset=paste(subsampl) ,
>            family =  binomial, corstr ="exchangeable" )
>
> Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
> Error in gee(Bin ~ agemean + year, id = rad09, data = datause, subset = paste(subsampl),  :
>   rank-deficient model matrix
>
> I hope you can see what I want to do, but I think I may need a function
> other than paste().
>
> I would greatly appreciate any help.
>
> Stefan Hrafn
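On the second question: a character string such as "kyn== 1 " is not evaluated as a condition when passed to subset=, so paste() does not help there. One possible approach, sketched below, is to subset the data before fitting and loop over the groups. The data here are simulated, and glm() stands in for gee() only so that the sketch runs with base R; the pre-subsetted data frame could be passed to gee(..., data = d) in the same way.

```r
# Hypothetical stand-in data; variable names follow the question,
# values are simulated.
set.seed(1)
datause <- data.frame(
  kyn     = rep(1:2, each = 50),
  agemean = rnorm(100),
  year    = rep(0:1, times = 50)
)
datause$Bin <- rbinom(100, size = 1, prob = 0.5)

# Option 1: loop over the groups and subset the data before fitting.
fits <- list()
for (k in sort(unique(datause$kyn))) {
  d <- datause[datause$kyn == k, ]
  fits[[as.character(k)]] <- glm(Bin ~ agemean + year, data = d,
                                 family = binomial)
}

# Option 2: turn a condition stored as text into a logical vector,
# evaluated with the columns of datause in scope.
subsampl <- "kyn == 1"
keep <- eval(parse(text = subsampl), envir = datause)
d1   <- datause[keep, ]
```

Option 1 is usually easier to read and debug; option 2 is closer to the original idea of keeping the condition in a string.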

-- 
Eik Vettorazzi

Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790


