[R] selection of missing data
Adaikalavan Ramasamy
ramasamy at cancer.org.uk
Sun Nov 13 19:56:09 CET 2005
I do not quite follow your post but here are some suggestions.
1) You can use the na.strings argument to simplify things:
df <- read.delim(file="lala.txt", na.strings="-" )
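A minimal sketch of the same call that can be run without a file, using the text argument in place of "lala.txt". The column names and cell values are made up for illustration, and the marker passed to na.strings should match whatever the file actually contains (the quoted post mentions "_").

```r
# Toy tab-separated data standing in for "lala.txt" (hypothetical contents).
txt <- "id\tLung\tLiver\n1\t-\tnone\n2\tnone\tnone\n3\t-\t-\n"

# Every field that is exactly "-" is read in as NA.
df <- read.delim(text = txt, na.strings = "-")

is.na(df)           # logical matrix marking the cells that held "-"
colSums(is.na(df))  # number of such cells per column
```

Once the marker is NA, the usual tools (is.na, complete.cases, na.omit) can be applied directly.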
2) You can count the number of metastases per row first, then find
the rows where that count is zero.
met.cols <- c(11:12, 14:21, 23, 24) # metastasis columns, as listed in your post
number.of.met <- rowSums( mela[ , met.cols ] == "-", na.rm=TRUE ) # NA cells are not counted
have.no.met <- which( number.of.met == 0 )
mela.no.met <- mela[ have.no.met , ]
If you had coded your "-" as NA when reading the data in, then the second
line needs to be changed to
number.of.met <- rowSums( is.na( mela[ , met.cols ] ) )
or simply use complete.cases
met.cols <- c(11:12, 14:21, 23, 24) # metastasis columns
mela.no.met <- mela[ which( complete.cases(mela[ , met.cols]) ) , ]
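To check that the counting approach and the complete.cases approach agree, here is a small sketch on a toy data frame. The data, the "none" value, and the column positions are all hypothetical; only the "-" marker follows the code above.

```r
# Hypothetical toy data: "-" marks a metastasis, "none" marks its absence.
mela <- data.frame(
  id    = 1:4,
  Lung  = c("-", "none", "none", "-"),
  Liver = c("none", "none", "-", "-"),
  stringsAsFactors = FALSE
)
met.cols <- c(2, 3)  # metastasis columns in this toy example

# Approach 1: count the "-" markers per row and keep the rows with none.
n.met  <- rowSums(mela[, met.cols] == "-")
keep.a <- which(n.met == 0)

# Approach 2: recode "-" as NA, then keep rows with no NA in those columns.
sub <- mela[, met.cols]
sub[sub == "-"] <- NA
keep.b <- which(complete.cases(sub))

setequal(keep.a, keep.b)  # do both approaches select the same rows?
mela[keep.a, ]            # the rows without any metastasis
```

The complete.cases version is shorter, but it only works if "-" is the sole source of NA in those columns.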
3) If you name your columns in a systematic fashion, then you can easily
extract and specify those columns. For example if your columns were
named
cn <- c( "age", "colon.met", "PSA.level", "prostate.met", "gender",
"hospitalisation.days", "status", "liver.met", "ethnicity")
Then you can extract those names ending with ".met" as
met.cols <- grep( "\\.met$", cn )
met.cols
[1] 2 4 8
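Applied to a data frame rather than a bare vector of names, the same pattern can feed complete.cases directly. This is only a sketch: the data frame below is invented, and it assumes the metastasis marker has already been read in as NA.

```r
# Hypothetical data frame whose metastasis columns end in ".met";
# NA marks a recorded metastasis, as after reading with na.strings.
mela <- data.frame(
  age       = c(61, 54, 70),
  colon.met = c(NA, "none", "none"),
  gender    = c("F", "M", "M"),
  liver.met = c("none", "none", NA),
  stringsAsFactors = FALSE
)

# Positions of the columns whose names end in ".met".
met.cols <- grep("\\.met$", names(mela))

# Keep the rows with no NA in any of those columns.
mela.no.met <- mela[complete.cases(mela[, met.cols]), ]
mela.no.met
```

Selecting by name pattern means the code keeps working if columns are added or reordered in the spreadsheet.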
Regards, Adai
On Sun, 2005-11-13 at 18:40 +0100, billemont at cegetel.net wrote:
> Hi, I'm a French medical student.
> I have some data that I imported from Excel. The columns of my data frame
> are the localisations of metastases. If there is a metastasis, there is
> the symbol "_". I want to exclude the rows without metastasis, which
> represent the NA data.
>
> So I wrote this:
>
> mela is the data frame
>
> mela1=ifelse(mela[,c(11:12,14:21,23,24)]=="_",1,0) # selection of the
> columns of metastasis localisation
>
> mela4=subset(mela3,Skin ==0 & s.c == 0 & Mucosa ==0 & Soft.ti ==0 &
> Ln.peri==0 & Ln.med==0 & Ln.abdo==0 & Lung==0 & Liver==0 &
> Other.Visc==0 & Bone==0 & Marrow==0 & Brain==0 & Other==0) ## selection
> of the rows with no metastasis localisation
> nrow(mela4)
>
> But I don't know if it is possible to do the same thing as
> ifelse(mela3,Skin & s.c== 0, 0,NA) with more than one column, and afterwards
> to exclude the NAs from my data with na.omit.
>
> The last question is: how can I omit only the rows which have NA values in
> the metastasis columns c(11:12,14:21,23,24)?
>
> Thank you for your help
>
>
>
> Bertrand billemont