# [R] selection of missing data

Sun Nov 13 19:56:09 CET 2005

```I do not quite follow your post but here are some suggestions.

1) You can use the na.strings argument of read.table() to simplify things
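
# A minimal sketch of point 1, on made-up data (the name "toy" and its
# columns are hypothetical, and "_" is assumed to be the metastasis symbol).
# Every "_" is turned into NA while the data are read in.
toy <- read.csv( text = "id,Lung,Liver\n1,_,no\n2,no,no\n3,no,_",
                 na.strings = c("NA", "_") )
is.na( toy[ , c("Lung", "Liver") ] )   # TRUE wherever the file had "_"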

2) You can count the number of metastases per row first, then find
the rows with zero sum.

met.cols      <- c(11:12, 14:21, 23, 24) # metastasis columns
number.of.met <- rowSums( mela[ , met.cols ] == "_" )
have.no.met   <- which( number.of.met == 0 )
mela.no.met   <- mela[ have.no.met , ]

If you used na.strings so that the metastasis symbol is read in as NA,
the second line needs to be changed to

number.of.met <- rowSums( is.na( mela[ , met.cols ] ) )

or simply use complete.cases

met.cols      <- c(11:12, 14:21, 23, 24) # metastasis columns
mela.no.met   <- mela[ which( complete.cases(mela[ , met.cols]) ) , ]
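
# A small self-contained check that the two approaches agree, on made-up
# data (the name "toy" and its columns are hypothetical). NA is assumed to
# mark a metastasis, as it would after importing with na.strings.
toy      <- data.frame( id    = 1:4,
                        Lung  = c(NA, "no", "no", NA),
                        Liver = c("no", "no", NA, NA) )
toy.cols <- c(2, 3)
n.na     <- rowSums( is.na( toy[ , toy.cols ] ) )   # NA count per row
by.count <- toy[ which( n.na == 0 ), ]
by.cc    <- toy[ complete.cases( toy[ , toy.cols ] ), ]
stopifnot( identical( by.count, by.cc ) )           # both keep only row 2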

3) If you name your columns in a systematic fashion, then you can easily
extract and specify those columns. For example, if your columns were
named

cn <- c( "age", "colon.met", "PSA.level", "prostate.met", "gender",
"hospitalisation.days", "status", "liver.met", "ethnicity")

Then you can extract those names ending with ".met" as

met.cols <- grep( "\\.met$", cn )
met.cols
[1] 2 4 8
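
# The same idea applied to the column names of a data frame. This is only
# a sketch: "toy" and "toy.met.cols" are made-up names, not the poster's data.
toy          <- data.frame( age       = c(50, 61),
                            colon.met = c(NA, "no"),
                            liver.met = c("no", "no") )
toy.met.cols <- grep( "\\.met$", names(toy) )   # positions of the ".met" columns
toy[ complete.cases( toy[ , toy.met.cols ] ), ] # rows with no NA in those columns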

On Sun, 2005-11-13 at 18:40 +0100, billemont at cegetel.net wrote:
> Hi, I'm a French medical student.
> I have some data that I imported from Excel. The columns of my data frame
> are the localisations of metastases. If there is a metastasis, there is
> the symbol "_". I want to exclude the rows without metastasis, which
> represent the NA data.
>
> So I wrote this
>
> mela is the data frame
>
> mela1=ifelse(mela[,c(11:12,14:21,23,24)]=="_",1,0) # selection of the
> metastasis localisation columns
>
> mela4=subset(mela3,Skin ==0 & s.c == 0 & Mucosa ==0 & Soft.ti ==0 &
> Ln.peri==0 & Ln.med==0 & Ln.abdo==0 & Lung==0 & Liver==0 &
> Other.Visc==0 & Bone==0 & Marrow==0 & Brain==0 & Other==0) ## selection
> of the rows with no metastasis localisation
> nrow(mela4)
>
> But I don't know if it is possible to do the same thing as
> ifelse(mela3,Skin & s.c== 0, 0,NA) with more than one column, and
> afterwards to exclude the NA rows from my data with na.omit.
>
> The last question is: how can I omit only the rows which have NA values in
> the metastasis columns c(11:12,14:21,23,24)?
>
> Thank you for your help
>
>
>
> Bertrand billemont