[R] Removing rows if certain elements are found in character string

David Winsemius dwinsemius at comcast.net
Tue Jul 3 04:58:22 CEST 2012


On Jul 2, 2012, at 6:48 PM, Claudia Penaloza wrote:

> I would like to remove rows from the following data frame (df) if there are
> only two specific elements found in the df$ch character string (I want to
> remove rows with only "0" & "D" or "0" & "d"). Alternatively, I would like
> to remove rows if the first non-zero element is "D" or "d".
>
>
>                                                 ch     count
> 1  0000000000D0000000000000000000000000000000000000 0.007368;
> 2  0000000000d0000000000000000000000000000000000000 0.002456;
> 3  000000000T00000000000000000000000000000000000000 0.007368;
> 4  000000000TD0000000000000000000000000000000000000 0.007368;
> 5  000000000T00000000000000000000000000000000000000 0.002456;
> 6  000000000Td0000000000000000000000000000000000000 0.002456;
> 7  00000000T000000000000000000000000000000000000000 0.007368;
> 8  00000000T0D0000000000000000000000000000000000000 0.007368;
> 9  00000000T000000000000000000000000000000000000000 0.002456;
> 10 00000000T0d0000000000000000000000000000000000000 0.002456;
>
>
> I tried the following but it doesn't work if there is more than one
> character per string:
>
>> df <- df[!df$ch %in% c("0","D"),]
>> df <- df[!df$ch %in% c("0","d"),]

You seem to be missing test cases for the second set of conditions but  
this works for the first set (and might for the second):

 > dat[ grepl("[^0dD]", dat$ch) & ! grepl("^0+d|^0+D", dat$ch) , ]
                                                  ch    count
3  000000000T00000000000000000000000000000000000000 0.007368
4  000000000TD0000000000000000000000000000000000000 0.007368
5  000000000T00000000000000000000000000000000000000 0.002456
6  000000000Td0000000000000000000000000000000000000 0.002456
7  00000000T000000000000000000000000000000000000000 0.007368
8  00000000T0D0000000000000000000000000000000000000 0.007368
9  00000000T000000000000000000000000000000000000000 0.002456
10 00000000T0d0000000000000000000000000000000000000 0.002456
>
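
For anyone who wants to try this, here is a self-contained sketch of the same filter. The ten strings are copied from the question, with the counts entered without the trailing semicolons. The data frame name `dat` follows the call above. Everything else is an assumption for illustration: the helper vectors `has_other` and `starts_with_d`, the use of "^0*[dD]" (which also catches a string that begins with "D" or "d" with no leading zeros), and the short `extra` strings, which are not in the original data and are only there to exercise the second condition.

```r
# Rebuild the example data from the question.
dat <- data.frame(
  ch = c(
    "0000000000D0000000000000000000000000000000000000",
    "0000000000d0000000000000000000000000000000000000",
    "000000000T00000000000000000000000000000000000000",
    "000000000TD0000000000000000000000000000000000000",
    "000000000T00000000000000000000000000000000000000",
    "000000000Td0000000000000000000000000000000000000",
    "00000000T000000000000000000000000000000000000000",
    "00000000T0D0000000000000000000000000000000000000",
    "00000000T000000000000000000000000000000000000000",
    "00000000T0d0000000000000000000000000000000000000"
  ),
  count = c(0.007368, 0.002456, 0.007368, 0.007368, 0.002456,
            0.002456, 0.007368, 0.007368, 0.002456, 0.002456),
  stringsAsFactors = FALSE
)

# Condition 1: the string holds at least one character other than 0, d, D.
has_other <- grepl("[^0dD]", dat$ch)

# Condition 2: the first non-zero character is d or D.
# "0*" (zero or more) is an assumption; "0+" would require a leading zero.
starts_with_d <- grepl("^0*[dD]", dat$ch)

# Keep rows that pass condition 1 and do not hit condition 2.
kept <- dat[has_other & !starts_with_d, ]

# Hypothetical test strings (not from the original post) for condition 2.
extra <- c("00D0T000", "00T0D000", "D0000000", "00000000")
extra_kept <- extra[grepl("[^0dD]", extra) & !grepl("^0*[dD]", extra)]
```

Note that a string made up entirely of zeros also fails the first test and would be dropped; whether that is wanted depends on the data.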


-- 

David Winsemius, MD
West Hartford, CT


