[R] Proper use of grep
Erik Iverson
eriki at ccbr.umn.edu
Thu Jul 15 23:36:02 CEST 2010
Doran, Harold wrote:
> I just need to confirm something with pattern matching folks. I have
> a factor with the following levels in a very large data set:
>
>> levels(all$Classical.Statistic)
>  [1] "" "AB;ABD" "CollapsedSteps" "CR_P" "CR_Prop;CR_P;AB"
>  [6] "NMK" "NMK;P" "NMK;P;ABD" "P" "ABD"
> [11] "CR_P;CollapsedSteps" "NMK;AB;ABD" "NMK;ABD" "NMK;P;AB" "NMK;P;AB;ABD"
> [16] "AB" "CRT;CollapsedSteps" "NMK;AB" "CR_P;CRT;CollapsedSteps" "CR_Prop;CR_P"
>
> I need to subset the rows in which the term "CollapsedSteps" appears.
> So, it may appear as "CollapsedSteps" or may appear as
> "CR_P;CRT;CollapsedSteps" as you can see above. I'm using grep as
> follows:
>
> all[grep('CollapsedSteps', all$Classical.Statistic),]
>
> to find any row in which the term "CollapsedSteps" appears. Is this
> certain to catch all cases, or is there an intricacy that I may have
> missed?
Well, just try it for yourself on a data.frame that's small enough to
verify 'manually'. For instance, the data.frame that contains each
level exactly once sounds like a good candidate.
test <- subset(all, !duplicated(Classical.Statistic))
and then try your line of code ...
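As a rough sketch of that check: the `test` data.frame below is rebuilt by hand from the levels quoted above (an assumption, since the real `all` is not available here), and the comparison against an exact term match via strsplit() is an illustrative addition, not something from the original question.

```r
# Hypothetical stand-in for the real data: one row per level quoted above.
lv <- c("", "AB;ABD", "CollapsedSteps", "CR_P", "CR_Prop;CR_P;AB",
        "NMK", "NMK;P", "NMK;P;ABD", "P", "ABD",
        "CR_P;CollapsedSteps", "NMK;AB;ABD", "NMK;ABD", "NMK;P;AB",
        "NMK;P;AB;ABD", "AB", "CRT;CollapsedSteps", "NMK;AB",
        "CR_P;CRT;CollapsedSteps", "CR_Prop;CR_P")
test <- data.frame(Classical.Statistic = factor(lv, levels = lv))
values <- as.character(test$Classical.Statistic)

# The approach from the question: substring match anywhere in the value.
hits <- grep("CollapsedSteps", values, fixed = TRUE)

# Independent check: split each value on ";" and look for the exact term.
terms <- strsplit(values, ";", fixed = TRUE)
exact <- which(vapply(terms, function(x) "CollapsedSteps" %in% x, logical(1)))

hits                    # 3 11 17 19
identical(hits, exact)  # TRUE for these levels
test[hits, , drop = FALSE]
```

For this particular term the two agree. They would not agree for a short term such as "P", where a plain substring match also picks up "CR_P" and "CR_Prop", so the exact-term version is the safer pattern if you later search for other terms.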
And do you really want "" as a level, or should those be NA?
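If NA is what you want, one possible way to recode the empty level is sketched below. The small factor `x` is made up for illustration; with the real data you would apply the same two steps to all$Classical.Statistic.

```r
# Made-up example factor containing the empty-string level.
x <- factor(c("", "P", "CollapsedSteps", ""))

# Mark the empty strings as missing, then rebuild the factor so the
# now-unused "" level is dropped.
is.na(x) <- x == ""
x <- factor(x)

levels(x)  # "CollapsedSteps" "P"
is.na(x)   # TRUE FALSE FALSE TRUE
```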