[R] grep

Fri Aug 2 08:32:49 CEST 2024

Thanks!

On 8/2/2024 12:28 PM, Rui Barradas wrote:
> On 02/08/2024 at 02:10, Steven Yen wrote:
>> Good morning. Below is a statement I like:
>>
>> j<-grep(".r\\b",colnames(mydata),value=TRUE); j
>>
>> with the \\b option, which I read about a long time ago and have found useful.
>>
>> Are there more options like this, and where are they documented, other than ?grep? Thanks.
>>
>> dstat is just my own descriptive routine.
>>
>>  > x
>>   [1] "age"          "sleep"        "primary"      "middle"
>>   [5] "high"         "somewhath"    "veryh"        "somewhatm"
>>   [9] "verym"        "somewhatc"    "veryc"        "somewhatl"
>> [13] "veryl"        "village"      "married"      "social"
>> [17] "agricultural" "communist"    "minority"     "religious"
>>  > colnames(mydata)
>>   [1] "depression"     "sleep"          "female"         "village"
>>   [5] "agricultural"   "married"        "communist"      "minority"
>>   [9] "religious"      "social"         "no"             "primary"
>> [13] "middle"         "high"           "veryh"          "somewhath"
>> [17] "notveryh"       "verym"          "somewhatm"      "notverym"
>> [21] "veryc"          "somewhatc"      "notveryc"       "veryl"
>> [25] "somewhatl"      "notveryl"       "age"            "village.r"
>> [29] "married.r"      "social.r"       "agricultural.r" "communist.r"
>> [33] "minority.r"     "religious.r"    "male.r"         "education.r"
>>  > j<-grep(".r\\b",colnames(mydata),value=TRUE); j
>> [1] "village.r"      "married.r"      "social.r"       "agricultural.r"
>> [5] "communist.r"    "minority.r"     "religious.r"    "male.r"
>> [9] "education.r"
>>  > j<-c(x,j); j
>>   [1] "age"            "sleep"          "primary"        "middle"
>>   [5] "high"           "somewhath"      "veryh"          "somewhatm"
>>   [9] "verym"          "somewhatc"      "veryc"          "somewhatl"
>> [13] "veryl"          "village"        "married"        "social"
>> [17] "agricultural"   "communist"      "minority"       "religious"
>> [21] "village.r"      "married.r"      "social.r"       "agricultural.r"
>> [25] "communist.r"    "minority.r"     "religious.r"    "male.r"
>> [29] "education.r"
>>  > data<-mydata[j]
>>  > cbind(
>> +   dstat(subset(data,male.r==1))[,1:2],
>> +   dstat(subset(data,male.r==0))[,1:2]
>> + )
>> Sample statistics (Weighted =  FALSE )
>>
>> Sample statistics (Weighted =  FALSE )
>>
>>                  Mean Std.dev  Mean Std.dev
>> age            6.279   0.841 6.055   0.813
>> sleep          6.483   1.804 6.087   2.045
>> primary        0.452   0.498 0.408   0.491
>> middle         0.287   0.453 0.176   0.381
>> high           0.171   0.377 0.082   0.275
>> somewhath      0.522   0.500 0.447   0.497
>> veryh          0.254   0.435 0.250   0.433
>> somewhatm      0.419   0.493 0.460   0.498
>> verym          0.544   0.498 0.411   0.492
>> somewhatc      0.376   0.484 0.346   0.476
>> veryc          0.593   0.491 0.615   0.487
>> somewhatl      0.544   0.498 0.504   0.500
>> veryl          0.390   0.488 0.389   0.487
>> village        0.757   0.429 0.752   0.432
>> married        0.936   0.245 0.906   0.291
>> social         0.538   0.499 0.528   0.499
>> agricultural   0.780   0.414 0.826   0.379
>> communist      0.178   0.383 0.038   0.190
>> minority       0.071   0.256 0.081   0.273
>> religious      0.088   0.284 0.102   0.302
>> village.r      0.243   0.429 0.248   0.432
>> married.r      0.064   0.245 0.094   0.291
>> social.r       0.462   0.499 0.472   0.499
>> agricultural.r 0.220   0.414 0.174   0.379
>> communist.r    0.822   0.383 0.962   0.190
>> minority.r     0.929   0.256 0.919   0.273
>> religious.r    0.912   0.284 0.898   0.302
>> male.r         1.000   0.000 0.000   0.000
>> education.r    0.090   0.286 0.334   0.472
>>  >
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> Hello,
>
> The reference for metacharacters is the help page ?regex.
> If you want to know whether there are more metacharacters similar to \b,
> there are \< and \>. Below are examples of using them instead of \b.
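A minimal sketch of the word-boundary escapes described in ?regex, using a small made-up vector `x` (hypothetical names, not columns of mydata). It assumes R's default regex engine (TRE) for \\< and \\>; the \\B line passes perl = TRUE, because PCRE supports \\b and \\B but not \\< and \\>.

```r
# Hypothetical names, for illustration only
x <- c("male.r", "communist.r", "female")

# Default (TRE) engine: \< and \> match the empty string
# at the start and at the end of a word, respectively
grepl("\\<m", x)   # an 'm' that starts a word: TRUE FALSE FALSE
grepl("e\\>", x)   # an 'e' that ends a word:   TRUE FALSE TRUE

# \B is the negation of \b: an 'm' that is NOT at a word edge.
# PCRE (perl = TRUE) understands \b and \B, but not \< and \>.
grepl("\\Bm", x, perl = TRUE)   # FALSE TRUE TRUE
```

In "male.r" the period is a non-word character, so "male" counts as a word on its own; that is why "e\\>" matches there.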
>
> Also, the pattern '.r' does not match a period followed by an 'r': an
> unescaped period matches any character. To match a literal period you
> must escape it, so the correct regex is '\\.r'.
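To make the difference concrete, here is a small sketch with a hypothetical name "father", which has no period but ends in "r". The unescaped pattern matches it; the escaped one does not. A bracket expression or fixed = TRUE are other ways to get a literal period (fixed = TRUE disables all metacharacters, including \\b).

```r
# "father" is a hypothetical name with no period that ends in "r"
nms <- c("village.r", "father")

grepl(".r\\b", nms)            # '.' matches any character: TRUE TRUE
grepl("\\.r\\b", nms)          # escaped, literal period:   TRUE FALSE
grepl("[.]r\\b", nms)          # bracket expression:        TRUE FALSE
grepl(".r", nms, fixed = TRUE) # plain substring, no \b:    TRUE FALSE
```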
>
>
>
> x <- c("age", "sleep", "primary", "middle", "high", "somewhath", "veryh",
>        "somewhatm", "verym", "somewhatc", "veryc", "somewhatl", "veryl",
>        "village", "married", "social", "agricultural", "communist",
>        "minority", "religious")
> colnms <- c("depression", "sleep", "female", "village", "agricultural",
>             "married", "communist", "minority", "religious", "social", "no",
>             "primary", "middle", "high", "veryh", "somewhath", "notveryh",
>             "verym", "somewhatm", "notverym", "veryc", "somewhatc", "notveryc",
>             "veryl", "somewhatl", "notveryl", "age", "village.r", "married.r",
>             "social.r", "agricultural.r", "communist.r", "minority.r",
>             "religious.r", "male.r", "education.r")
>
> grep("\\.r\\b", colnms, value = TRUE)
> #> [1] "village.r"      "married.r"      "social.r"       "agricultural.r"
> #> [5] "communist.r"    "minority.r"     "religious.r"    "male.r"
> #> [9] "education.r"
> # the same result as above:
> # \\> matches the empty string at the end of a word,
> # \\b matches the empty string at either edge of a word
> grep("\\.r\\>", colnms, value = TRUE)
> #> [1] "village.r"      "married.r"      "social.r"       "agricultural.r"
> #> [5] "communist.r"    "minority.r"     "religious.r"    "male.r"
> #> [9] "education.r"
>
> # 4 column names contain an 'm' and end in '.r', hence 4 matches
> grep("m.*\\.r\\>", colnms, value = TRUE)
> #> [1] "married.r"   "communist.r" "minority.r"  "male.r"
> # only the names starting with 'm' (the 'm' must be at the start of a word)
> grep("\\bm.*\\.r\\b", colnms, value = TRUE)
> #> [1] "married.r"  "minority.r" "male.r"
> grep("\\<m.*\\.r\\>", colnms, value = TRUE)
> #> [1] "married.r"  "minority.r" "male.r"
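One caveat: \\b (and \\>) only requires a word edge after the 'r', not the end of the string. A period is a non-word character, so a name with ".r" in the middle, like the hypothetical "age.r.old" below, also matches. If the goal is names that end in ".r", anchoring with $ is stricter, and base R's endsWith() (available since R 3.3.0) does the same without any regex. A sketch:

```r
# "age.r.old" and "father" are hypothetical names for illustration
nms <- c("village.r", "age.r.old", "father")

grep("\\.r\\b", nms, value = TRUE)  # "village.r" "age.r.old"
grep("\\.r$", nms, value = TRUE)    # "village.r"
nms[endsWith(nms, ".r")]            # "village.r", no regex needed
```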
>
>
> Hope this helps,
>
> Rui Barradas
>
>