[R] grep
Steven Yen
@tyen @end|ng |rom ntu@edu@tw
Fri Aug 2 08:32:49 CEST 2024
Thanks!
On 8/2/2024 12:28 PM, Rui Barradas wrote:
> At 02:10 on 02/08/2024, Steven Yen wrote:
>> Good morning. Below I use a statement like
>>
>> j<-grep(".r\\b",colnames(mydata),value=TRUE); j
>>
>> with the \\b option, which I read about a long time ago and have found useful.
>>
>> Are there more of these options, other than ?grep? Thanks.
>>
>> dstat is just my own descriptive routine.
>>
>> > x
>> [1] "age" "sleep" "primary" "middle"
>> [5] "high" "somewhath" "veryh" "somewhatm"
>> [9] "verym" "somewhatc" "veryc" "somewhatl"
>> [13] "veryl" "village" "married" "social"
>> [17] "agricultural" "communist" "minority" "religious"
>> > colnames(mydata)
>> [1] "depression" "sleep" "female" "village"
>> [5] "agricultural" "married" "communist" "minority"
>> [9] "religious" "social" "no" "primary"
>> [13] "middle" "high" "veryh" "somewhath"
>> [17] "notveryh" "verym" "somewhatm" "notverym"
>> [21] "veryc" "somewhatc" "notveryc" "veryl"
>> [25] "somewhatl" "notveryl" "age" "village.r"
>> [29] "married.r" "social.r" "agricultural.r" "communist.r"
>> [33] "minority.r" "religious.r" "male.r" "education.r"
>> > j<-grep(".r\\b",colnames(mydata),value=TRUE); j
>> [1] "village.r" "married.r" "social.r" "agricultural.r"
>> [5] "communist.r" "minority.r" "religious.r" "male.r"
>> [9] "education.r"
>> > j<-c(x,j); j
>> [1] "age" "sleep" "primary" "middle"
>> [5] "high" "somewhath" "veryh" "somewhatm"
>> [9] "verym" "somewhatc" "veryc" "somewhatl"
>> [13] "veryl" "village" "married" "social"
>> [17] "agricultural" "communist" "minority" "religious"
>> [21] "village.r" "married.r" "social.r" "agricultural.r"
>> [25] "communist.r" "minority.r" "religious.r" "male.r"
>> [29] "education.r"
>> > data<-mydata[j]
>> > cbind(
>> + dstat(subset(data,male.r==1))[,1:2],
>> + dstat(subset(data,male.r==0))[,1:2]
>> + )
>> Sample statistics (Weighted = FALSE )
>>
>> Sample statistics (Weighted = FALSE )
>>
>>                 Mean Std.dev  Mean Std.dev
>> age            6.279   0.841 6.055   0.813
>> sleep          6.483   1.804 6.087   2.045
>> primary        0.452   0.498 0.408   0.491
>> middle         0.287   0.453 0.176   0.381
>> high           0.171   0.377 0.082   0.275
>> somewhath      0.522   0.500 0.447   0.497
>> veryh          0.254   0.435 0.250   0.433
>> somewhatm      0.419   0.493 0.460   0.498
>> verym          0.544   0.498 0.411   0.492
>> somewhatc      0.376   0.484 0.346   0.476
>> veryc          0.593   0.491 0.615   0.487
>> somewhatl      0.544   0.498 0.504   0.500
>> veryl          0.390   0.488 0.389   0.487
>> village        0.757   0.429 0.752   0.432
>> married        0.936   0.245 0.906   0.291
>> social         0.538   0.499 0.528   0.499
>> agricultural   0.780   0.414 0.826   0.379
>> communist      0.178   0.383 0.038   0.190
>> minority       0.071   0.256 0.081   0.273
>> religious      0.088   0.284 0.102   0.302
>> village.r      0.243   0.429 0.248   0.432
>> married.r      0.064   0.245 0.094   0.291
>> social.r       0.462   0.499 0.472   0.499
>> agricultural.r 0.220   0.414 0.174   0.379
>> communist.r    0.822   0.383 0.962   0.190
>> minority.r     0.929   0.256 0.919   0.273
>> religious.r    0.912   0.284 0.898   0.302
>> male.r         1.000   0.000 0.000   0.000
>> education.r    0.090   0.286 0.334   0.472
>> >
> Hello,
>
> The reference for metacharacters is the documentation page ?regex.
> If you want to know whether there are more metacharacters similar to \b,
> there are \< and \>. Below are examples of using them instead of \b.
>
> Also, the pattern '.r' does not match a literal period followed by an 'r';
> an unescaped period matches any character. To match a literal period you
> must escape it. The correct regex is '\\.r'.
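
The point about the unescaped period can be checked with a small sketch. The names in `nms` are hypothetical, made up only to expose the difference; they are not columns of `mydata`.

```r
# Hypothetical names (not from mydata), chosen so that the two patterns disagree
nms <- c("village.r", "colour", "male_r", "year")

# Unescaped '.': any single character followed by 'r' at the end of a word
loose <- grep(".r\\b", nms, value = TRUE)

# Escaped '\\.': a literal period followed by 'r' at the end of a word
strict <- grep("\\.r\\b", nms, value = TRUE)

print(loose)   # all four names
print(strict)  # only "village.r"
```

With the original column names both patterns happen to return the same nine names, because no other column ends in 'r'. The escaped form only matters once such a column appears.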
>
>
>
> x <- c("age", "sleep", "primary", "middle", "high", "somewhath", "veryh",
>        "somewhatm", "verym", "somewhatc", "veryc", "somewhatl", "veryl",
>        "village", "married", "social", "agricultural", "communist",
>        "minority", "religious")
> colnms <- c("depression", "sleep", "female", "village", "agricultural",
>             "married", "communist", "minority", "religious", "social",
>             "no", "primary", "middle", "high", "veryh", "somewhath",
>             "notveryh", "verym", "somewhatm", "notverym", "veryc",
>             "somewhatc", "notveryc", "veryl", "somewhatl", "notveryl",
>             "age", "village.r", "married.r", "social.r",
>             "agricultural.r", "communist.r", "minority.r",
>             "religious.r", "male.r", "education.r")
>
> grep("\\.r\\b", colnms, value = TRUE)
> #> [1] "village.r" "married.r" "social.r" "agricultural.r"
> #> [5] "communist.r" "minority.r" "religious.r" "male.r"
> #> [9] "education.r"
> # the same as above:
> # \\> matches the empty string at the end of a word,
> # \\b matches the empty string at either edge of a word
> grep("\\.r\\>", colnms, value = TRUE)
> #> [1] "village.r" "married.r" "social.r" "agricultural.r"
> #> [5] "communist.r" "minority.r" "religious.r" "male.r"
> #> [9] "education.r"
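
As a rough illustration of the comments above, the sketch below compares `\\<`, `\\>` and `\\b` on a few made-up words. The vector `words` is hypothetical and only serves to show which edge each metacharacter anchors to.

```r
# Hypothetical test words: 'r' at the start, at the end, alone, and in the middle
words <- c("rate", "year", "r", "art")

starts_with_r <- grepl("\\<r", words)  # 'r' at the start of a word
ends_with_r   <- grepl("r\\>", words)  # 'r' at the end of a word

# \\b is not tied to one side: its position in the pattern decides which edge
b_before <- grepl("\\br", words)       # behaves like \\<r here
b_after  <- grepl("r\\b", words)       # behaves like r\\> here

print(data.frame(words, starts_with_r, ends_with_r, b_before, b_after))
```

On these inputs `\\br` agrees with `\\<r` and `r\\b` agrees with `r\\>`, so the choice between them is mostly about making the intended edge explicit.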
>
> # four column names contain an 'm' and end in '.r', hence 4 matches
> grep("m.*\\.r\\>", colnms, value = TRUE)
> #> [1] "married.r" "communist.r" "minority.r" "male.r"
> # only the strings starting with 'm'
> grep("\\bm.*\\.r\\b", colnms, value = TRUE)
> #> [1] "married.r" "minority.r" "male.r"
> grep("\\<m.*\\.r\\>", colnms, value = TRUE)
> #> [1] "married.r" "minority.r" "male.r"
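
One caveat may be worth a sketch: `\\b` and `\\<` anchor to words, not to the whole string, and a period counts as a word separator. If the intent is "the name starts with 'm' and ends in '.r'", the string anchors `^` and `$` (also described in ?regex) are an alternative. The names below are hypothetical and not taken from `mydata`.

```r
# Hypothetical names: the second one has 'm' starting a word, but not the string
nms <- c("male.r", "x.male.r", "male.rate")

# Word anchors: 'm' may start any word inside the name
by_word <- grepl("\\bm.*\\.r\\b", nms)

# String anchors: 'm' must be the first character and '.r' the last two
by_string <- grepl("^m.*\\.r$", nms)

print(data.frame(nms, by_word, by_string))
```

Here "x.male.r" matches the word-anchored pattern but not the string-anchored one. For the column names in this thread the two give the same result.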
>
>
> Hope this helps,
>
> Rui Barradas
>
>