[R] Counting enumerated items in each element of a character vector

Michael Hannon jmhannon.ucdavis at gmail.com
Wed Apr 26 06:52:41 CEST 2017


Thanks, Ista.  I thought there might be a "tidy" way to do this, but I
hadn't used stringr.

-- Mike


On Tue, Apr 25, 2017 at 8:47 PM, Ista Zahn <istazahn at gmail.com> wrote:
> stringr::str_count (and stringi::stri_count that it wraps) interpret
> the pattern argument as a regular expression by default.
>
> Best,
> Ista
>
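To make the point about regular expressions concrete, here is a minimal sketch. The sample vector `x` is made up for illustration and is not from the thread, and the stringr calls are guarded in case the package is not installed. Because the pattern is treated as a regex, an unescaped "." matches any character; wrapping the pattern in stringr::fixed() asks for a literal match instead.

```r
# Hypothetical sample data (not from the thread).
x <- c("1. a 2) b", "no items here", "3.14 is not 3x14")

if (requireNamespace("stringr", quietly = TRUE)) {
  # Regex: one or more digits followed by "." or ")".
  n_regex <- stringr::str_count(x, "\\d+[.)]")
  # Literal match: count the two-character string "3." only.
  n_fixed <- stringr::str_count(x, stringr::fixed("3."))
  # Unescaped "." is a wildcard, so "3." also matches "3x".
  n_wild <- stringr::str_count(x, "3.")
  print(n_regex)
  print(n_fixed)
  print(n_wild)
}
```

The difference between n_fixed and n_wild is the thing to watch for when the pattern contains punctuation such as "." or ")".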
> On Tue, Apr 25, 2017 at 11:40 PM, Michael Hannon
> <jmhannon.ucdavis at gmail.com> wrote:
>> I like Boris's "Hadley" solution.  For the record, I've appended a
>> version that uses regular expressions, the only benefit of which is
>> that it could be generalized to find more-complicated patterns.
>>
>> -- Mike
>>
>> counts <- sapply(text1, function(next_string) {
>>     loc_example <- length(gregexpr("Example", next_string)[[1]])
>>     loc_example
>> }, USE.NAMES=FALSE)
>>
>> > counts
>> [1] 5 5 5 5
>> >
>>
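One caveat with the length(gregexpr(...)) idiom, sketched below on made-up data (the vector `x` is hypothetical, not from the thread): when a string contains no match, gregexpr() returns the single value -1, so length() reports 1 rather than 0. Counting via regmatches() is one way around that, since it returns character(0) for a string with no match.

```r
# Hypothetical sample data (not from the thread).
x <- c("Example 1, Example 2", "nothing to count")

# length() of the gregexpr() result is 1 when there is no match,
# because the result is then the single value -1.
naive <- sapply(x, function(s) length(gregexpr("Example", s)[[1]]),
                USE.NAMES = FALSE)

# regmatches() yields character(0) for "no match", so lengths() gives 0.
safe <- lengths(regmatches(x, gregexpr("Example", x)))

print(naive)  # 2 1
print(safe)   # 2 0
```

For the text1 vector in this thread both versions agree, because every element contains at least one match.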
>> On Tue, Apr 25, 2017 at 5:33 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
>>> I should add: there's a str_count() function in the stringr package.
>>>
>>> library(stringr)
>>> str_count(text1, "Example")
>>> # [1] 5 5 5 5
>>>
>>> I guess that would be the neater solution.
>>>
>>> B.
>>>
>>>
>>>
>>>> On Apr 25, 2017, at 8:23 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
>>>>
>>>> How about:
>>>>
>>>> unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 } ))
>>>>
>>>>
>>>> Splitting your string on the five occurrences of "Example" in each gives six elements. length(x) - 1 is the number of
>>>> matches. You can use any regex instead of "Example" if you need to tweak what you are looking for.
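A small sketch of where the split-and-count approach can undercount, using made-up strings (the vector `x` is hypothetical, not from the thread): strsplit() does not return a trailing empty piece when the match sits at the very end of the string, so length(x) - 1 comes out one short in that case. It works for text1 because every element ends with text after the last match.

```r
# Hypothetical sample data (not from the thread).
x <- c("aXbXc", "aXbX")

# Split-and-count: number of pieces minus one.
by_split <- lengths(strsplit(x, "X")) - 1L

# Direct match count for comparison.
by_match <- lengths(regmatches(x, gregexpr("X", x)))

print(by_split)  # 2 1
print(by_match)  # 2 2
```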
>>>>
>>>>
>>>> B.
>>>>
>>>>
>>>>
>>>>
>>>>> On Apr 25, 2017, at 8:14 PM, Dan Abner <dan.abner99 at gmail.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I am looking for a streamlined way of counting the number of enumerated
>>>>> items in each element of a character vector. For example:
>>>>>
>>>>>
>>>>> text1<-c("This is an example.
>>>>> List 1
>>>>> 1) Example 1
>>>>> 2) Example 2
>>>>> 10) Example 10
>>>>> List 2
>>>>> 1) Example 1
>>>>> 2) Example 2
>>>>> These have been examples.","This is another example.
>>>>> List 1
>>>>> 1. Example 1
>>>>> 2. Example 2
>>>>> 10. Example 10
>>>>> List 2
>>>>> 1. Example 1
>>>>> 2. Example 2
>>>>> These have been examples.","This is a third example. List 1 1) Example 1.
>>>>> 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. These have
>>>>> been examples."
>>>>> ,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. Example
>>>>> 10. List 2 Example 1. 2. Example 2. These have been examples.")
>>>>>
>>>>> text1
>>>>>
>>>>> ===
>>>>>
>>>>> I would like the result to be c(5,5,5,5). Notice that sometimes there are
>>>>> leading hard returns, other times not. Sometimes there are separate lists
>>>>> and the same numbers are used in the enumerated items multiple times within
>>>>> each character string. Sometimes the leading numbers for the enumerated
>>>>> items exceed single digits. Notice that the delimiter may be ) or a period
>>>>> (.). If the delimiter is a period and there are hard returns (example 2),
>>>>> then I expect that will be easy enough to differentiate sentences ending
>>>>> with a number from enumerated items. However, I imagine it would be much
>>>>> more difficult to differentiate the two for example 4.
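For the case with hard returns, one possible heuristic is to count only enumerators that start a line. The sketch below uses made-up strings (the vector `x` and the pattern are assumptions for illustration, not something proposed in the thread). It does not address the single-line case, where a sentence ending in a number looks the same as an enumerator.

```r
# Hypothetical sample data (not from the thread).
x <- c("Intro.\n1) First\n2) Second\n10) Tenth\nList 2\n1) Again\n2) Once more\nDone.",
       "Intro.\n1. First item 1.\n2. Second\nThis ends with 3.\nDone.")

# (?m) makes ^ match at the start of every line; then optional whitespace,
# one or more digits, a "." or ")" delimiter, and a whitespace character.
pattern <- "(?m)^\\s*\\d+[.)]\\s"

counts <- lengths(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
print(counts)  # 5 2
```

In the second string, "item 1." and "ends with 3." are not counted because the number does not begin a line.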
>>>>>
>>>>> Any suggestions are appreciated.
>>>>>
>>>>> Best,
>>>>>
>>>>> Dan
>>>>>
>>>>>      [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.


