[R] Counting enumerated items in each element of a character vector

Boris Steipe boris.steipe at utoronto.ca
Wed Apr 26 14:35:23 CEST 2017


What's the expected output for this sample?

How do _you_ define what should be counted?





> On Apr 26, 2017, at 8:33 AM, Dan Abner <dan.abner99 at gmail.com> wrote:
> 
> Hi all,
> 
> I was not clear enough in my example code. Please see below, where "blah
> blah blah" can be ANY text or numbers: there is no predictable pattern at
> all to what may or may not be written in place of "blah blah blah".
> 
> text1<-c("blah blah blah.
> blah blah blah
> 1) blah blah blah 1
> 2) blah blah blah
> 10) blah 10 blah blah
> blah blah blah
> 1) blah blah blah
> 2) blah blah blah 2
> blah blah blah.","blah blah blah.
> blah blah blah
> 1. blah blah blah 1
> 2. blah blah blah
> 10.blah 10 blah blah
> blah blah blah
> 1. blah blah blah 1
> 2. blah blah blah
> blah blah blah.","blah blah blah. blah blah blah 1 1)blah blah blah 1. 2) blah
> blah blah 10) blah 10 blah blah blah blah blah 1) blah blah blah 1. 2) blah
> blah blah. blah blah blah."
> ,"blah blah blah. blah blah blah 1 1.blah blah blah 1. 2. blah blah blah.
> 10. blah 10 blah blah. blah blah blah 1. blah blah blah 1. 2. blah blah
> blah. blah blah blah.")
> 
> text1
> 
> Thank you in advance for your suggestions and/or guidance.
> 
> Best,
> 
> Dan
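One way to make Boris's question concrete is sketched below. It rests on assumptions that are not Dan's. A number followed by ")" counts as an item marker wherever it appears. A number followed by "." counts only at the start of a line, because in run-on prose "1." cannot be told apart from a sentence that ends in a number. The helper count_items() is hypothetical. Under this rule, a period-delimited list written without hard returns (the fourth string above) will be undercounted.

```r
# Hypothetical helper, not from the thread: count enumerated-item markers.
# Assumption: "<digits>)" is a marker anywhere in the text, while
# "<digits>." is a marker only at the start of a line (after optional
# spaces or tabs), since mid-sentence "1." is ambiguous.
count_items <- function(x) {
  pattern <- "(?m)^[ \t]*[0-9]+\\.|[0-9]+\\)"
  # regmatches() gives character(0) for strings with no match, so
  # lengths() yields 0 for them.
  lengths(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
}

count_items(c("a 1) x 2) y 10) z",      # ")" markers counted anywhere
              "1. a\n2. b\ntext 3. c",  # "." markers only at line start
              "no items here"))
# [1] 3 2 0
```

What counts as an item is entirely encoded in `pattern`. Tightening or loosening that one regex is the place to apply whatever definition Dan settles on.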
> 
> 
> On Wed, Apr 26, 2017 at 12:52 AM, Michael Hannon <jmhannon.ucdavis at gmail.com> wrote:
> 
>> Thanks, Ista.  I thought there might be a "tidy" way to do this, but I
>> hadn't used stringr.
>> 
>> -- Mike
>> 
>> 
>> On Tue, Apr 25, 2017 at 8:47 PM, Ista Zahn <istazahn at gmail.com> wrote:
>>> stringr::str_count (and stringi::stri_count that it wraps) interpret
>>> the pattern argument as a regular expression by default.
>>> 
>>> Best,
>>> Ista
>>> 
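The example below illustrates Ista's point with base R, so no package is needed. gregexpr() also treats its pattern as a regular expression by default, so "." matches any character. In stringr, a literal match is requested by wrapping the pattern in fixed(). The base R equivalent is the fixed = TRUE argument shown here.

```r
x <- "1.5 and 2.5"

# Regex interpretation (the default): "." matches any character,
# so every one of the 11 characters is counted.
sum(gregexpr(".", x)[[1]] > 0)                # 11

# Literal interpretation: only the two actual dots are counted.
sum(gregexpr(".", x, fixed = TRUE)[[1]] > 0)  # 2
```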
>>> On Tue, Apr 25, 2017 at 11:40 PM, Michael Hannon
>>> <jmhannon.ucdavis at gmail.com> wrote:
>>>> I like Boris's "Hadley" solution.  For the record, I've appended a
>>>> version that uses regular expressions, the only benefit of which is
>>>> that it could be generalized to find more-complicated patterns.
>>>> 
>>>> -- Mike
>>>> 
>>>> counts <- sapply(text1, function(next_string) {
>>>>    # gregexpr() returns -1 when nothing matches, so count positive positions
>>>>    loc_example <- sum(gregexpr("Example", next_string)[[1]] > 0)
>>>>    loc_example
>>>> }, USE.NAMES=FALSE)
>>>> 
>>>>> counts
>>>> [1] 5 5 5 5
>>>>> 
>>>> 
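Below is a vectorized variant of Mike's regex approach, sketched with a hypothetical helper count_matches(). gregexpr() already accepts a whole character vector, so the sapply() loop is unnecessary. Because gregexpr() reports -1 for a string with no match, the helper counts positive match positions instead of taking length().

```r
# Hypothetical helper: number of (non-overlapping) regex matches per string.
count_matches <- function(x, pattern) {
  vapply(gregexpr(pattern, x), function(m) sum(m > 0), integer(1))
}

# length() on a no-match result is misleading: it sees the single -1.
length(gregexpr("Example", "no match")[[1]])   # 1

count_matches(c("Example and Example", "no match"), "Example")
# [1] 2 0
```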
>>>> On Tue, Apr 25, 2017 at 5:33 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
>>>>> I should add: there's a str_count() function in the stringr package.
>>>>> 
>>>>> library(stringr)
>>>>> str_count(text1, "Example")
>>>>> # [1] 5 5 5 5
>>>>> 
>>>>> I guess that would be the neater solution.
>>>>> 
>>>>> B.
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Apr 25, 2017, at 8:23 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
>>>>>> 
>>>>>> How about:
>>>>>> 
>>>>>> unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 } ))
>>>>>> 
>>>>>> 
>>>>>> Splitting each of your strings on its five occurrences of "Example"
>>>>>> gives six elements, so length(x) - 1 is the number of matches. You can
>>>>>> use any regex instead of "Example" if you need to tweak what you are
>>>>>> looking for.
>>>>>> 
>>>>>> 
>>>>>> B.
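The split-based count has one caveat worth checking. strsplit() drops the empty piece that follows a match at the very end of a string, so length(x) - 1 undercounts when the pattern ends the string. That does not happen with text1, whose elements end in "examples.". A short comparison with a gregexpr()-based count shows the difference. Both helpers are hypothetical.

```r
# Count by splitting (the approach above) vs. by locating regex matches.
split_count <- function(s, pattern) length(strsplit(s, pattern)[[1]]) - 1
regex_count <- function(s, pattern) sum(gregexpr(pattern, s)[[1]] > 0)

s <- "one Example, two Example"   # pattern occurs at the very end
split_count(s, "Example")         # 1: trailing empty piece was dropped
regex_count(s, "Example")         # 2
```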
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Apr 25, 2017, at 8:14 PM, Dan Abner <dan.abner99 at gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I am looking for a streamlined way of counting the number of enumerated
>>>>>>> items in each element of a character vector. For example:
>>>>>>> 
>>>>>>> 
>>>>>>> text1<-c("This is an example.
>>>>>>> List 1
>>>>>>> 1) Example 1
>>>>>>> 2) Example 2
>>>>>>> 10) Example 10
>>>>>>> List 2
>>>>>>> 1) Example 1
>>>>>>> 2) Example 2
>>>>>>> These have been examples.","This is another example.
>>>>>>> List 1
>>>>>>> 1. Example 1
>>>>>>> 2. Example 2
>>>>>>> 10. Example 10
>>>>>>> List 2
>>>>>>> 1. Example 1
>>>>>>> 2. Example 2
>>>>>>> These have been examples.","This is a third example. List 1 1) Example 1.
>>>>>>> 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. These have
>>>>>>> been examples."
>>>>>>> ,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. Example
>>>>>>> 10. List 2 Example 1. 2. Example 2. These have been examples.")
>>>>>>> 
>>>>>>> text1
>>>>>>> 
>>>>>>> ===
>>>>>>> 
>>>>>>> I would like the result to be c(5,5,5,5). Notice that sometimes there
>>>>>>> are leading hard returns, other times not. Sometimes there are separate
>>>>>>> lists, and the same numbers are used in the enumerated items multiple
>>>>>>> times within each character string. Sometimes the leading numbers for
>>>>>>> the enumerated items exceed single digits. Notice that the delimiter
>>>>>>> may be ) or a period (.). If the delimiter is a period and there are
>>>>>>> hard returns (example 2), then I expect it will be easy enough to
>>>>>>> differentiate sentences ending with a number from enumerated items.
>>>>>>> However, I imagine it would be much more difficult to differentiate
>>>>>>> the two for example 4.
>>>>>>> 
>>>>>>> Any suggestions are appreciated.
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Dan
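The following sketch covers the case Dan expects to be easy, where items sit on their own lines. It assumes that an enumerated item is any line that begins with a number followed by ")" or ".", optionally after spaces or tabs. The helper count_line_items() and the demo strings are illustrative only. The rule says nothing about run-on text such as the third and fourth elements of text1.

```r
# Hypothetical helper: count lines that start with "<digits>)" or "<digits>.".
# (?m) makes ^ match at the start of every line, not just the string.
count_line_items <- function(x) {
  lengths(regmatches(x, gregexpr("(?m)^[ \t]*[0-9]+[.)]", x, perl = TRUE)))
}

demo <- c("This is an example.\nList 1\n1) Example 1\n2) Example 2\n10) Example 10",
          "List 2\n1. Example 1\n2. Example 2. These have been examples.",
          "List 1 1. Example 1. 2. Example 2.")
count_line_items(demo)
# [1] 3 2 0
```

The third demo string has no hard returns, so nothing is counted. That is the ambiguity Dan anticipates for his fourth example.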
>>>>>>> 
>>>>>>> 
>>>>>>> ______________________________________________
>>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>> 


