[R] How to auto-generate “anchor links” and directory path for a text search function

Vikram Reddy v|kr@m@byreddy @end|ng |rom gm@||@com
Mon Aug 10 18:44:05 CEST 2020


I have a tokenized text document in which each sentence is wrapped in a 'div' tag with an 'id':




    library(quanteda)
    library(htmltools)
    library(tidyverse)




    # one character string per tokenized sentence
    text <- c(
      '<div id="4">But how do you do?</div>',
      '<div id="5">I see I have frightened you—sit... ”</div>',
      '<div id="6">It was in July, 1805, and the speaker..</div>',
      '<div id="7">With these words she greeted Prince Vasíli Kurágin...</div>',
      '<div id="8">Anna Pávlovna had had a cough for some days...</div>',
      '<div id="9">She was, as she said, suffering from la grippe....</div>',
      '<div id="10">Petersburg, used only by the elite.</div>',
      '<div id="11">All her invitations without exception, written in French...</div>',
      '<div id="12">“If you have nothing better to do, Count (or Prince).. </div>',
      '<div id="13">“Heavens!</div>',
      '<div id="14">what a virulent attack!”</div>',
      # ... further divs omitted ...
      '<div id="2107">It was plain that this “well?”</div>'
    )







To finish it up, I need to auto-generate output in this form:



    <a href="C:\Users\John\Desktop\final_tokens.html#div number"> text-sentence </a>



For example, when I search for the word 'good':





    <a href="C:\Users\John\Desktop\final_tokens.html#49"> Our good and wonderful sovereign has to </a>
    <a href="C:\Users\John\Desktop\final_tokens.html#73">He is one of the good ones.</a>
    <a href="C:\Users\John\Desktop\final_tokens.html#138">She is rich and of good family..</a>



The div id number should go right after the #, as shown above.
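A rough sketch of the link format I mean, assuming the id and the sentence are already available as two character vectors (the names ids, sentences and path are only placeholders for illustration):

```r
# Hypothetical ids and sentences, copied from the example above
ids       <- c("49", "138")
sentences <- c("Our good and wonderful sovereign has to",
               "She is rich and of good family..")

# Path of the tokenized HTML file (backslashes are doubled inside an R string)
path <- "C:\\Users\\John\\Desktop\\final_tokens.html"

# sprintf() is vectorised, so one call builds one link per sentence
links <- sprintf('<a href="%s#%s">%s</a>', path, ids, sentences)
writeLines(links)
```

The id lands directly after the # because the format string puts the two %s placeholders on either side of it.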



Previously I used:



    make_sentences <- function(word) {
      # return every element of text that contains word
      grep(word, text, value = TRUE)
    }



The grep above worked fine with plain text, but now I need to modify it,
probably with a lot of regex, so that it also returns the anchor link with the
directory path and the div number. Is there any solution to this?
