[R] grep lines before or after pattern matched?

Joshua Wiley jwiley.psych at gmail.com
Mon Jul 11 19:31:01 CEST 2011


If you know you can find the start of the document (say that line
always starts with Document...), then:

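yourfile[grep("^Document [0-9]+", yourfile) + 4]

(where yourfile is the character vector returned by readLines()) should
give you the line 4 below each line where "Document N" occurs: grep()
returns the matching line numbers, adding 4 shifts them down, and
indexing yourfile pulls out those lines.  No loop needed :)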
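A minimal, self-contained sketch of the same idea follows. It assumes the file has already been read with readLines() and that the newspaper name is always exactly 4 lines below each "Document N of M" line. The sample vector `lines` and the helper `grep_context()` are hypothetical illustrations (grep_context is not a base R function); the helper roughly mimics Unix `grep -A`/`grep -B` by returning, for each match, the matching line plus the requested number of lines after and before it.

```r
## Hypothetical helper (not part of base R): rough emulation of
## Unix `grep -A after` / `grep -B before` on a character vector.
grep_context <- function(pattern, x, after = 0L, before = 0L) {
  hits <- grep(pattern, x)
  # For each match, build the index range from `before` lines above
  # to `after` lines below, clipped to the bounds of x
  idx <- lapply(hits, function(i) {
    seq(max(1L, i - before), min(length(x), i + after))
  })
  # Return one character vector per match
  lapply(idx, function(j) x[j])
}

## Hypothetical sample text modelled on the layout in the question:
## empty strings stand in for the blank "\n" lines
lines <- c("Document 1 of 100", "", "", "", "Paper A", "", "", "Day Date",
           "Document 2 of 100", "", "", "", "Paper B", "", "", "Day Date")

## Newspaper name: the line 4 below each "Document N of M" line
hits <- grep("^Document [0-9]+", lines)
papers <- lines[hits + 4]
papers
## [1] "Paper A" "Paper B"

## Rough equivalent of: grep -A4 '^Document [0-9]+' infile
grep_context("^Document [0-9]+", lines, after = 4)
```

If the number of blank lines between the Document line and the newspaper name can vary, a fixed offset of 4 will pick the wrong line; in that case taking the first non-blank line after each match would be more robust.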

On Mon, Jul 11, 2011 at 10:25 AM, Simon Kiss <sjkiss at gmail.com> wrote:
> Hi Josh,
> Sorry for the insufficient introduction. This might work, but I'm not sure.
> The file that I have includes up to 100 documents (Document 1, Document 2, Document 3....Document 100) with the newspaper name 4 lines below each Document number.
> I'm using readLines to get the text file into R and then trying to use grep to get the newspaper name for each record. But your idea of indexing the text object read into R with the line number where the newspaper name is found is a good one.  I'll just have to come up with a loop to tell R to get the 4th, 8th, 12th, 16th line, etc.
> I'll see if I can get that to work.
> Simon
> On 2011-07-11, at 12:45 PM, Joshua Wiley wrote:
>
>> Dear Simon,
>>
>> Maybe I don't understand properly....if you are doing this in R, can't
>> you just pick the line you want?
>>
>> Josh
>>
>> ## print your data to clipboard
>> cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file =
>> "clipboard")
>> ## read data in, and only select the 4th line to pass to grep()
>> grep("pattern", x = readLines("clipboard")[4])
>>
>>
>> On Mon, Jul 11, 2011 at 9:31 AM, Simon Kiss <sjkiss at gmail.com> wrote:
>>> Dear colleagues,
>>> I have a series of newspaper articles in a downloaded text file.  They look as follows:
>>>
>>> Document 1 of 100
>>> \n
>>> \n
>>> \n
>>> Newspaper Name
>>> \n
>>> \n
>>> Day Date
>>>
>>> I have a series of grep scripts that can extract the date and convert it to a date object, but I can't figure out how to grep the newspaper name.  There is no field ID attached to those lines. The best I can come up with would be to have the program grep the four lines following each line matching the pattern "Document [0-9]".  There is an argument to grep in Unix that can do this (grep -A4 'pattern' infile > outfile), but I don't know if there is an equivalent argument in R.
>>>
>>> Any thoughts?
>>> Yours, Simon Kiss
>>> *********************************
>>> Simon J. Kiss, PhD
>>> Assistant Professor, Wilfrid Laurier University
>>> 73 George Street
>>> Brantford, Ontario, Canada
>>> N3T 2C9
>>> Cell: +1 905 746 7606
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> University of California, Los Angeles
>> https://joshuawiley.com/
>





