[R] How to fetch specific part from a number of Text files?
Charles C. Berry
cberry at tajo.ucsd.edu
Mon Dec 15 23:00:29 CET 2008
On Mon, 15 Dec 2008, megh wrote:
>
> Thanks Charles for this reply. I have started according to your suggestion
> and hopefully I can do it. In the mean time what I was thinking, instead of
> calling my text files by their names, is there any mechanism to call them by
> the order they are stored in that directory?
I am not sure what that order would be. If you mean 'how would I order
files by (say) creation date?', see
?file.info
Eventually you need a string that has the file name in it or a connection
object (see ?connection) that accesses the file(s).
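
A rough sketch of both kinds of ordering, for what it is worth. The demo directory and its three files are made up for illustration; the "Volume_month-day-year.txt" pattern is taken from the file names quoted below. Note that file.info() reports modification time in 'mtime', while 'ctime' is the creation time only on Windows, so ordering by the date in the file name may be the safer choice here.

```r
# Hypothetical demo directory with three files named like the ones in the thread.
demo_dir <- file.path(tempdir(), "volume_demo_order")
dir.create(demo_dir, showWarnings = FALSE)
demo_names <- c("Volume_4-22-2008.txt", "Volume_4-18-2008.txt",
                "Volume_12-1-2008.txt")
for (nm in demo_names) writeLines("placeholder", file.path(demo_dir, nm))

# Full paths of all matching files.
files <- list.files(demo_dir, pattern = "^Volume_.*\\.txt$", full.names = TRUE)

# Option 1: order by modification time as reported by file.info().
info <- file.info(files)
files_by_mtime <- files[order(info$mtime)]

# Option 2: order by the date embedded in the file name (month-day-year).
date_text <- sub("^Volume_(.*)\\.txt$", "\\1", basename(files))
file_dates <- as.Date(date_text, format = "%m-%d-%Y")
files_by_name_date <- files[order(file_dates)]

# Either result is a plain character vector of file names.
basename(files_by_name_date)
```
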
> Means, suppose, I have total
> 1000 text files in that directory and therefore I create a vector like
> sel.no <- c(1:1000). Next I use the i-th element of the vector "sel.no" to
> access the i-th file?
Hmmm. Something about this question is telling me you are either a novice
programmer or really unfamiliar with R or perhaps you just need that extra
cup of coffee.
In any but the last case, let me suggest that it helps to reread the
Intro to R (and any other books/manuals you might have), read help pages
for possibly relevant functions, and to run example( file.info ), say, to
get a handle on functions you are trying to learn. Also, rereading the
_posting guide_ is helpful as it is, in part, a guide to figuring out
things in R.
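
To make the outline quoted below a little more concrete, here is a minimal sketch. The helper name read_section, the demo directory and the demo file contents are hypothetical, and it assumes that the "dashes" stand for whitespace-separated numbers with the same number of columns on every line. Treat it as a starting point to debug on one real file, not as a finished solution.

```r
# Hypothetical helper: return one section of a file as a numeric matrix.
read_section <- function(path, section = 2) {
  txt <- readLines(path)
  # Positions of all "section : N" header lines.
  headers <- grep("^[[:space:]]*section[[:space:]]*:", txt)
  # Position of the header for the requested section.
  start <- grep(paste0("^[[:space:]]*section[[:space:]]*:[[:space:]]*",
                       section, "[[:space:]]*$"), txt)
  if (length(start) != 1)
    stop("section ", section, " not found exactly once in ", path)
  # The section ends just before the next header, or at the end of the file.
  later <- headers[headers > start]
  end <- if (length(later) > 0) later[1] - 1 else length(txt)
  dash.lines <- txt[seq_len(end - start) + start]
  # Drop blank lines, then parse what is left as a table of numbers.
  dash.lines <- dash.lines[nzchar(trimws(dash.lines))]
  as.matrix(read.table(text = dash.lines))
}

# Made-up demo files: four sections, each with two rows of three numbers.
demo_dir <- file.path(tempdir(), "volume_demo_sections")
dir.create(demo_dir, showWarnings = FALSE)
write_demo <- function(name, base) {
  body <- unlist(lapply(1:4, function(s) {
    c(paste("section :", s),
      paste(base + 10 * s + 1:3, collapse = " "),
      paste(base + 10 * s + 4:6, collapse = " "))
  }))
  writeLines(body, file.path(demo_dir, name))
}
write_demo("Volume_4-18-2008.txt", 100)
write_demo("Volume_4-21-2008.txt", 200)

# Step through the list of file names with lapply().
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
sections <- lapply(files, read_section, section = 2)
names(sections) <- basename(files)

# Access by position also works, since 'files' is an ordinary character vector.
second_file <- read_section(files[2])
```

Since list.files() already returns a character vector, files[i] is the name of the i-th file, so a separate index vector such as sel.no is only needed if you want a subset or a different order.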
HTH,
Chuck
>
> With regards,
>
>
>
> Charles C. Berry wrote:
>>
>> On Mon, 15 Dec 2008, megh wrote:
>>
>>>
>>> Hi all,
>>>
>>> In my c: drive I have possibly 1,000 notepad files, with .txt extension.
>>> They are named as the dates on which they were saved i.e. 1st file name is
>>> "Volume_4-18-2008", 2nd one is "Volume_4-21-2008", 3rd one
>>> "Volume_4-22-2008" and so on............
>>>
>>> Also, content of each file are in same format like :
>>>
>>> ******** content of 1st file *************
>>> section : 1
>>> ----- --------- ---------- -----------
>>> ----- --------- ---------- -----------
>>> ----- --------- ---------- -----------
>>> ----- --------- ---------- -----------
>>> section : 2
>>> ----- --------- ---------- -----------
>>> ----- --------- ---------- -----------
>>> ----- --------- ---------- -----------
>>> ----- --------- ---------- -----------
>>> section : 3
>>> ----- --------- ---------- -----------
>>> ----- --------- ---------- -----------
>>> ----- --------- ---------- -----------
>>> ----- --------- ---------- -----------
>>> section : 4
>>> ----- --------- ---------- -----------
>>> ----- --------- ---------- -----------
>>> ----- --------- ---------- -----------
>>> ----- --------- ---------- -----------
>>>
>>> Here all files have 4-sections, just like shown here but contents within
>>> each section (i.e. dashed line here) differs file to file.
>>>
>>> What I have to do is I have to fetch contents of "section : 2" from each
>>> file and then save it to an R object, matrix or list, for further analysis.
>>>
>>> Can you ppl please tell me how to do that?
>>
>> Here is the outline:
>>
>> *) use list.files() or Sys.glob() to get a list of the files
>>
>> *) write a function that takes the file name as its arg, uses
>> readLines() to swallow the text and uses grep() to find the
>> 'section' lines. Then put the 'dashes' in between two section
>> lines into a separate object (say, dash.lines). Then use
>>
>> as.matrix( read.table(con <- textConnection( dash.lines ) ) )
>> close(con)
>>
>> to get the numeric values or maybe
>>
>> sapply( strsplit(dash.lines, "[ ]+"), as.numeric)
>>
>> *) debug this on one file
>>
>>
>> *) use lapply to step thru the list of file names.
>>
>> See
>>
>> ?list.files
>> ?Sys.glob
>> ?readLines
>> ?grep
>> ?textConnection
>> ?strsplit
>> ?sapply
>>
>> HTH,
>>
>> Chuck
>>
>>
>>>
>>> Thanks and regards,
>>> --
>>> View this message in context:
>>> http://www.nabble.com/How-to-fetch-specific-part-from-a-number-of-Text-files--tp21011017p21011017.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> Charles C. Berry (858) 534-2098
>> Dept of Family/Preventive
>> Medicine
>> E mailto:cberry at tajo.ucsd.edu UC San Diego
>> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/How-to-fetch-specific-part-from-a-number-of-Text-files--tp21011017p21020032.html
> Sent from the R help mailing list archive at Nabble.com.
>
>
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901