[R] Read multiple files into dataframe?
jim holtman
jholtman at gmail.com
Tue Sep 1 23:32:49 CEST 2009
I would put the data into a 'long' format instead of a 'wide' one, since you
say your files have different lengths. I took your data, replicated it 3
times, and changed the duration in each file name:
> fileNames <- Sys.glob('/da_zone*') # files to process
> result <- lapply(fileNames, function(.file){
+ # read in data after skipping 11 lines
+ .input <- read.csv(.file, skip=11)
+ # extract the duration from file name
+ .dur <- sub(".*_([[:digit:]]+)hr_.*", "\\1", .file, perl=TRUE)
+ # add to the data frame
+ .input$dur <- .dur
+ .input
+ })
> # put into a single data.frame
> do.call(rbind, result)
avgppt areasqmi dur
1 7.67 0 15
2 7.60 1 15
3 7.52 5 15
4 7.32 10 15
5 6.91 20 15
6 5.90 50 15
7 5.02 100 15
8 4.09 200 15
9 3.55 300 15
10 2.96 500 15
11 2.27 1000 15
12 1.64 2000 15
13 0.82 5000 15
14 0.77 5360 15
15 7.67 0 1
16 7.60 1 1
17 7.52 5 1
18 7.32 10 1
19 6.91 20 1
20 5.90 50 1
21 5.02 100 1
22 4.09 200 1
23 3.55 300 1
24 2.96 500 1
25 2.27 1000 1
26 1.64 2000 1
27 0.82 5000 1
28 0.77 5360 1
29 7.67 0 3
30 7.60 1 3
31 7.52 5 3
32 7.32 10 3
33 6.91 20 3
34 5.90 50 3
35 5.02 100 3
36 4.09 200 3
37 3.55 300 3
38 2.96 500 3
39 2.27 1000 3
40 1.64 2000 3
41 0.82 5000 3
42 0.77 5360 3
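
Note that sub() returns a character string, so 'dur' above is character; a
numeric version is handier for sorting or plotting by duration. For anyone
who wants to try the approach without the original files, here is a minimal,
self-contained sketch. The helper make_file(), the stand-in header lines, and
the precipitation values are invented for illustration; only the file layout
(11 header lines followed by "avgppt, areasqmi") is taken from the example in
the question.

```r
# Sketch only: create three small files laid out like the .stdsummary example,
# with different numbers of records, then read them into one long data frame.
tmp <- tempfile("demo")
dir.create(tmp)

# Hypothetical helper: writes one example file for a given duration (hours)
make_file <- function(dir, hours, n) {
  path <- file.path(dir, sprintf("da_zone1_%dhr_1166.stdsummary", hours))
  header <- sprintf("header line %d", 1:11)   # stand-ins for the 11 header lines
  header[6] <- sprintf("Duration window (hours): %d", hours)
  area <- c(0, 1, 5, 10, 20)[seq_len(n)]
  ppt <- seq(from = hours, by = -0.1, length.out = n)   # invented values
  body <- sprintf("%08.2f,%010.2f", ppt, area)          # zero-padded like the source
  writeLines(c(header, "avgppt, areasqmi", body), path)
  path
}

make_file(tmp, 1, 3)
make_file(tmp, 3, 4)
make_file(tmp, 15, 5)

files <- list.files(tmp, pattern = "\\.stdsummary$", full.names = TRUE)

pieces <- lapply(files, function(f) {
  d <- read.csv(f, skip = 11)   # skip the 11 header lines
  # take the duration from the file name only, and store it as an integer
  d$dur <- as.integer(sub(".*_([0-9]+)hr_.*", "\\1", basename(f)))
  d
})

long <- do.call(rbind, pieces)   # one long data frame, one row per record

by_dur <- split(long, long$dur)  # one data frame per duration
counts <- table(long$dur)        # number of records read for each duration
```

Because the result is in long format, files with different numbers of
records need no padding, and each element of by_dur can be checked or plotted
on its own.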
On Tue, Sep 1, 2009 at 4:24 PM, Douglas M. Hultstrand <dmhultst at metstat.com> wrote:
> Hello,
>
> I am fairly new to R programming and am stuck with the following problem.
>
> I am trying to read in multiple files (see attached file or at end of
> email), the files all have the same general header information and different
> precipitation (avgppt) and area (areasqmi) values. Sometimes the number of
> records differs between files.
>
> I want to read in all files (.stdsummary) and create a dataframe that
> contains the area and precipitation for each file (each file covers a
> different duration), with column names that represent the duration (taken
> from the sixth line of the header information or extracted from the file
> name, e.g. "da_zone1_15hr_1166.stdsummary").
> For example, this is what the final dataframe would look like for 1hr, 3hr,
> and 15hr datafiles:
> 1hrppt  1hrarea  3hrppt  3hrarea  15hrppt  15hrarea
> 3.8     0        6.86    0        7.67     0
> 3.71    1        6.78    1        7.6      1
> 3.69    5        6.72    5        7.52     5
> 3.56    10       6.55    10       7.32     10
> 3.33    20       6.17    20       6.91     20
> 2.87    50       5.25    50       5.9      50
> 2.45    100      4.35    100      5.02     100
> 1.94    200      3.34    200      4.09     200
> 1.67    300      2.78    300      3.55     300
>
> The end result is to perform QC statistics and then plot each set of data.
> Also, is there a way to create a dataframe that has different numbers of records?
>
> An example data file is below:
>
> Storm number: 1166
> Zone number: 1 (ALL zones)
> Number of stations: 172
> Total analyzed area (sq mi): 5360.8
> Average station density (stns per 1000 sq mi): na
> Duration window (hours): 15
> CPP beg hour index: 1
> CPP end hour index: 15
> Ishohyet interval step (inches): 0.2
> Standard area size summary
> Begin run date/time: Tue Aug 25 01:17:43 2009
> avgppt, areasqmi
> 00007.67,0000000.00
> 00007.60,0000001.00
> 00007.52,0000005.00
> 00007.32,0000010.00
> 00006.91,0000020.00
> 00005.90,0000050.00
> 00005.02,0000100.00
> 00004.09,0000200.00
> 00003.55,0000300.00
> 00002.96,0000500.00
> 00002.27,0001000.00
> 00001.64,0002000.00
> 00000.82,0005000.00
> 00000.77,0005360.00
>
> --
> ---------------------------------
> Douglas M. Hultstrand, MS
> Senior Hydrometeorologist
> Metstat, Inc. Windsor, Colorado
> voice: 970.686.1253
> email: dmhultst at metstat.com
> web: http://www.metstat.com
> ---------------------------------
>
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?