[R] Read header csv file
Don MacQueen
macq at llnl.gov
Wed Sep 30 17:31:09 CEST 2009
Lucas,
Although I can't answer all your questions, I can give some suggestions.
I will assume you know how many years of data you
have. Suppose it is 2004 through 2009.
In each input file, the columns must have names,
so I will assume there is a column named "Month".
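In the table in your message, though, the month names are the column headers rather than values in a "Month" column. In that case names(clima) returns the headers as a character vector, which is what you were asking for. Here is a minimal sketch with a small made-up data frame standing in for your 2004.csv:

```r
## made-up values standing in for clima <- read.csv2("2004.csv", nrows=7)
clima <- data.frame(January  = c(3.0, 1.4, 0.2, 6.7),
                    February = c(4.1, 3.7, 1.5, 4.1))

## names() returns the column headers as a character vector
months <- names(clima)
print(months)

## loop over the headers; clima[[mn]] is that month's column of values
for (mn in names(clima)) {
  cat('Month:', mn, ' Mean precipitation:', mean(clima[[mn]]), '\n')
}
```

For a data frame, colnames(clima) gives the same result.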
I am not sure if you want all of your output in
one file, or if you want one output file for each
year.
I am going to assume you want a separate output file for each year.
I am going to assume that when you do the
calculations for each month within each year, the
results include two variables named "r1" and "pvalue"
(as in your example).
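You mention adjusted R-squared and a p-value from a linear regression. If r1 is meant to be the adjusted R-squared, both values can be taken from summary() of an lm() fit. Your message doesn't say what is regressed on what, so this sketch just regresses one month's values on the row number; x, y and the numbers are placeholders:

```r
## placeholder data: seven values for one month, regressed on row number
y <- c(3.0, 1.4, 0.2, 6.7, 2.2, 5.1, 4.0)
x <- seq_along(y)

fit <- lm(y ~ x)
s <- summary(fit)

r1 <- s$adj.r.squared                       # adjusted R-squared
pvalue <- s$coefficients["x", "Pr(>|t|)"]   # p-value for the slope

cat('R1:', round(r1, 2), ' Pvalue:', round(pvalue, 2), '\n')
```

With a single predictor, the slope p-value is the same as the overall F-test p-value printed at the bottom of summary(fit).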
Here is some example R code:
## make sure that R is working in the directory that contains the files;
## this can be done from a menu in the R window,
## if you are using R in Windows or Macintosh,
## or by using the setwd() command
setwd( "directory where my files are" )
years <- 2004:2009
for (yr in years) {
  infile <- paste(yr,'.csv',sep='')
  outfile <- paste(yr,'.txt',sep='')
  clima <- read.csv2(infile,nrows=7)
  ## open the output file for the current year
  sink(outfile)
  for (mn in unique(clima$Month)) {
    ## do calculations for the current month;
    ## they must create the variables r1 and pvalue
    cat('Year: ', yr, ' Month:', mn, ' R1:', r1,' Pvalue:', pvalue,'\n')
    ## end of loop through months (in the current year)
  }
  ## close the output file.
  sink()
  ## end of loop through years
}
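The loop above leaves the calculation step open. Under the "Month" column assumption, the rows for one month can be picked out with a logical index inside the loop. Here is a sketch with made-up data; the column name precip is hypothetical:

```r
## made-up data in the layout assumed above: one row per observation,
## a Month column, and a hypothetical precipitation column "precip"
clima <- data.frame(Month  = rep(c("January", "February"), each = 3),
                    precip = c(3.0, 1.4, 0.2, 4.1, 3.7, 1.5),
                    stringsAsFactors = FALSE)

for (mn in unique(clima$Month)) {
  ## keep only the rows belonging to the current month
  month_rows <- clima[clima$Month == mn, ]
  ## calculations for this month would use month_rows$precip
  cat('Month:', mn, ' n:', nrow(month_rows),
      ' Total:', sum(month_rows$precip), '\n')
}
```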
If you want all the output, including all the
years, in a single file, then put the sink()
commands outside the loop. Like this:
years <- 2004:2009
## open the output file for writing
sink("MyOutput.txt")
for (yr in years) {
  infile <- paste(yr,'.csv',sep='')
  clima <- read.csv2(infile,nrows=7)
  for (mn in unique(clima$Month)) {
    ## do calculations for the current month
    ## some results are in variables named r1 and pvalue
    cat('Year: ', yr, ' Month:', mn, ' R1:', r1,' Pvalue:', pvalue,'\n')
    ## end of loop through months (in the current year)
  }
  ## end of loop through years
}
## close the output file.
sink()
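If you would rather not use sink() at all, you can open a file connection once, write each line with writeLines(), and close it at the end. One advantage is that an error inside the loop cannot leave your console output still diverted to the file, which can happen with sink(). This sketch uses made-up results and a temporary file in place of "MyOutput.txt"; the all_results list is a hypothetical way of holding the per-year results:

```r
## made-up results per year; in practice these come from your calculations
all_results <- list(
  "2004" = data.frame(Month = c("January", "February"),
                      r1 = c(0.98, 0.78), pvalue = c(0.03, 0.12),
                      stringsAsFactors = FALSE),
  "2005" = data.frame(Month = c("January", "February"),
                      r1 = c(0.91, 0.66), pvalue = c(0.05, 0.20),
                      stringsAsFactors = FALSE)
)

outfile <- tempfile(fileext = ".txt")   # stand-in for "MyOutput.txt"
con <- file(outfile, open = "w")        # open once, before the loop
for (yr in names(all_results)) {
  res <- all_results[[yr]]
  ## sprintf() is vectorised: one formatted line per month
  out_lines <- sprintf("Year: %s Month: %s R1: %.2f Pvalue: %.2f",
                       yr, res$Month, res$r1, res$pvalue)
  writeLines(out_lines, con)
}
close(con)                              # close once, after the loop

print(readLines(outfile))
```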
A few comments:
The sink() command must be given a file name, not a directory name. You had
sink("directore where my text file is")
You can try it first without the sink() commands,
and see the results on your computer screen. This
is a good way to make sure they are correct.
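One way to switch between screen and file without editing the loop is a logical flag. Wrapping the loop in tryCatch(..., finally = ...) also makes sure sink() is undone even if a calculation fails partway through. This is a sketch: to_file is a hypothetical flag name and the output file is temporary.

```r
to_file <- FALSE                        # set to TRUE once the screen output looks right
outfile <- tempfile(fileext = ".txt")   # stand-in for "2004.txt"

if (to_file) sink(outfile)
tryCatch({
  for (mn in c("January", "February")) {
    cat('Year:', 2004, ' Month:', mn, '\n')
  }
}, finally = if (to_file) sink())

sink.number()   # 0 means no output diversion is still active
```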
Since you already know your file names, 2004.csv,
2005.csv, and so on, it is not necessary to use
the dir() command to get them. Also, it is
better to use list.files() than dir(); dir() is an
alias for list.files(), so this is mainly a matter of clarity.
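For example, the input and output names can be built directly from the years. file.exists() then lets you check, before the loop starts, that every expected file is present in the working directory:

```r
years <- 2004:2009
## build the file names directly from the years
infiles  <- paste(years, '.csv', sep='')   # "2004.csv" ... "2009.csv"
outfiles <- paste(years, '.txt', sep='')

## check up front which expected input files are missing
missing_files <- infiles[!file.exists(infiles)]
if (length(missing_files) > 0)
  message("Missing input files: ", paste(missing_files, collapse = ", "))
```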
If you do use the list.files() or dir() command,
try not to use it repeatedly inside a loop. It
should be sufficient to call it once outside the loop,
for example:
myfiles <- list.files("directory where my files are")
nfiles <- length(myfiles)
for (year in 1:nfiles) {
  ## myfiles[year] is the usual way to index a character vector;
  ## myfiles[[year]] gives the same file name here
  filename <- myfiles[year]
}
Also, the way you did it assumes that the dir()
command lists the files in the correct order, and
it also assumes that there are no other files in
the directory.
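Both assumptions can be made explicit with the pattern argument of list.files() and a call to sort(). list.files() already returns names in alphabetical order, so sort() only makes that explicit. Note also that in your loop filename holds "2004.csv", not "2004", so cat(" Year: ", filename) would write the extension too; sub() can strip it. Here is a sketch with a temporary directory holding two made-up data files and one unrelated file:

```r
## temporary directory with two made-up data files and one unrelated file
datadir <- file.path(tempdir(), "clima_demo")
dir.create(datadir, showWarnings = FALSE)
for (f in c("2005.csv", "2004.csv", "notes.txt")) {
  writeLines("January;February", file.path(datadir, f))
}

## keep only names made of four digits followed by .csv
myfiles <- sort(list.files(datadir, pattern = "^[0-9]{4}\\.csv$"))

## "2004.csv" -> 2004, usable as the year in cat()
years <- as.integer(sub("\\.csv$", "", myfiles))
print(myfiles)
print(years)
```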
If you want to run R in a different directory
than the data files are in, you can change a few
things.
datadir <- "directory my files are in"
infiles <- list.files( datadir, full.names=TRUE)
nfiles <- length(infiles)
for ( i in 1:nfiles ) {
  clima <- read.csv2(infiles[i], nrows=7)
  sink( paste('outfile',i,'.txt',sep='') )
  ## loop over months
  ## do calculations
  ## do the cat() commands
  sink()
}
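Here is a variation on the same idea. Rather than numbering the outputs outfile1.txt, outfile2.txt, and so on, each output name can be derived from its input file with basename() and sub(), so that 2004.csv produces 2004.txt, and file.path() can place it in a separate output directory. This sketch uses hypothetical temporary directories and two small made-up files in read.csv2 format (semicolon separator, comma decimal), and takes the months from the column headers with names():

```r
datadir <- file.path(tempdir(), "clima_in")    # hypothetical input directory
outdir  <- file.path(tempdir(), "clima_out")   # hypothetical output directory
dir.create(datadir, showWarnings = FALSE)
dir.create(outdir, showWarnings = FALSE)

## two small made-up input files in read.csv2 format
writeLines(c("January;February", "3,0;4,1", "1,4;3,7"),
           file.path(datadir, "2004.csv"))
writeLines(c("January;February", "0,2;1,5", "6,7;4,1"),
           file.path(datadir, "2005.csv"))

infiles <- list.files(datadir, pattern = "\\.csv$", full.names = TRUE)
for (i in seq_along(infiles)) {
  clima <- read.csv2(infiles[i], nrows = 7)
  ## ".../2004.csv" -> "2004"
  yr <- sub("\\.csv$", "", basename(infiles[i]))
  outfile <- file.path(outdir, paste(yr, '.txt', sep = ''))
  sink(outfile)
  for (mn in names(clima)) {
    cat('Year:', yr, ' Month:', mn, ' Mean:', mean(clima[[mn]]), '\n')
  }
  sink()
}
print(list.files(outdir))
```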
At 1:10 PM +0000 9/30/09, Lucas Sevilla García wrote:
>Hi R community,
>
>First of all, I want to thank everybody for sharing
>their time solving R questions. You are great.
>OK, as for my questions: I've been looking for
>solutions by myself and in forums, but I'm a
>little bit desperate, so I hope somebody can
>help me. I have built code that reads files from
>a directory. These files are named by year
>(2004.csv, 2005.csv, ...). When the code reads
>the first file (2004.csv), which contains
>precipitation data for every month, I calculate
>different variables like adjusted R-squared,
>p-value, or the formula fitted to the data by
>linear regression. The code does more things,
>but that part is enough to explain what I need.
>I want to export to a text file the year and
>month, along with some other variables,
>something like this:
>
>Year: 2004 Month: January R1: 0.98 Pvalue: 0.03 ...
>Year: 2004 Month: February R1:0.78 Pvalue:0.12 ...
>
>I've seen that I can use the commands sink() and cat(),
>so I would put those commands in my code, like
>this:
>
>nfiles<- length(dir("directory where my files are")) #Count file number
>
> for(year in 1:nfiles) #Read first file
> {
> filename<-dir()[[year]]
>#take first file and read filename, so if year is 1, then filename
>#will be 2004, if year is 2, filename will be 2005,...
>
> clima<-read.csv2(filename, nrows=7) #open 2004.csv
>
>
>So, if I want to export the year to my text file I would do
>
>
> for(year in 1:nfiles) #Read first file
>
> {
> sink("directore where my text file is")
>
> filename<-dir()[[year]] #take first file and read filename,
>#so if year is 1, then filename will be 2004, if year is 2,
>#filename will be 2005,...
>
> cat(" Year: ",filename)
> sink()
>
> clima<-read.csv2(filename, nrows=7) #open 2004.csv
>
>And my text file would read
>
>Year: 2004
>
>Now I want to do the same for months. (I have built
>a for loop that reads months inside the for loop
>that reads years.) When I import a csv file I get
>something like this:
>
> Januray February ....
>1 3.0 4.1
>2 1.4 3.7
>3 0.2 1.5
>4 6.7 4.1
>.
>.
>.
>
>I can use commands like clima$Januray or
>clima[[1]], but I just get the precipitation values;
>I am not able to get the header of the
>column. If I were able to do that, I could do
>the same as for years and export those headers
>to my text file. Does anyone know how I could do
>that? Or does anyone know another way to do what
>I need? Would anyone use the sink() and cat()
>commands to create a summary text like the one I
>need? Probably my for loop is not the best;
>I am still a beginner with R, and there are
>probably better ways to express in R what
>I need, but I am working alone, so there is nobody
>in person to help me. I apologize for my
>simple questions. Thanks in advance.
>
>Lucas
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062