[R] Looping Through List of .csv Files to Work with Subsets of the Data
William Dunlap
wdunlap at tibco.com
Tue Jun 9 04:28:03 CEST 2015
participant.files <- list.files("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data & Syntax/MIG_RAW DATA & TXT Files/Plain Text Files")
Try adding the argument full.names=TRUE to that call to list.files().
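For context, here is a minimal sketch of why this should help, assuming the
error comes from read.csv() looking for '103.csv' in the current working
directory. By default list.files() returns bare file names; with
full.names=TRUE it prepends the directory. The temporary directory and the
small demo file are made up for illustration and stand in for the real
"Plain Text Files" folder.

```r
# Hypothetical demo directory standing in for the "Plain Text Files" folder.
data.dir <- file.path(tempdir(), "plain-text-demo")
dir.create(data.dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(data.dir, "103.csv"),
          row.names = FALSE)

# Default: bare file names, which read.csv() resolves against getwd().
short.names <- list.files(data.dir)

# full.names=TRUE prepends the directory, so the paths work from anywhere.
full.paths <- list.files(data.dir, full.names = TRUE)

# Reading via the full path does not depend on the working directory.
crnt.file <- read.csv(full.paths[1])

# One way (an assumption, not from the thread) to recover a short id
# for naming output files once the elements are full paths.
id <- tools::file_path_sans_ext(basename(full.paths[1]))
```

Since the elements of participant.files would then be full paths, something
like basename() is needed if the file name itself is also used to build the
output file names.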
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Mon, Jun 8, 2015 at 7:15 PM, Chad Danyluck <c.danyluck at gmail.com> wrote:
> Thank you Don.
>
> I've incorporated your suggestions which have helped me to understand how
> loops work better than previously. However, the loop gets stuck trying to
> read the current file:
>
> mig.processed.data <- read.csv("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data & Syntax/mig.log.data.addition.csv")
>
> ## ASSUMPTION: Starting with augmented processedbook and correct free.meditation.end
> #### Read in all data files and loop through to create new data files segmented by the rows identified before ####
>
> # get required data
> participant.ids <- mig.processed.data$participant.id
> participant.baseline.start <- mig.processed.data$baseline.row.start
> participant.baseline.end <- mig.processed.data$baseline.row.end
> participant.audio.start <- mig.processed.data$audio.meditation.row.start
> participant.audio.end <- mig.processed.data$audio.meditation.row.end
> participant.free.start <- mig.processed.data$free.meditation.row.start
> participant.free.end <- mig.processed.data$free.meditation.row.end
>
> participant.files <- list.files("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data & Syntax/MIG_RAW DATA & TXT Files/Plain Text Files")
>
> for (i in 1:length(participant.files)) {
>
> id <- participant.files[i]
>
> ## if id is numeric, e.g., 1, 2, 3 ... 80 then I would do this
> ## to ensure that the files sort properly when viewed by the operating system
> idc <- formatC(id, width=3, flag='0')
>
> #current file
> crnt.file[i] <- read.csv( participant.files[i] )
>
> ## base
> tmp.base <-
> crnt.file[participant.baseline.start:participant.baseline.end, ]
> write.csv(tmp.base, file=paste0('baseline',idc,'.csv'))
>
>
> ## audio
> tmp.audio <- crnt.file[participant.audio.start:participant.audio.end, ]
> write.csv(tmp.audio, file=paste0('audio',idc,'.csv'))
>
>
>
> ## free
> tmp.free <- crnt.file[participant.free.start:participant.free.end, ]
> write.csv(tmp.free, file=paste0('free',idc,'.csv'))
>
> }
>
> The error message reads:
>
> Error in file(file, "rt") : cannot open the connection
> In addition: Warning message:
> In file(file, "rt") : cannot open file '103.csv': No such file or directory
>
> So it seems to be calling the first file in the list but getting stuck. Any
> suggestions?
>
> Best,
>
> Chad
>
> On Mon, Jun 8, 2015 at 8:07 PM, MacQueen, Don <macqueen1 at llnl.gov> wrote:
>
> > So you have 80 files, one for each participant?
> >
> > It appears that from each of the 80 files you want to extract three
> > subsets of rows,
> > one set for baseline
> > one set for audio
> > one set for "free"
> >
> > What I think I would do, if the above is correct, is create one "master"
> > file. This file will have eight columns:
> > (I'll show an example column name, followed by a description)
> > id participant id
> > fn file name for that participant
> > srb start row for baseline
> > erb end row for baseline
> > sra start row for audio
> > era end row for audio
> > srf start row for free
> > erf end row for free
> >
> > This may be fairly close to what you already have, but I'm not sure.
> >
> > I would then load the master file into R
> > mstf <- read.csv( {the master file} )
> >
> > Then loop through its rows, and since each row has all the information
> > necessary to read the participant's individual file and identify which
> > rows to subset, a loop like this should work.
> >
> > for (irow in seq_len(nrow(mstf))) {
> >
> > id <- mstf$id[irow]
> > ## if id is numeric, e.g., 1, 2, 3 ... 80 then I would do this
> > ## to ensure that the files sort properly when viewed by the operating system
> > idc <- formatC(id, width=2, flag='0')
> >
> > crnt.file <- read.csv( mstf$fn[irow] )
> >
> > ## base
> > tmp.base <- crnt.file[ mstf$srb[irow]:mstf$erb[irow] , ]
> > write.csv(tmp.base, file=paste0('baseline',idc,'.csv'))
> >
> >
> > ## audio
> > tmp.audio <- crnt.file[ mstf$sra[irow]:mstf$era[irow] , ]
> > write.csv(tmp.audio, file=paste0('audio',idc,'.csv'))
> >
> >
> >
> > ## free
> > tmp.free <- crnt.file[ mstf$srf[irow]:mstf$erf[irow] , ]
> > write.csv(tmp.free, file=paste0('free',idc,'.csv'))
> >
> > }
> >
> >
> > Obviously, I can't test this. And there may be (likely are!) some typos in
> > it.
> >
> > Note that it's not necessary to create variables that identify which row
> > the subset should start and end on; these are just looked up from the
> > master file when needed. Similarly, the three respective subsets are
> > stored in temporary data frames, because they are not (I presume) needed
> > when the whole thing is done. (if they were needed, then a different
> > strategy would be more appropriate)
> >
> > There are different ways to index the loop. I just picked one.
> >
> > --
> > Don MacQueen
> >
> > Lawrence Livermore National Laboratory
> > 7000 East Ave., L-627
> > Livermore, CA 94550
> > 925-423-1062
> >
> >
> >
> >
> >
> > On 6/8/15, 2:48 PM, "Chad Danyluck" <c.danyluck at gmail.com> wrote:
> >
> > >Hello,
> > >
> > >I want to subset specific rows of data from 80 .csv files and write those
> > >subsets into new .csv files. The data I want to subset starts on a
> > >different row for each original .csv file. I've created variables that
> > >identify which row the subset should start and end on, but I want to loop
> > >through this process and I am not sure what to do. I've attempted to write
> > >the loop below, albeit, much of it is pseudo code. If anyone can provide me
> > >with some tips I'd appreciate it.
> > >
> > >#### This data file is used to create the variables where the subsetting starts and ends for each participant ####
> > >mig.data <- read.csv("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data & Syntax/mig.data.csv")
> > >
> > ># These are the variable names for the start and end of each subset of relevant data (baseline, audio, and free)
> > >participant.ids <- mig.processed.data$participant.id
> > >participant.baseline.start <- mig.processed.data$baseline.row.start
> > >participant.baseline.end <- mig.processed.data$baseline.row.end
> > >participant.audio.start <- mig.processed.data$audio.meditation.row.start
> > >participant.audio.end <- mig.processed.data$audio.meditation.row.end
> > >participant.free.start <- mig.processed.data$free.meditation.row.start
> > >participant.free.end <- mig.processed.data$free.meditation.row.end
> > >
> > ># read into a list the individual files from which to subset the data
> > >participant.files <- list.files("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data & Syntax/MIG_RAW DATA & TXT Files/Plain Text Files")
> > >
> > ># loop through each participant
> > >for (i in 1:length(participant.files)) {
> > >
> > > # get baseline rows
> > > results.baseline <-
> > > participant.files[participant.baseline.start[i]:participant.baseline.end[i],]
> > >
> > > # get audio rows
> > > results.audio <-
> > > participant.files[participant.audio.start[i]:participant.audio.end[i],]
> > >
> > > # get free rows
> > > results.free <-
> > >participant.files[participant.free.start[i]:participant.free.end[i],]
> > >
> > > # write out participant relevant data
> > > write.csv(results.baseline, file="baseline[i].csv")
> > > write.csv(results.audio, file = "audio[i].csv")
> > > write.csv(results.free, file = "free[i].csv")
> > >
> > >}
> > >
> > >--
> > >Chad M. Danyluck, MA
> > >PhD Candidate, Psychology
> > >University of Toronto
> > >
> > >
> > >
> > >“There is nothing either good or bad but thinking makes it so.” - William
> > >Shakespeare
> > >
> > > [[alternative HTML version deleted]]
> > >
> > >______________________________________________
> > >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >https://stat.ethz.ch/mailman/listinfo/r-help
> > >PLEASE do read the posting guide
> > >http://www.R-project.org/posting-guide.html
> > >and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
>
> --
> Chad M. Danyluck, MA
> PhD Candidate, Psychology
> University of Toronto
>
>
>
> “There is nothing either good or bad but thinking makes it so.” - William
> Shakespeare