[R] Looping Through List of .csv Files to Work with Subsets of the Data
MacQueen, Don
macqueen1 at llnl.gov
Tue Jun 9 02:07:45 CEST 2015
So you have 80 files, one for each participant?
It appears that from each of the 80 files you want to extract three
subsets of rows,
one set for baseline
one set for audio
one set for "free"
What I think I would do, if the above is correct, is create one "master"
file. This file will have eight columns:
(I'll show an example column name, followed by a description)
    id    participant id
    fn    file name for that participant
    srb   start row for baseline
    erb   end row for baseline
    sra   start row for audio
    era   end row for audio
    srf   start row for free
    erf   end row for free
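
As a rough sketch of how such a master file could be put together from the columns named in the original post: the toy values below and the file-naming pattern ('participant01.csv' and so on) are assumptions made purely for illustration, and the result is written to a temporary directory.

```r
## Toy stand-in for the poster's data (two participants; all values invented)
mig.data <- data.frame(
  participant.id             = c(1, 2),
  baseline.row.start         = c(5, 8),
  baseline.row.end           = c(20, 25),
  audio.meditation.row.start = c(21, 26),
  audio.meditation.row.end   = c(40, 50),
  free.meditation.row.start  = c(41, 51),
  free.meditation.row.end    = c(60, 70)
)

## Build the eight-column master table
mstf <- data.frame(
  id  = mig.data$participant.id,
  ## hypothetical naming scheme; the real file names would go here
  fn  = paste0('participant',
               formatC(mig.data$participant.id, width=2, flag='0'),
               '.csv'),
  srb = mig.data$baseline.row.start,
  erb = mig.data$baseline.row.end,
  sra = mig.data$audio.meditation.row.start,
  era = mig.data$audio.meditation.row.end,
  srf = mig.data$free.meditation.row.start,
  erf = mig.data$free.meditation.row.end,
  stringsAsFactors = FALSE
)

## Save it so it can be read back later with read.csv()
write.csv(mstf, file=file.path(tempdir(), 'master.csv'), row.names=FALSE)
```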
This may be fairly close to what you already have, but I'm not sure.
I would then load the master file into R
mstf <- read.csv( {the master file} )
Then loop through its rows, and since each row has all the information
necessary to read the participant's individual file and identify which
rows to subset, a loop like this should work.
for (irow in seq_len(nrow(mstf))) {
  id <- mstf$id[irow]
  ## if id is numeric, e.g., 1, 2, 3 ... 80 then I would do this
  ## to ensure that the files sort properly when viewed by the operating
  ## system
  idc <- formatC(id, width=2, flag='0')
  ## as.character() in case read.csv() stored the file names as a factor
  crnt.file <- read.csv( as.character(mstf$fn[irow]) )
  ## base
  tmp.base <- crnt.file[ mstf$srb[irow]:mstf$erb[irow] , ]
  write.csv(tmp.base, file=paste0('baseline',idc,'.csv'))
  ## audio
  tmp.audio <- crnt.file[ mstf$sra[irow]:mstf$era[irow] , ]
  write.csv(tmp.audio, file=paste0('audio',idc,'.csv'))
  ## free
  tmp.free <- crnt.file[ mstf$srf[irow]:mstf$erf[irow] , ]
  write.csv(tmp.free, file=paste0('free',idc,'.csv'))
}
Obviously, I can't test this. And there may be (likely are!) some typos in
it.
Note that it's not necessary to create variables that identify which row
the subset should start and end on; these are just looked up from the
master file when needed. Similarly, the three respective subsets are
stored in temporary data frames, because they are not (I presume) needed
when the whole thing is done. (if they were needed, then a different
strategy would be more appropriate)
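
A minimal sketch of one such different strategy, assuming the subsets should stay available after the loop: store each one in a named list instead of a temporary data frame. The toy master table and the in-memory stand-ins for the participant files are invented for illustration, and only the baseline subset is shown.

```r
## Toy master table and in-memory stand-ins for two participant files
mstf  <- data.frame(id = 1:2, srb = c(2, 1), erb = c(3, 2))
files <- list(data.frame(x = 1:5), data.frame(x = 11:15))

## One list element per participant, named by zero-padded id
base.list <- vector('list', nrow(mstf))
names(base.list) <- formatC(mstf$id, width=2, flag='0')

for (irow in seq_len(nrow(mstf))) {
  crnt.file <- files[[irow]]   ## would be read.csv(...) with real files
  ## drop=FALSE keeps a data frame even when there is only one column
  base.list[[irow]] <- crnt.file[ mstf$srb[irow]:mstf$erb[irow] , , drop=FALSE]
}
```

Afterward, base.list[['01']] holds the baseline rows for participant 1, and so on.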
There are different ways to index the loop. I just picked one.
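
For instance, here is a sketch of one alternative, assuming the master-file column names described above: an inner loop over the three subset labels replaces the three near-identical stanzas. The demo data, the file name 'p01.csv', and the use of a temporary directory are assumptions for illustration; row.names=FALSE is optional and only keeps the row numbers out of the output files.

```r
## Self-contained demo: one made-up participant file in a temporary directory
wd  <- tempdir()
raw <- data.frame(t = 1:10, y = (1:10)^2)
write.csv(raw, file.path(wd, 'p01.csv'), row.names=FALSE)

mstf <- data.frame(id = 1, fn = file.path(wd, 'p01.csv'),
                   srb = 1, erb = 3, sra = 4, era = 6, srf = 7, erf = 10,
                   stringsAsFactors = FALSE)

## output label -> suffix of the start/end columns in the master file
parts <- c(baseline = 'b', audio = 'a', free = 'f')

for (irow in seq_len(nrow(mstf))) {
  idc <- formatC(mstf$id[irow], width=2, flag='0')
  crnt.file <- read.csv(mstf$fn[irow])
  for (nm in names(parts)) {
    ## look up e.g. srb/erb for 'baseline'
    start <- mstf[[paste0('sr', parts[[nm]])]][irow]
    end   <- mstf[[paste0('er', parts[[nm]])]][irow]
    write.csv(crnt.file[start:end, ],
              file=file.path(wd, paste0(nm, idc, '.csv')),
              row.names=FALSE)
  }
}
```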
--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
On 6/8/15, 2:48 PM, "Chad Danyluck" <c.danyluck at gmail.com> wrote:
>Hello,
>
>I want to subset specific rows of data from 80 .csv files and write those
>subsets into new .csv files. The data I want to subset starts on a
>different row for each original .csv file. I've created variables that
>identify which row the subset should start and end on, but I want to loop
>through this process and I am not sure what to do. I've attempted to write
>the loop below, albeit, much of it is pseudo code. If anyone can provide
>me with some tips I'd appreciate it.
>
>#### This data file is used to create the variables where the subsetting starts and ends for each participant ####
>mig.data <- read.csv("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data & Syntax/mig.data.csv")
>
># These are the variable names for the start and end of each subset of relevant data (baseline, audio, and free)
>participant.ids <- mig.processed.data$participant.id
>participant.baseline.start <- mig.processed.data$baseline.row.start
>participant.baseline.end <- mig.processed.data$baseline.row.end
>participant.audio.start <- mig.processed.data$audio.meditation.row.start
>participant.audio.end <- mig.processed.data$audio.meditation.row.end
>participant.free.start <- mig.processed.data$free.meditation.row.start
>participant.free.end <- mig.processed.data$free.meditation.row.end
>
># read into a list the individual files from which to subset the data
>participant.files <- list.files("/Users/cdanyluck/Documents/Studies/MIG - Dissertation/Data & Syntax/MIG_RAW DATA & TXT Files/Plain Text Files")
>
># loop through each participant
>for (i in 1:length(participant.files)) {
>
> # get baseline rows
> results.baseline <- participant.files[participant.baseline.start[i]:participant.baseline.end[i],]
>
> # get audio rows
> results.audio <- participant.files[participant.audio.start[i]:participant.audio.end[i],]
>
> # get free rows
> results.free <- participant.files[participant.free.start[i]:participant.free.end[i],]
>
> # write out participant relevant data
> write.csv(results.baseline, file="baseline[i].csv")
> write.csv(results.audio, file = "audio[i].csv")
> write.csv(results.free, file = "free[i].csv")
>
>}
>
>--
>Chad M. Danyluck, MA
>PhD Candidate, Psychology
>University of Toronto
>
>
>
>"There is nothing either good or bad but thinking makes it so." - William
>Shakespeare
>
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.