[R] Any existing functions for reading and extracting data from path names?
Ista Zahn
izahn at psych.rochester.edu
Fri Mar 11 19:15:13 CET 2011
Thanks Henrik, that is exactly what I was hoping for!
Best,
Ista
On Fri, Mar 11, 2011 at 1:02 PM, Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:
> Hi,
>
> the R.filesets package was designed for this. It is heavily used by
> the aroma framework (http://www.aroma-project.org/), so it has seen a
> fair bit of mileage by now (in a good way). Here is how you could set
> up your data set and work with the data.
>
>
> # - - - - - - - - - - - -
> # Setup file data set
> # - - - - - - - - - - - -
> library("R.filesets");
> paths <- list.files(path="deleteme", full.names=TRUE);
> dsList <- lapply(paths, FUN=function(path) TabularTextFileSet$byPath(path));
> ds <- Reduce(append, dsList);
>
> # Fullname translator: Los Angeles/data1.csv => Los Angeles,data1
> setFullNamesTranslator(ds, function(name, file, ...) {
>   path <- getPath(file);
>   paste(c(basename(path), name), collapse=",");
> });
>
>
>
> # - - - - - - - - - - - -
> # Examples
> # - - - - - - - - - - - -
> # Get the full names (a fullname consists of
> # a name and comma-separated tags)
> > getFullNames(ds)
> [1] "Los Angeles,data1" "Los Angeles,data2"
> [3] "New York,data1" "New York,data2"
>
> # Get the names
> > getNames(ds)
> [1] "Los Angeles" "Los Angeles"
> [3] "New York" "New York"
>
> > ds
> TabularTextFileSet:
> Name: Los Angeles
> Tags:
> Full name: Los Angeles
> Number of files: 4
> Names: Los Angeles, Los Angeles, New York, New York [4]
> Path (to the first file): deleteme/Los Angeles
> Total file size: 0.00 MB
> RAM: 0.01MB
>
>
> # Get 2nd file
> > df <- getFile(ds, 2)
> > df
>
> TabularTextFile:
> Name: Los Angeles
> Tags: data2
> Full name: Los Angeles,data2
> Pathname: deleteme/Los Angeles/data2.csv
> File size: 80 bytes
> RAM: 0.01 MB
> Number of data rows: 10
> Columns [2]: '', 'x'
> Number of text lines: 11
>
>
>
> # Read one data file
> > data <- readDataFrame(df)
> > data
> x
> 1 1 1
> 2 2 2
> 3 3 3
> 4 4 4
> 5 5 5
> 6 6 6
> 7 7 7
> 8 8 8
> 9 9 9
> 10 10 10
>
>
> # Read all data files
> > dataList <- lapply(ds, readDataFrame)
> > dataList
> $`Los Angeles,data1`
> x
> 1 1 1
> 2 2 2
> 3 3 3
> 4 4 4
> 5 5 5
> 6 6 6
> 7 7 7
> 8 8 8
> 9 9 9
> 10 10 10
>
> $`Los Angeles,data2`
> x
> 1 1 1
> 2 2 2
> 3 3 3
> 4 4 4
> 5 5 5
> 6 6 6
> 7 7 7
> 8 8 8
> 9 9 9
> 10 10 10
>
> $`New York,data1`
> x
> 1 1 1
> 2 2 2
> 3 3 3
> 4 4 4
> 5 5 5
> 6 6 6
> 7 7 7
> 8 8 8
> 9 9 9
> 10 10 10
>
> $`New York,data2`
> x
> 1 1 1
> 2 2 2
> 3 3 3
> 4 4 4
> 5 5 5
> 6 6 6
> 7 7 7
> 8 8 8
> 9 9 9
> 10 10 10
>
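The named list above can be collapsed into the single data frame the original question asks for. Here is a minimal base-R sketch, assuming the list names follow the "city,dataN" pattern shown in the output. The stand-in `dataList` and the helper name `stackByFullName` are made up for illustration and are not part of R.filesets.

```r
# Stand-in for the list returned by lapply(ds, readDataFrame);
# the names mimic the "city,dataN" full names (assumption).
dataList <- list(
  "Los Angeles,data1" = data.frame(x = 1:10),
  "New York,data2"    = data.frame(x = 1:10)
)

# Hypothetical helper: split each full name into city and wave,
# attach them as columns, and stack everything row-wise.
stackByFullName <- function(lst) {
  parts <- strsplit(names(lst), split = ",", fixed = TRUE)
  pieces <- Map(function(df, p) {
    df$city <- p[1]
    df$wave <- as.integer(sub("^data", "", p[2]))
    df
  }, lst, parts)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # drop the "name.1", "name.2", ... row names
  out
}

combined <- stackByFullName(dataList)
```

Note that this splits on the first comma only in the sense that it takes the first two pieces, so a city name that itself contains a comma would need a different separator.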
> Most methods in R.filesets are currently poorly documented (no
> time/resources/...), but there is more in there than documented so
> feel free to ask if you have any questions.
>
> Hope this helps
>
> /Henrik
>
> On Fri, Mar 11, 2011 at 8:52 AM, Ista Zahn <izahn at psych.rochester.edu> wrote:
>> Hi helpeRs,
>>
>> I have inherited a set of data files that use the file system as a
>> sort of poor man's database, i.e., the data files are nested in
>> directories that indicate which city they come from. For example:
>>
>> dir.create("deleteme")
>> for(i in paste("deleteme", c("New York", "Los Angeles"), sep="/")) {
>>   dir.create(i)
>>   for(j in paste("data", 1:2, ".csv", sep="")) {
>>     write.csv(data.frame(x=1:10), file=paste(i, j, sep="/"))
>>   }
>> }
>>
>> list.files("deleteme", recursive=TRUE)
>>
>> What I want to end up with is
>>
>>   x city        wave
>>   1 New York    1
>>   1 Los Angeles 1
>>   1 New York    2
>>   1 Los Angeles 2
>>
>> I've started writing a simple function to do this, but it seems like
>> a common situation and I'm wondering if there are any packages or
>> functions that might make this easier.
>>
>> Thanks!
>> Ista
>> --
>> Ista Zahn
>> Graduate student
>> University of Rochester
>> Department of Clinical and Social Psychology
>> http://yourpsyche.org
>>
>
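For comparison, the same result can be had without extra packages. This is a minimal base-R sketch, assuming the two-level layout from the original question (<root>/<city>/data<wave>.csv). It writes to a temporary directory rather than "deleteme", and the function name `readCityFiles` is hypothetical.

```r
# Recreate the example layout in a temporary directory (assumption:
# <root>/<city>/data<wave>.csv, as in the original question).
# Unlike the original example, row names are not written to the files.
root <- file.path(tempdir(), "deleteme")
for (city in c("New York", "Los Angeles")) {
  dir.create(file.path(root, city), recursive = TRUE, showWarnings = FALSE)
  for (wave in 1:2) {
    write.csv(data.frame(x = 1:10),
              file = file.path(root, city, paste0("data", wave, ".csv")),
              row.names = FALSE)
  }
}

# Hypothetical helper: read every CSV under `root` and derive
# city from the directory name and wave from the file name.
readCityFiles <- function(root) {
  files <- list.files(root, pattern = "\\.csv$", recursive = TRUE,
                      full.names = TRUE)
  pieces <- lapply(files, function(f) {
    df <- read.csv(f)
    df$city <- basename(dirname(f))
    df$wave <- as.integer(gsub("\\D", "", basename(f)))
    df
  })
  do.call(rbind, pieces)
}

allData <- readCityFiles(root)
```

The wave number is taken from whatever digits appear in the file name, so this only works while the files keep the data<wave>.csv naming.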
--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org