[R] Any existing functions for reading and extracting data from path names?

Fri Mar 11 19:02:22 CET 2011

Hi,

The R.filesets package was designed for exactly this. It is heavily used by
the aroma framework (http://www.aroma-project.org/), so it has seen a fair
bit of mileage by now (in a good way). Here is how you could set up
your data set and work with the data.

# - - - - - - - - - - - -
# Setup file data set
# - - - - - - - - - - - -
library("R.filesets");
paths <- list.files(path="deleteme", full.names=TRUE);
dsList <- lapply(paths, FUN=function(path) TabularTextFileSet$byPath(path));
ds <- Reduce(append, dsList);

# Fullname translator: Los Angeles/data1.csv => Los Angeles,data1
setFullNamesTranslator(ds, function(name, file, ...) {
  path <- getPath(file);
  paste(c(basename(path), name), collapse=",");
});
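For comparison, the same path-name-to-full-name mapping can be sketched in base R without R.filesets. This is only an illustration: `toFullName()` is a hypothetical helper, not an R.filesets function, and it assumes every data file ends in `.csv`.

```r
# Hypothetical helper (not part of R.filesets): build "<directory>,<file name>"
# full names from path names using only base R. Assumes a ".csv" extension.
toFullName <- function(pathname) {
  city <- basename(dirname(pathname))             # parent directory, e.g. "Los Angeles"
  name <- sub("\\.csv$", "", basename(pathname))  # file name without extension, e.g. "data1"
  paste(city, name, sep = ",")
}

pathnames <- c("deleteme/Los Angeles/data1.csv",
               "deleteme/New York/data2.csv")
toFullName(pathnames)
```

All of `basename()`, `dirname()`, `sub()` and `paste()` are vectorized, so the helper works on the whole vector of path names at once.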

# - - - - - - - - - - - -
# Examples
# - - - - - - - - - - - -
# Get the full names (a fullname consists of
# a name and comma-separated tags)
> getFullNames(ds)
[1] "Los Angeles,data1" "Los Angeles,data2"
[3] "New York,data1" "New York,data2"

# Get the names
> getNames(ds)
[1] "Los Angeles" "Los Angeles"
[3] "New York"    "New York"
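Since a full name is a name followed by comma-separated tags, the two parts can also be recovered with plain string splitting. A minimal base-R sketch; `splitFullName()` is a hypothetical helper, not an R.filesets method, and it assumes names themselves contain no commas.

```r
# Hypothetical helper: split "<name>,<tag1>,<tag2>,..." into a name and its tags.
# Assumes the name part itself contains no commas.
splitFullName <- function(fullnames) {
  parts <- strsplit(fullnames, ",", fixed = TRUE)
  data.frame(
    name = vapply(parts, function(p) p[1], character(1)),
    # Re-join any remaining tags; a full name without tags gives ""
    tags = vapply(parts, function(p) paste(p[-1], collapse = ","), character(1)),
    stringsAsFactors = FALSE
  )
}

splitFullName(c("Los Angeles,data1", "New York,data2"))
```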

> ds
TabularTextFileSet:
Name: Los Angeles
Tags:
Full name: Los Angeles
Number of files: 4
Names: Los Angeles, Los Angeles, New York, New York [4]
Path (to the first file): deleteme/Los Angeles
Total file size: 0.00 MB
RAM: 0.01MB

# Get 2nd file
> df <- getFile(ds, 2)
> df

TabularTextFile:
Name: Los Angeles
Tags: data2
Full name: Los Angeles,data2
Pathname: deleteme/Los Angeles/data2.csv
File size: 80 bytes
RAM: 0.01 MB
Number of data rows: 10
Columns [2]: '', 'x'
Number of text lines: 11

# Read one data file
> data <- readDataFrame(df)
> data
       x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
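The unnamed first column ('' in the `Columns [2]` listing above) holds the row names that `write.csv()` writes by default. If you read such files with base R instead, `read.csv(..., row.names = 1)` turns that column back into row names (writing with `row.names = FALSE` avoids it altogether). A small sketch using a temporary file in place of the real data:

```r
# Write a CSV the same way as in the question (row names included by default)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:10), file = tmp)

withRowCol <- read.csv(tmp)                 # two columns: the old row names and x
clean      <- read.csv(tmp, row.names = 1)  # first column used as row names; only x remains
ncol(withRowCol)
names(clean)

unlink(tmp)  # remove the temporary file
```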

# Read all data files
> dataList <- lapply(ds, readDataFrame)
> dataList
$`Los Angeles,data1`
       x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10

$`Los Angeles,data2`
       x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10

$`New York,data1`
       x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10

$`New York,data2`
       x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
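To get the long format asked for in the original question (columns x, city and wave), the named list can be stacked after splitting each name back into its city and wave. A minimal base-R sketch: the stand-in `dataList` below only mimics the structure of the list printed above, and it assumes each file name has the form dataN, so that the wave is the number after "data".

```r
# Stand-in for the dataList above: a named list of data frames whose
# names are "<city>,data<wave>" full names.
dataList <- list(
  "Los Angeles,data1" = data.frame(x = 1:10),
  "Los Angeles,data2" = data.frame(x = 1:10),
  "New York,data1"    = data.frame(x = 1:10),
  "New York,data2"    = data.frame(x = 1:10)
)

# Add city and wave columns parsed from each full name
tagged <- Map(function(data, fullname) {
  parts <- strsplit(fullname, ",", fixed = TRUE)[[1]]
  data$city <- parts[1]
  data$wave <- as.integer(sub("^data", "", parts[2]))  # "data2" -> 2
  data
}, dataList, names(dataList))

# Stack into one long data frame; unname() keeps the row names simple (1..n)
combined <- do.call(rbind, unname(tagged))
head(combined)
```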

Most methods in R.filesets are currently poorly documented (no
time/resources/...), but there is more in there than is documented, so
feel free to ask if you have any questions.

Hope this helps

/Henrik

On Fri, Mar 11, 2011 at 8:52 AM, Ista Zahn <izahn at psych.rochester.edu> wrote:
> Hi helpeRs,
>
> I have inherited a set of data files that use the file system as a
> sort of poor man's database, i.e., the data files are nested in
> directories that indicate which city they come from. For example:
>
> dir.create("deleteme")
> for(i in paste("deleteme", c("New York", "Los Angeles"), sep="/")) {
>    dir.create(i)
>    for(j in paste("data", 1:2, ".csv", sep="")) {
>        write.csv(data.frame(x=1:10), file=paste(i, j, sep="/"))
>    }
> }
>
> list.files("deleteme", recursive=TRUE)
>
> What I want to end up with is
>
>  x        city wave
>  1    New York    1
>  1 Los Angeles    1
>  1    New York    2
>  1 Los Angeles    2
>
> I've started writing a simple function to do this, but it seems like
> a common situation and I'm wondering if there are any packages or
> functions that might make this easier.
>
> Thanks!
> Ista
> --
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>