[R] Any existing functions for reading and extracting data from path names?
Henrik Bengtsson
hb at biostat.ucsf.edu
Fri Mar 11 19:02:22 CET 2011
Hi,
the R.filesets package was designed for this. It is heavily used by
the aroma framework (http://www.aroma-project.org/), so it has had a fair
bit of mileage by now (in a good way). Here is how you could set up
your data set and work with the data.
# - - - - - - - - - - - -
# Setup file data set
# - - - - - - - - - - - -
library("R.filesets");
paths <- list.files(path="deleteme", full.names=TRUE);
dsList <- lapply(paths, FUN=function(path) TabularTextFileSet$byPath(path));
ds <- Reduce(append, dsList);
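# Fullname translator: Los Angeles/data1.csv => Los Angeles,data1
setFullNamesTranslator(ds, function(name, file, ...) {
  # Directory holding the file, e.g. "deleteme/Los Angeles"
  path <- getPath(file);
  # Prepend the directory name (the city) to the existing name
  paste(c(basename(path), name), collapse=",");
});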
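
To see what the translator body computes without loading R.filesets, the
same expression can be tried on plain strings. This is only a sketch: the
values of 'path' and 'name' below are stand-ins for what getPath(file) and
the 'name' argument are assumed to hold for the first file, judging from
the output further down.

```r
# Stand-in values (assumptions, not taken from a live R.filesets object)
path <- file.path("deleteme", "Los Angeles")  # directory of the file
name <- "data1"                               # file name without extension

# Same expression as in the translator: "<directory name>,<name>"
fullname <- paste(c(basename(path), name), collapse = ",")
print(fullname)
```

The directory name ends up as the name and the file name as a tag, which
is why getNames() returns the cities.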
# - - - - - - - - - - - -
# Examples
# - - - - - - - - - - - -
# Get the full names (a fullname consists of
# a name and comma-separated tags)
> getFullNames(ds)
[1] "Los Angeles,data1" "Los Angeles,data2"
[3] "New York,data1" "New York,data2"
# Get the names
> getNames(ds)
[1] "Los Angeles" "Los Angeles"
[3] "New York" "New York"
> ds
TabularTextFileSet:
Name: Los Angeles
Tags:
Full name: Los Angeles
Number of files: 4
Names: Los Angeles, Los Angeles, New York, New York [4]
Path (to the first file): deleteme/Los Angeles
Total file size: 0.00 MB
RAM: 0.01MB
# Get 2nd file
> df <- getFile(ds, 2)
> df
TabularTextFile:
Name: Los Angeles
Tags: data2
Full name: Los Angeles,data2
Pathname: deleteme/Los Angeles/data2.csv
File size: 80 bytes
RAM: 0.01 MB
Number of data rows: 10
Columns [2]: '', 'x'
Number of text lines: 11
# Read one data file
> data <- readDataFrame(df)
> data
x
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
# Read all data files
> dataList <- lapply(ds, readDataFrame)
> dataList
$`Los Angeles,data1`
x
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
$`Los Angeles,data2`
x
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
$`New York,data1`
x
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
$`New York,data2`
x
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
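
To get from such a named list to the single data frame with 'city' and
'wave' columns asked for in the original question, one option is to split
the list names at the comma and stack the pieces with base R. The sketch
below does not use R.filesets; 'dataList' is rebuilt as a stand-in with
only the 'x' column, and stackByFullName() is a hypothetical helper, not a
function of the package.

```r
# Stand-in for 'dataList': a named list of data frames whose names are
# full names of the form "<city>,data<wave>"
dataList <- list(
  "Los Angeles,data1" = data.frame(x = 1:10),
  "Los Angeles,data2" = data.frame(x = 1:10),
  "New York,data1"    = data.frame(x = 1:10),
  "New York,data2"    = data.frame(x = 1:10)
)

# Hypothetical helper: add 'city' and 'wave' columns derived from the
# list names, then row-bind everything into one data frame
stackByFullName <- function(dataList) {
  parts <- strsplit(names(dataList), split = ",", fixed = TRUE)
  pieces <- mapply(function(data, part) {
    data$city <- part[1]
    # "data2" -> 2
    data$wave <- as.integer(sub("^data", "", part[2]))
    data
  }, dataList, parts, SIMPLIFY = FALSE)
  res <- do.call(rbind, pieces)
  rownames(res) <- NULL  # drop the "Los Angeles,data1.1"-style row names
  res
}

combined <- stackByFullName(dataList)
str(combined)
```

With the real 'dataList' the extra row-name column read from the CSV files
is carried along unchanged.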
Most methods in R.filesets are currently poorly documented (no
time/resources/...), but there is more in there than what is documented,
so feel free to ask if you have any questions.
Hope this helps
/Henrik
On Fri, Mar 11, 2011 at 8:52 AM, Ista Zahn <izahn at psych.rochester.edu> wrote:
> Hi helpeRs,
>
> I have inherited a set of data files that use the file system as a
> sort of poor man's database, i.e., the data files are nested in
> directories that indicate which city they come from. For example:
>
> dir.create("deleteme")
> for(i in paste("deleteme", c("New York", "Los Angeles"), sep="/")) {
>   dir.create(i)
>   for(j in paste("data", 1:2, ".csv", sep="")) {
>     write.csv(data.frame(x=1:10), file=paste(i, j, sep="/"))
>   }
> }
>
> list.files("deleteme", recursive=TRUE)
>
> What I want to end up with is
>
>  x  city         wave
>  1  New York     1
>  1  Los Angeles  1
>  1  New York     2
>  1  Los Angeles  2
>
> I've started writing a simple function to do this, but it seems like
> a common situation and I'm wondering if there are any packages or
> functions that might make this easier.
>
> Thanks!
> Ista
> --
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>