[R] Read a list of files into named R data.frames

Dennis Murphy djmuser at gmail.com
Fri Sep 9 20:41:24 CEST 2011


Hi:

Hmm, these files look familiar...Lahman database? :)

I have the 2010 set, so here's how I got this to work on my system:

files <- list.files(pattern = '*.csv')
> files
 [1] "Allstar.csv"             "AllstarFull.csv"
 [3] "Appearances.csv"         "AwardsManagers.csv"
 [5] "AwardsPlayers.csv"       "AwardsShareManagers.csv"
 [7] "AwardsSharePlayers.csv"  "Batting.csv"
 [9] "BattingPost.csv"         "Fielding.csv"
[11] "FieldingOF.csv"          "FieldingPost.csv"
[13] "HallOfFame.csv"          "HOFold.csv"
[15] "lahman58-csv.zip"        "Managers.csv"
[17] "ManagersHalf.csv"        "Master.csv"
[19] "Pitching.csv"            "PitchingPost.csv"
[21] "Salaries.csv"            "Schools.csv"
[23] "SchoolsPlayers.csv"      "SeriesPost.csv"
[25] "Teams.csv"               "TeamsFranchises.csv"
[27] "TeamsHalf.csv"           "Xref_Stats.csv"

# The pattern is a regular expression, not a glob, so '*.csv' also
# matches "lahman58-csv.zip" (#15). Drop it, then we can proceed:
files <- files[-15]
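
An alternative to dropping element 15 by position is to anchor the pattern so that only names ending in ".csv" match. The following is a minimal sketch, not part of the original session; the temporary directory and the three file names in it are made up for illustration.

```r
# Sketch: an anchored regular expression keeps only names ending in ".csv".
# The temporary directory and its files exist only for this demonstration.
tmp <- tempfile("csvdemo")
dir.create(tmp)
invisible(file.create(file.path(tmp, c("Allstar.csv", "Batting.csv",
                                       "lahman58-csv.zip"))))

# '\\.' is a literal dot and '$' anchors the match at the end of the name,
# so the zip archive is no longer picked up.
csv_files <- list.files(tmp, pattern = "\\.csv$")
csv_files
```

With a pattern like this, the positional `files[-15]` step should not be needed, and the code does not depend on where the zip file happens to sort.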

# Get the root names out and create a similar vector
# of output files with extension .Rdata
strp <- unlist(lapply(strsplit(files, '\\.'), '[', 1))
outfiles <- paste(strp, 'Rdata', sep = '.')
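
The root names can also be extracted with `tools::file_path_sans_ext()`, which ships with R. This is a sketch of that alternative under the assumption that every name has a single extension; `demo_files` is an illustrative vector, not the full listing above.

```r
# Sketch: strip the extension with a helper from the 'tools' package
# instead of strsplit(). 'demo_files' is a made-up example vector.
demo_files <- c("Allstar.csv", "Xref_Stats.csv")

demo_strp <- tools::file_path_sans_ext(demo_files)     # root names
demo_out  <- paste(demo_strp, "Rdata", sep = ".")      # matching output names
demo_out
```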

# Function to read a csv file and write it out as an .Rdata file.
# Note: every file stores its data frame under the same name, 'd'.
f <- function(x, y) {
   d <- read.csv(x, header = TRUE, stringsAsFactors = FALSE)
   save(d, file = y)
}

# Load and fire:
mapply(f, files, outfiles)
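
Because f() saves each data frame as `d`, calling load() on any of the resulting files restores an object called `d`. If each .Rdata file should instead hold an object named after its source file, one possible variant is sketched below. The function name `f2`, the demo file, and its columns are hypothetical and only serve to show the idea.

```r
# Hypothetical variant of f(): store the data frame under its own name.
f2 <- function(x, y) {
  nm <- tools::file_path_sans_ext(basename(x))   # e.g. "Allstar"
  # assign() creates the object inside this function's environment ...
  assign(nm, read.csv(x, header = TRUE, stringsAsFactors = FALSE))
  # ... and save(list = ) looks the name up in that same environment.
  save(list = nm, file = y)
}

# Small self-contained check with a made-up csv file
demo_csv <- file.path(tempdir(), "Allstar.csv")
write.csv(data.frame(x = c("a", "b"), y = c(1, 2)), demo_csv,
          row.names = FALSE)
demo_rdata <- file.path(tempdir(), "Allstar.Rdata")
f2(demo_csv, demo_rdata)

e <- new.env()
loaded <- load(demo_rdata, envir = e)   # load() returns the restored names
loaded
```

It would be called the same way as before, for example `mapply(f2, files, outfiles)`.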

Worked for me on all but the Master file, so check your results. I
loaded in the Master file separately and saved it without problem.
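
As for the loop in the original question (quoted below): eval() applied to a character string just returns the string, so nothing is assigned. A sketch of one way to create the named objects directly uses assign(). The helper name `read_named` and the demo files are assumptions for illustration only.

```r
# Sketch: read each csv into an object named after the file.
# 'read_named' is a hypothetical helper, not part of any package.
read_named <- function(files, envir = globalenv()) {
  for (fl in files) {
    nm <- sub("\\.csv$", "", basename(fl))       # "Teams.csv" -> "Teams"
    assign(nm,
           read.csv(fl, header = TRUE, stringsAsFactors = FALSE),
           envir = envir)
  }
  invisible(NULL)
}

# Self-contained demonstration with two made-up files
demo_dir <- tempfile("named")
dir.create(demo_dir)
write.csv(data.frame(a = 1:3), file.path(demo_dir, "Teams.csv"),
          row.names = FALSE)
write.csv(data.frame(b = 1:2), file.path(demo_dir, "Schools.csv"),
          row.names = FALSE)

demo_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
demo_env <- new.env()          # use globalenv() to populate the workspace
read_named(demo_paths, envir = demo_env)
sort(ls(demo_env))
```

Calling `read_named(files)` with the default `envir` would place the data frames in the global environment, which is what the question asked for.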

HTH,
Dennis

On Fri, Sep 9, 2011 at 7:39 AM, Michael Friendly <friendly at yorku.ca> wrote:
> I have a collection of .csv files in a directory, and want to read them into
> R data.frames whose names
> are the same as the file names, without the .csv extension
>
> e.g., from
>> (files <- list.files(pattern="*.csv"))
>  [1] "Allstar.csv"             "AllstarFull.csv"
>  [3] "Appearances.csv"         "AwardsManagers.csv"
>  [5] "AwardsPlayers.csv"       "AwardsShareManagers.csv"
>  [7] "AwardsSharePlayers.csv"  "Batting.csv"
>  [9] "BattingPost.csv"         "Fielding.csv"
> [11] "FieldingOF.csv"          "FieldingPost.csv"
> [13] "HallOfFame.csv"          "HOFold.csv"
> [15] "Managers.csv"            "ManagersHalf.csv"
> [17] "Master.csv"              "Pitching.csv"
> [19] "PitchingPost.csv"        "Salaries.csv"
> [21] "Schools.csv"             "SchoolsPlayers.csv"
> [23] "SeriesPost.csv"          "Teams.csv"
> [25] "TeamsFranchises.csv"     "TeamsHalf.csv"
>
>> Allstar <- read.csv("Allstar.csv", header=TRUE)
>  ...
>> TeamsHalf <- read.csv("TeamsHalf.csv", header=TRUE)
>
> Below is what I tried, which reads all the files, but doesn't create the R
> objects in the global environment.
> What is missing here?
>
> for (i in 1:length(files)) {
>    inp <- read.csv(file=files[i], header=TRUE)
>    name <- sub(".csv", "", files[i])
>    cat("Read ", files[i], "\trows: ", nrow(inp), " cols: ", ncol(inp), "\n")
>    eval(paste(name, "<- inp"))
> }
>
> Read  Allstar.csv       rows:  4475  cols:  3
> Read  AllstarFull.csv   rows:  4676  cols:  8
> Read  Appearances.csv   rows:  94157  cols:  20
> Read  AwardsManagers.csv        rows:  57  cols:  6
> Read  AwardsPlayers.csv         rows:  2679  cols:  6
> Read  AwardsShareManagers.csv   rows:  344  cols:  7
> Read  AwardsSharePlayers.csv    rows:  6354  cols:  7
> Read  Batting.csv       rows:  93955  cols:  24
> Read  BattingPost.csv   rows:  9840  cols:  22
> Read  Fielding.csv      rows:  160710  cols:  18
> Read  FieldingOF.csv    rows:  12028  cols:  6
> Read  FieldingPost.csv  rows:  10458  cols:  17
> Read  HallOfFame.csv    rows:  3913  cols:  8
> Read  HOFold.csv        rows:  289  cols:  7
> Read  Managers.csv      rows:  3238  cols:  10
> Read  ManagersHalf.csv  rows:  93  cols:  10
> Read  Master.csv        rows:  17674  cols:  33
> Read  Pitching.csv      rows:  40432  cols:  30
> Read  PitchingPost.csv  rows:  4284  cols:  30
> Read  Salaries.csv      rows:  21464  cols:  5
> Read  Schools.csv       rows:  749  cols:  5
> Read  SchoolsPlayers.csv        rows:  6147  cols:  4
> Read  SeriesPost.csv    rows:  256  cols:  9
> Read  Teams.csv         rows:  2655  cols:  48
> Read  TeamsFranchises.csv       rows:  120  cols:  4
> Read  TeamsHalf.csv     rows:  52  cols:  10
> Read  Xref_Stats.csv    rows:  2753  cols:  3
>> ls()
> [1] "files" "i"     "inp"   "name"
>>
>
> --
> Michael Friendly     Email: friendly AT yorku DOT ca
> Professor, Psychology Dept.
> York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
> 4700 Keele Street    Web:   http://www.datavis.ca
> Toronto, ONT  M3J 1P3 CANADA
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


