[R] how to read in multiple files with unequal number of columns

jim holtman jholtman at gmail.com
Wed Apr 23 14:47:18 CEST 2008


Is this what you want?  I am assuming that you will read the
dataframes into a list and then process them like below:

> # put dataframe in a list -- would have read them in via a list
> x <- list(d, my.fake.data)
> # determine maximum number of columns and then pad out the short one
> # also use the column names of the largest one
>
> col.max <- max(sapply(x, ncol))
> # take the names from the widest data frame, wherever it sits in the list
> colNames <- colnames(x[[which.max(sapply(x, ncol))]])
> new.data <- lapply(x, function(.data){
+     if (ncol(.data) < col.max){
+         .data[(ncol(.data) + 1):col.max] <- NA
+         colnames(.data) <- colNames
+     }
+     .data
+ })
> all <- do.call(rbind, new.data)
> all
   x  y  fac
1  1  1    B
2  1  2    B
3  1  3    B
4  1  4    B
5  1  5    A
6  1  6    A
7  1  7    C
8  1  8    C
9  1  9    A
10 1 10    C
11 1  2 <NA>
>
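
The steps in the transcript above can be wrapped into a small function. This is only a sketch under a few assumptions: the name pad_and_bind is made up for illustration, columns are matched by position (as in the transcript), and the example data use a character column rather than a factor to keep the result easy to check. It puts the short data frame first to show that the order in the list does not matter.

```r
# Hypothetical helper (not part of the original reply): pad every data
# frame in a list to the width of the widest one, then stack them.
pad_and_bind <- function(dfs) {
    widths <- vapply(dfs, ncol, integer(1))
    col.max <- max(widths)
    # use the column names of the (first) widest data frame
    col.names <- colnames(dfs[[which.max(widths)]])
    padded <- lapply(dfs, function(.data) {
        if (ncol(.data) < col.max) {
            # append NA columns by position
            .data[(ncol(.data) + 1):col.max] <- NA
        }
        colnames(.data) <- col.names
        .data
    })
    do.call(rbind, padded)
}

# example data modelled on the question (assumed values)
d <- data.frame(x = 1, y = 1:10, fac = rep(c("A", "B"), 5),
                stringsAsFactors = FALSE)
my.fake.data <- data.frame(x = 1, y = 2)

# the short data frame is listed first on purpose
combined <- pad_and_bind(list(my.fake.data, d))
dim(combined)
```

Because the padding is positional, this only makes sense when the first columns of every file mean the same thing; if the files carry real headers, matching by name would be the safer choice.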


On Tue, Apr 22, 2008 at 9:05 AM, Tania Oh <tania.oh at bnc.ox.ac.uk> wrote:
> Dear all,
>
> I want to read in 1000 files which contain varying number of columns.
> For example:
>
> file[1] contains 8 columns (mixture of characters and numbers)
> file[2] contains 16 columns etc
>
> I'm reading everything into one big data frame and when I try rbind, R
> returns an error of
> "Error in rbind(deparse.level, ...) :
>   numbers of columns of arguments do not match"
>
>
> Below is my code:
>
> all <- NULL
> all <- as.data.frame(all)
>
> ##read in the contents of the files
> for (f in 1:length(fnames)){
>
>     tmp <- try(read.table(fnames[f], header=F, fill=T, sep="\t"), TRUE)
>
>     if (class(tmp) == "try-error") {
>         next ## skip this file if it's empty/non-existent
>     } else {
>         ## combine all the file contents into one big data frame
>         all <- rbind(all, tmp)
>     }
> }
>
>
> Here is an example of what the data in the files look like:
>
> L3 <- LETTERS[1:3]
> (d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, replace=TRUE)))
>
>  > str(d)
> 'data.frame':   10 obs. of  3 variables:
>  $ x  : num  1 1 1 1 1 1 1 1 1 1
>  $ y  : num  1 2 3 4 5 6 7 8 9 10
>  $ fac: Factor w/ 3 levels "A","B","C": 1 3 1 2 2 2 2 1 1 2
>
> my.fake.data <- data.frame(cbind(x=1, y=2))
>  > str(my.fake.data)
> 'data.frame':   1 obs. of  2 variables:
>  $ x: num 1
>  $ y: num 2
>
>
> all <- rbind(d, my.fake.data)
>
> Error in rbind(deparse.level, ...) :
>   numbers of columns of arguments do not match
>
>
> I've searched the R-site but couldn't find any relevant solution. I
> might have used the wrong keywords to search, so if this question has
> been answered already, I'd be very grateful if someone could point me
> to the post. Else any help/suggestions would be greatly appreciated.
>
> Many thanks in advance,
> tania
>
> D.Phil student
> Department of Physiology, Anatomy and Genetics
> University of Oxford
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

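For the reading loop quoted above, one possible arrangement is to collect the files in a list first and leave the combining until afterwards. The following is a sketch, not code from the thread: read_or_null and the temporary demo files are invented for illustration, and fnames stands in for the real vector of file names.

```r
# Hypothetical helper: return NULL for a file that cannot be read
# (missing or empty) instead of stopping the whole run.
read_or_null <- function(fname) {
    # suppressWarnings() hides the "cannot open file" warning that
    # precedes the error; note that it hides other warnings as well
    suppressWarnings(tryCatch(
        read.table(fname, header = FALSE, fill = TRUE, sep = "\t",
                   stringsAsFactors = FALSE),
        error = function(e) NULL
    ))
}

# demo files standing in for the real fnames (assumed content)
tmp.dir <- tempfile("demo")
dir.create(tmp.dir)
f1 <- file.path(tmp.dir, "a.txt")
f2 <- file.path(tmp.dir, "b.txt")
writeLines(c("1\t2\tA", "3\t4\tB"), f1)
writeLines("5\t6", f2)
fnames <- c(f1, f2, file.path(tmp.dir, "missing.txt"))

file.list <- lapply(fnames, read_or_null)
file.list <- Filter(Negate(is.null), file.list)   # drop the failures
sapply(file.list, ncol)
```

The resulting list can then be padded and combined in one step, which also avoids calling rbind once per file inside the loop.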


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


