[R] column names with rbind loop

Tue Aug 30 22:16:29 CEST 2011

On Aug 30, 2011, at 3:46 PM, Vining, Kelly wrote:

> Thanks much for your help! This almost works. However, now I am  
> getting the following error:
>
>> for(i in all.files) {
> + if (i==all.files[1]) new.data <- read.table(i,header=TRUE) else {
> + new.data <- rbind(new.data, read.table(i))}}
> Error in match.names(clabs, names(xi)) :
>  names do not match previous names

You need to decide whether you are going to use the V-names or the
original names, because rbind requires that the column names of its two
arguments match. Perhaps you want to read the names in separately with
readLines? Also look at the skip argument to read.table.
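
A minimal sketch of the readLines/skip route, assuming whitespace-delimited files that each start with one header line. The two demo files and their contents are made up for illustration and only stand in for the real *.rpkm files.

```r
# Hypothetical demo files standing in for the real *.rpkm files.
tmp <- tempfile("rpkm_demo")
dir.create(tmp)
writeLines(c("seq_id start end", "scaffold_1 12639 13384"),
           file.path(tmp, "1.rpkm"))
writeLines(c("seq_id start end", "scaffold_1 22190 22516"),
           file.path(tmp, "2.rpkm"))
all.files <- list.files(tmp, full.names = TRUE)

# Read the header line once, separately from the data,
# and split it on whitespace to get the column names.
header.line <- readLines(all.files[1], n = 1)
col.names <- strsplit(trimws(header.line), "[[:space:]]+")[[1]]

# Skip the header line in every file and apply the same names,
# so rbind always sees matching column names.
new.data <- NULL
for (i in all.files) {
  in.data <- read.table(i, skip = 1, col.names = col.names)
  new.data <- rbind(new.data, in.data)
}
head(new.data)
```

Because every piece gets its names from the same vector, the "names do not match previous names" error cannot arise from a mix of V-names and real names.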

>
> I am wondering if this is because R adds row numbers as a numerical  
> column to the table of the first file it reads?

It does so by default. It needs some sort of character vector to use
as names, so if there is none, it makes them up. I don't understand why
you didn't use header=TRUE for both read operations.
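
A sketch of that simpler route, assuming every file carries the same header line. The demo files are made up for illustration, and the lapply/do.call form is just one way to write it; the original loop with header=TRUE in both branches would work as well.

```r
# Hypothetical demo files standing in for the real *.rpkm files.
tmp <- tempfile("rpkm_demo")
dir.create(tmp)
writeLines(c("seq_id start end strand", "scaffold_1 12639 13384 +"),
           file.path(tmp, "1.rpkm"))
writeLines(c("seq_id start end strand", "scaffold_1 22190 22516 +",
             "scaffold_1 80207 81289 -"),
           file.path(tmp, "2.rpkm"))
all.files <- list.files(tmp, pattern = "\\.rpkm$", full.names = TRUE)

# Read every file with header=TRUE so all pieces share the real names,
# then bind the pieces together in one call.
pieces <- lapply(all.files, read.table, header = TRUE)
new.data <- do.call(rbind, pieces)
head(new.data)
```

Reading all files the same way keeps the header out of the data rows and gives rbind identical names for every piece.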

-- 
David.

>
>
> ________________________________________
> From: Weidong Gu [anopheles123 at gmail.com]
> Sent: Tuesday, August 30, 2011 12:00 PM
> To: Vining, Kelly
> Cc: r-help at r-project.org
> Subject: Re: [R] column names with rbind loop
>
> How about adding a conditional statement to get the header from the
> first file:
>
> for(i in all.files) {
> if (i==all.files[1]) new.data <- read.table(i,header=TRUE) else {
> new.data <- rbind(new.data, read.table(i))}}
>
>
> Weidong Gu
>
>
> On Tue, Aug 30, 2011 at 1:42 PM, Vining, Kelly
> <Kelly.Vining at oregonstate.edu> wrote:
>> Hello R  users.
>>
>> This is a fairly basic question:
>>
>> I am concatenating data from sets of files in a directory using a  
>> loop. The column names in all files are exactly the same. My  
>> understanding is that rbind takes column names from the first file  
>> it reads. However, my output is showing that the column names are  
>> treated as a first data row, not treated as headers.
>>
>> I compile my file names like this:
>>
>>> all.files <- list.files()
>>> all.files
>> [1] "1.rpkm"  "10.rpkm" "11.rpkm" "12.rpkm" "13.rpkm" "14.rpkm"
>> [7] "15.rpkm" "16.rpkm" "17.rpkm" "18.rpkm" "19.rpkm" "2.rpkm"
>> [13] "3.rpkm"  "4.rpkm"  "5.rpkm"  "6.rpkm"  "7.rpkm"  "8.rpkm"
>> [19] "9.rpkm"
>>
>> Then loop through them like this:
>>> new.data <- NULL
>>> for(i in all.files) {
>> + in.data <- read.table(i)
>> + new.data <- rbind(new.data, in.data)}
>>> head(new.data)
>>        V1               V2        V3     V4     V5    V6     V7
>> 1     seq_id           source      type  start    end score strand
>> 2 scaffold_1 Ptrichocarpav2_0 gene_body  12639  13384     .      +
>> 3 scaffold_1 Ptrichocarpav2_0 gene_body  22190  22516     .      +
>> 4 scaffold_1 Ptrichocarpav2_0 gene_body  74076  75893     .      +
>> 5 scaffold_1 Ptrichocarpav2_0 gene_body  80207  81289     .      -
>> 6 scaffold_1 Ptrichocarpav2_0 gene_body 105236 107712     .      +
>>
>>
>> As you can see, R is putting a "V1, V2..." header row here because  
>> I didn't say "header=TRUE" in my read.table command. But if I do  
>> this within the loop, I get an error. If I try to delete the V1, V2  
>> row after the fact by
>>
>> new.data <- new.data[-1,]
>>
>> R deletes my "real" header row.
>>
>> How can I get the header that I want?
>>
>> Thanks for any help,
>> --Kelly V.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

David Winsemius, MD
West Hartford, CT