[R] column names with rbind loop
David Winsemius
dwinsemius at comcast.net
Tue Aug 30 22:16:29 CEST 2011
On Aug 30, 2011, at 3:46 PM, Vining, Kelly wrote:
> Thanks much for your help! This almost works. However, now I am
> getting the following error:
>
>> for(i in all.files) {
> + if (i==all.files[1]) new.data <- read.table(i,header=TRUE) else {
> + new.data <- rbind(new.data, read.table(i))}}
> Error in match.names(clabs, names(xi)) :
> names do not match previous names
You need to decide whether you are going to use the V-names or the original
names, because rbind requires that the names of its two arguments
match. Your first read uses header=TRUE and gets the original names; the
later reads do not, so they get V1, V2, ... instead. Perhaps you want to
read the names in separately with readLines? Also look at the skip
argument to read.table.
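
A minimal sketch of the mismatch and of the readLines/skip route, assuming whitespace-separated files whose first line is the header. The two temporary files and their three columns are made up for illustration; they are not the poster's data.

```r
# Two small stand-in files with identical header lines (made-up data).
f1 <- tempfile(fileext = ".rpkm")
f2 <- tempfile(fileext = ".rpkm")
writeLines(c("seq_id start end", "scaffold_1 12639 13384"), f1)
writeLines(c("seq_id start end", "scaffold_2 22190 22516"), f2)

a <- read.table(f1, header = TRUE)  # original names from the header line
b <- read.table(f2)                 # made-up names; header line becomes a data row

# rbind() refuses to combine data frames whose names differ.
mismatch <- inherits(try(rbind(a, b), silent = TRUE), "try-error")

# Read the names once with readLines(), then skip the header line in
# every file and supply the same names to each read.
hdr <- strsplit(readLines(f1, n = 1), "[[:space:]]+")[[1]]
parts <- lapply(c(f1, f2), read.table, skip = 1, col.names = hdr)
combined <- do.call(rbind, parts)
```

Because every piece now carries the same names, rbind has nothing to complain about, and no header line ends up as a data row.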
>
> I am wondering if this is because R adds row numbers as a numerical
> column to the table of the first file it reads?
It does so by default. It needs to have some sort of character vector
to use, so if there is none, it makes up names. I don't understand why
you didn't use header=TRUE for both read operations.
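
As a rough sketch of that suggestion, with header=TRUE on every read, something along these lines should work. The directory and the three tiny files are hypothetical stand-ins created only so the example runs on its own; in the thread, all.files comes from list.files().

```r
# Create a throwaway directory with three small stand-in files.
demo.dir <- tempfile("rpkm_demo")
dir.create(demo.dir)
for (k in 1:3) {
  writeLines(c("seq_id start end",
               sprintf("scaffold_%d %d %d", k, 100L * k, 100L * k + 50L)),
             file.path(demo.dir, paste0(k, ".rpkm")))
}
all.files <- list.files(demo.dir, pattern = "\\.rpkm$", full.names = TRUE)

# Same loop as in the original question, but each file is read with
# header = TRUE, so every piece has the original column names.
new.data <- NULL
for (i in all.files) {
  in.data <- read.table(i, header = TRUE)
  new.data <- rbind(new.data, in.data)
}

# The same result without growing the data frame inside a loop.
new.data2 <- do.call(rbind, lapply(all.files, read.table, header = TRUE))
```

The second form reads all files into a list first and binds them once, which tends to be faster when there are many files.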
--
David.
>
>
> ________________________________________
> From: Weidong Gu [anopheles123 at gmail.com]
> Sent: Tuesday, August 30, 2011 12:00 PM
> To: Vining, Kelly
> Cc: r-help at r-project.org
> Subject: Re: [R] column names with rbind loop
>
> How about adding a conditional statement to get the header from the
> first file?
>
> for(i in all.files) {
> if (i==all.files[1]) new.data <- read.table(i,header=TRUE) else {
> new.data <- rbind(new.data, read.table(i))}}
>
>
> Weidong Gu
>
>
> On Tue, Aug 30, 2011 at 1:42 PM, Vining, Kelly
> <Kelly.Vining at oregonstate.edu> wrote:
>> Hello R users.
>>
>> This is a fairly basic question:
>>
>> I am concatenating data from sets of files in a directory using a
>> loop. The column names in all files are exactly the same. My
>> understanding is that rbind takes column names from the first file
>> it reads. However, my output is showing that the column names are
>> treated as a first data row, not treated as headers.
>>
>> I compile my file names like this:
>>
>>> all.files <- list.files()
>>> all.files
>> [1] "1.rpkm" "10.rpkm" "11.rpkm" "12.rpkm" "13.rpkm" "14.rpkm"
>> [7] "15.rpkm" "16.rpkm" "17.rpkm" "18.rpkm" "19.rpkm" "2.rpkm"
>> [13] "3.rpkm" "4.rpkm" "5.rpkm" "6.rpkm" "7.rpkm" "8.rpkm"
>> [19] "9.rpkm"
>>
>> Then loop through them like this:
>>> new.data <- NULL
>>> for(i in all.files) {
>> + in.data <- read.table(i)
>> + new.data <- rbind(new.data, in.data)}
>>> head(new.data)
>> V1 V2 V3 V4 V5 V6 V7
>> 1 seq_id source type start end score strand
>> 2 scaffold_1 Ptrichocarpav2_0 gene_body 12639 13384 . +
>> 3 scaffold_1 Ptrichocarpav2_0 gene_body 22190 22516 . +
>> 4 scaffold_1 Ptrichocarpav2_0 gene_body 74076 75893 . +
>> 5 scaffold_1 Ptrichocarpav2_0 gene_body 80207 81289 . -
>> 6 scaffold_1 Ptrichocarpav2_0 gene_body 105236 107712 . +
>>
>>
>> As you can see, R is putting a "V1, V2..." header row here because
>> I didn't say "header=TRUE" in my read.table command. But if I do
>> this within the loop, I get an error. If I try to delete the V1, V2
>> row after the fact by
>>
>> new.data <- new.data[-1,]
>>
>> R deletes my "real" header row.
>>
>> How can I get the header that I want?
>>
>> Thanks for any help,
>> --Kelly V.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
David Winsemius, MD
West Hartford, CT