[R] rewrite a data file use write.table(), count.fields() show different pattern, any suggestion appreciated.

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue May 22 16:46:26 CEST 2007


On Tue, 22 May 2007, Yong Wang wrote:

> Thank you for the suggestion, Dr. Ripley

I made no suggestion: I asked a question you have not answered.

> However, I am a little bit confused. My understanding is that you
> suspect the should-be-quoted fields (factor or character fields)
> contain tabs.
>
> If this is the case, count.fields() should detect the tab,
> read.table(sep="\t") should read with the same awareness, and if
> write.table(sep="\t") writes and separates with tabs those fields as
> acknowledged by read.table(sep="\t"), the two field counts should be
> the same.

There are too many 'shoulds' in that sentence, and one of them is 
incorrect. Consider:

> count.fields("test.dat", sep="\t")
[1] 3
> A <- read.table("test.dat", sep="\t")
> ncol(A)
[1] 3
> write.table(A,"test2.dat", eol="\n",sep="\t",quote=F,row.names=F,
               col.names=F)
> count.fields("test2.dat", sep="\t")
[1] 4
> write.table(A,"test3.dat", eol="\n",sep="\t",row.names=F, col.names=F)
> count.fields("test3.dat", sep="\t")
[1] 3

and I'll leave you to reconstruct test.dat to ensure you understand.
(BTW, you didn't show us even a sample of your dataset.)
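
One possible reconstruction, offered as a sketch rather than the file actually used above: a single record with three tab-separated fields, the second of which is quoted and contains an embedded tab. The field values and the temporary file names are assumptions made for illustration.

```r
# Hypothetical reconstruction of test.dat: one record, three tab-separated
# fields, the second quoted and containing an embedded tab.
test1 <- tempfile(fileext = ".dat")
test2 <- tempfile(fileext = ".dat")
test3 <- tempfile(fileext = ".dat")
writeLines("alpha\t\"beta\tgamma\"\tdelta", test1)

# On input, the quotes protect the embedded tab.
n1 <- count.fields(test1, sep = "\t")
A  <- read.table(test1, sep = "\t")

# Unquoted output: the embedded tab cannot be told apart from a separator.
write.table(A, test2, eol = "\n", sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
n2 <- count.fields(test2, sep = "\t")

# Default quote = TRUE: the field is written back inside quotes.
write.table(A, test3, eol = "\n", sep = "\t",
            row.names = FALSE, col.names = FALSE)
n3 <- count.fields(test3, sep = "\t")

# Which columns hold a value with an embedded tab?
has_tab <- vapply(A, function(col) any(grepl("\t", col, fixed = TRUE)),
                  logical(1))

print(c(original = n1, columns = ncol(A), unquoted = n2, quoted = n3))
print(has_tab)
```

The same vapply() check can be run on a data frame before writing it with quote=FALSE, to see whether any column would gain extra separators on output.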


> anyway, I will try to redo it per your suggestion.
>
> Regards
> yong
>
>
> On 5/22/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
>> If you write out unquoted fields, how do you know they do not contain
>> tabs?
>> 
>> The default is quote=TRUE for a good reason.
>> 
>> On Tue, 22 May 2007, Yong Wang wrote:
>> 
>> > Dear all:
>> >
>> > I read in a tab-delimited dataset and then wrote it out as another
>> > file, as follows. I did this simply to make sure I understand the
>> > behavior of these commands.
>> >
>> > data<-read.table(file,header=F,sep="\t",fill=T,colClasses="character");
>> > write.table(data,file="newdata.txt",eol="\n",sep="\t",quote=F,row.names=F);
>> >
>> >
>> > cf1 <- count.fields("newdata.txt", sep="\t")
>> > table(cf1)
>> >  13   17    23
>> >  10  126  5445
>> >
>> > # is different to
>> >
>> > cf2 <- count.fields(file,sep="\t")
>> > table(cf2)
>> >  13   17    23   33
>> >  10  106  5433   32
>> >
>> > the worst problem is the maximal value of cf1 (33) is larger than the
>> > maximal value of cf2 (23) which is the right number of fields for most
>> > rows in the original file.
>> >
>> > I need to use write.table for some important data manipulation work,
>> > your suggestion is
>> > highly appreciated.
>> >
>> > Best Regards
>> >
>> 
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


