[R] R eats my data

Tue May 25 18:25:34 CEST 2010

When I encounter problems like this, I make sure each row has the 
expected number of columns.  Something like the following awk code is 
useful; it prints how many lines have each number of tab-separated fields.

awk -F"\t" '{print NF}' id_name_gh5.txt | sort | uniq -c

Note: I'm not sure if the \t will work with the -F switch as above.
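A variant of the same idea, sketched under the assumption that every line should have 3 tab-separated fields (the column count dim() reports further down): rather than a summary, print the line number of each row whose field count differs, so the offending lines can be looked at directly. The function name check_fields is hypothetical.

```shell
# Sketch: report each line whose tab-separated field count differs
# from the expected value. The default of 3 is an assumption taken
# from the dim() output in this thread; pass your own as $2.
check_fields() {
  file=$1
  expected=${2:-3}
  # -F'\t' makes tab the field separator; NR is the line number and
  # NF the number of fields on that line.
  awk -F'\t' -v want="$expected" 'NF != want { print NR ": " NF " fields" }' "$file"
}

# Usage (hypothetical): check_fields id_name_gh5.txt 3
```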

Kevin

Changbin Du wrote:
> cdu at nuuk:~/operon$ grep '^#' id_name_gh5.txt
> cdu at nuuk:~/operon$
> 
> no lines start with #
> 
> 
> 
> On Tue, May 25, 2010 at 9:11 AM, Barry Rowlingson <b.rowlingson at lancaster.ac.uk> wrote:
> 
>> On Tue, May 25, 2010 at 4:42 PM, Changbin Du <changbind at gmail.com> wrote:
>>> Hi, dear R community,
>>>
>>> My original file has 1932 lines, but when I read it into R, it changed to
>>> 1068 lines. How come?
>>>
>>>
>>> cdu at nuuk:~/operon$ wc -l id_name_gh5.txt
>>> 1932 id_name_gh5.txt
>>>
>>>
>>>> gene_name <- read.table("/home/cdu/operon/id_name_gh5.txt", sep="\t",
>>>+     skip=0, header=F, fill=T)
>>>> dim(gene_name)
>>> [1] 1068    3
>>>
>>>
>>  Do any of your lines start with a "#"?
>>
>>> read.table("test.txt",sep="\t")
>>      V1
>> 1 line 1
>> 2 line 2
>> 3 line 3
>> 4 line 4
>>
>>> read.table("test.txt",comment.char="",sep="\t")
>>               V1
>> 1          line 1
>> 2      #commented
>> 3          line 2
>> 4          line 3
>> 5 #nother comment
>> 6          line 4
>>
>>  Just a guess; it's hard to tell without the file...
>>
>> Barry
>>
> 
> 
> 
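On the "#" question: the grep '^#' above only finds a "#" in the first column. The sketch below (the function name find_hashes is hypothetical) lists every line that contains "#" anywhere, with its line number. It assumes read.table() was run with its default comment.char = "#", under which everything from the "#" to the end of the line is ignored; the comment.char="" call in Barry's example turns that off.

```shell
# Sketch: list every line containing '#', not only those starting with it.
# Assumption: read.table() used its default comment.char = "#", so on each
# reported line the text from '#' onward would be discarded.
find_hashes() {
  file=$1
  # -n prefixes each match with its line number; '|| true' stops a
  # "no match" exit status from aborting scripts run with 'set -e'.
  grep -n '#' "$file" || true
}

# Usage (hypothetical): find_hashes id_name_gh5.txt
```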

-- 
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016