[R] How to read only specified columns from a data file

David Winsemius dwinsemius at comcast.net
Wed Mar 16 14:03:19 CET 2011


On Mar 16, 2011, at 8:13 AM, Sarah Goslee wrote:

> read.table() looks at the first five rows when determining how many columns
> there are. If there are more columns in row 7 and you do not specify that in
> the read.table() command directly, they will be wrapped to the next row.
>
> This was discussed on the list within the last couple weeks.
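
A minimal sketch of the behaviour Sarah describes, using a made-up six-line
file (the file contents and the names tmp, wrapped, n_col and full are mine,
not taken from Luis's data). It assumes whitespace-separated fields and uses
count.fields() to find the widest line, so that col.names can be supplied
explicitly:

```r
# Write a small ragged file: the first five lines have at most 5 fields,
# while line 6 has 7 fields.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 a 0.1",
             "2 a 0.2 1.0",
             "3 a 0.3 0.9 1.0",
             "4 a 0.4 0.8 0.9",
             "5 a 0.5 0.7 0.8",
             "6 a 0.6 0.6 0.7 0.8 1.0"), tmp)

# Default: the column count comes from the first five lines only, so the
# extra fields of line 6 spill onto an additional row.
wrapped <- read.table(tmp, fill = TRUE)
dim(wrapped)

# Scan the whole file for the widest line and pass that many column names.
n_col <- max(count.fields(tmp))
full <- read.table(tmp, fill = TRUE,
                   col.names = paste0("V", seq_len(n_col)))
dim(full)

# Keep only the leading columns of interest.
first5 <- full[, 1:5]
```

Comparing dim(wrapped) with dim(full) should show the extra row disappearing
once the column count is given up front.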

In addition to Sarah's comments, I also note that you did not include  
your code. I don't think it could have been identical to the code I  
suggested, which was in turn based on the code you had proposed.  
So ... what did you do to get that result?
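
For reference, here is a sketch of how colClasses can be used to drop
trailing columns. It is not the code quoted further down the thread:
rep(NULL, 430) evaluates to NULL, so this version uses the string "NULL",
which read.table() treats as "skip this column". The toy file and the names
n_total and n_keep are assumptions for illustration only:

```r
# Toy rectangular file with 7 columns (the original question had 430).
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 a 0.1 0.2 0.3 0.4 0.5",
             "2 b 0.6 0.7 0.8 0.9 1.0"), tmp)

n_total <- 7   # assumed total number of columns in the file
n_keep  <- 5   # number of leading columns to keep

# rep(NULL, n) is just NULL, so it cannot serve as a template vector.
is.null(rep(NULL, 430))

# The string "NULL" skips a column; NA lets read.table guess the type.
mycols <- rep("NULL", n_total)
mycols[seq_len(n_keep)] <- NA

inp <- read.table(tmp, colClasses = mycols)
dim(inp)
```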


-- 
David.

>
> Sarah
>
> On Wed, Mar 16, 2011 at 7:54 AM, Luis Ridao <luridao at gmail.com> wrote:
>> David,
>>
>> Thanks for your tip but it seems I'm having problems with the number
>> of columns R manages to read in. Below is an example of the data read in:
>>
>>> inp[1:20,]
>>        V1          V2        V3       V4     V5     V6     V7     V8     V9
>> 1   1.0000 log_fy_coff -1.007600 0.119520 1.0000     NA            NA     NA
>> 2   2.0000 log_fy_coff -0.935010 0.112840 0.8896 1.0000            NA     NA
>> 3   3.0000 log_fy_coff -0.876260 0.107500 0.8219 0.8847 1.0000     NA     NA
>> 4   4.0000 log_fy_coff -0.683090 0.103030 0.7656 0.8143 0.8747 1.0000     NA
>> 5   5.0000 log_fy_coff -0.623500 0.100980 0.7206 0.7636 0.8086 0.8764 1.0000
>> 6   6.0000 log_fy_coff -0.583330 0.098978 0.6819 0.7214 0.7615 0.8150 0.8762
>> 7   1.0000                    NA       NA     NA     NA            NA     NA
>> 8   7.0000 log_fy_coff -0.676790 0.096608 0.6521 0.6892 0.7254 0.7719 0.8148
>> 9   0.8717      1.0000        NA       NA     NA     NA            NA     NA
>> 10  8.0000 log_fy_coff -0.696060 0.093761 0.6297 0.6654 0.6988 0.7405 0.7750
>> 11  0.8116      0.8643  1.000000       NA     NA     NA            NA     NA
>> 12  9.0000 log_fy_coff -0.527060 0.089949 0.6003 0.6347 0.6667 0.7060 0.7367
>>
>> As you can see, there are only 9 columns in inp, and the remaining values
>> are wrapped onto the following row (see row 7).
>> I just don't understand why this is happening (using fill=T does not
>> help either).
>>
>> Best,
>> Luis
>>
>> On Tue, Mar 15, 2011 at 5:15 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>
>>> On Mar 15, 2011, at 1:11 PM, <rex.dwyer at syngenta.com> wrote:
>>>
>>>> I think you need to read an introduction to R.
>>>> For starters, read.table returns its results as a value, which you are not
>>>> saving.
>>>> The probable answer to your question:
>>>> Read the whole file with read.table, and select columns you need, e.g.:
>>>> tab <- read.table(myfile, skip=2)[,1:5]
>>>>
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>>>> On Behalf Of Luis Ridao
>>>> Sent: Tuesday, March 15, 2011 11:53 AM
>>>> To: r-help at r-project.org
>>>> Subject: [R] How to read only specified columns from a data file
>>>>
>>>> R-help,
>>>>
>>>> I'm trying to read a data file with plenty of columns.
>>>> I just need the first 5 but it does not work by doing something like:
>>>>
>>>>> mycols <- rep(NULL, 430) ; mycols[c(1:4)] <- NA
>>>>> read.table(myfile, skip=2, colClasses=mycols)
>>>
>>> I would have suggested:
>>>
>>> mycols <- rep(NULL, 430) ; mycols[1:5] <- rep("numeric", 5)
>>> inp <- read.table(myfile, skip=2, colClasses=mycols)
>>> head(inp)
>>>
>>> --
>>> David.
>>>
>>>>
>>>> Any suggestions?
>>>>
>
> -- 
> Sarah Goslee
> http://www.functionaldiversity.org
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


