[R] How to read only specified columns from a data file

Luis Ridao luridao at gmail.com
Wed Mar 16 15:49:17 CET 2011


Thanks Sarah.

Best,
Luis

On Wed, Mar 16, 2011 at 1:19 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> On Wed, Mar 16, 2011 at 9:07 AM, Luis Ridao <luridao at gmail.com> wrote:
>> This is my code:
>>
>> mycols <- rep(NULL, 430) ; mycols[c(1,3:5)] <- rep("numeric", 4) ;
>> mycols[c(2)] <- rep("character",1)
>
> rep(NULL, 430) does not give you a vector of length 430; it gives you NULL,
> and at the end of this process mycols is of length 5.
>
> So read.table() does exactly what you've told it: it works out the number
> of columns from the first five rows and gives the first five columns the
> classes specified in mycols.
>
> According to the documentation for read.table(), you want the string "NULL"
> rather than NULL anyway, and rep("NULL", 430) should work as expected.
>
> Sarah
>
>> inp <- read.table(myfile, skip=2, colClasses=mycols,fill=T)
>> head(inp)
>>
>> Best,
>> Luis
>>
>> On Wed, Mar 16, 2011 at 1:03 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>
>>> On Mar 16, 2011, at 8:13 AM, Sarah Goslee wrote:
>>>
>>>> read.table() looks at the first five rows when determining how many
>>>> columns there are. If there are more columns in row 7 and you do not
>>>> specify that in the read.table() command directly, they will be wrapped
>>>> to the next row.
>>>>
>>>> This was discussed on the list within the last couple weeks.
>>>
>>> In addition to Sarah's comments, I also note that you did not include your
>>> code. I don't think it could have been identical to the code I suggested,
>>> which was in turn based on the code you had proposed. So ... what did you do
>>> to get that result?
>>>
>>>
>>> --
>>> David.
>>>
>>>>
>>>> Sarah
>>>>
>>>> On Wed, Mar 16, 2011 at 7:54 AM, Luis Ridao <luridao at gmail.com> wrote:
>>>>>
>>>>> David,
>>>>>
>>>>> Thanks for your tip, but it seems I'm having problems with the number
>>>>> of columns R manages to read in. Below is an example of the data read
>>>>> in:
>>>>>
>>>>>> inp[1:20,]
>>>>>
>>>>>       V1          V2        V3       V4     V5     V6     V7     V8     V9
>>>>> 1   1.0000 log_fy_coff -1.007600 0.119520 1.0000     NA            NA     NA
>>>>> 2   2.0000 log_fy_coff -0.935010 0.112840 0.8896 1.0000            NA     NA
>>>>> 3   3.0000 log_fy_coff -0.876260 0.107500 0.8219 0.8847 1.0000     NA     NA
>>>>> 4   4.0000 log_fy_coff -0.683090 0.103030 0.7656 0.8143 0.8747 1.0000     NA
>>>>> 5   5.0000 log_fy_coff -0.623500 0.100980 0.7206 0.7636 0.8086 0.8764 1.0000
>>>>> 6   6.0000 log_fy_coff -0.583330 0.098978 0.6819 0.7214 0.7615 0.8150 0.8762
>>>>> 7   1.0000                    NA       NA     NA     NA            NA     NA
>>>>> 8   7.0000 log_fy_coff -0.676790 0.096608 0.6521 0.6892 0.7254 0.7719 0.8148
>>>>> 9   0.8717      1.0000        NA       NA     NA     NA            NA     NA
>>>>> 10  8.0000 log_fy_coff -0.696060 0.093761 0.6297 0.6654 0.6988 0.7405 0.7750
>>>>> 11  0.8116      0.8643  1.000000       NA     NA     NA            NA     NA
>>>>> 12  9.0000 log_fy_coff -0.527060 0.089949 0.6003 0.6347 0.6667 0.7060 0.7367
>>>>>
>>>>> As you can see, there are only 9 columns in inp, and the rest is read
>>>>> into the following row (see row 7). I just don't understand why this is
>>>>> happening (using fill=T does not help either).
>>>>>
>>>>> Best,
>>>>> Luis
>>>>>
>>>>> On Tue, Mar 15, 2011 at 5:15 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>>>
>>>>>> On Mar 15, 2011, at 1:11 PM, <rex.dwyer at syngenta.com> wrote:
>>>>>>
>>>>>>> I think you need to read an introduction to R.
>>>>>>> For starters, read.table returns its results as a value, which you are
>>>>>>> not saving.
>>>>>>> The probable answer to your question:
>>>>>>> Read the whole file with read.table, and select columns you need, e.g.:
>>>>>>> tab <- read.table(myfile, skip=2)[,1:5]
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: r-help-bounces at r-project.org
>>>>>>> [mailto:r-help-bounces at r-project.org]
>>>>>>> On Behalf Of Luis Ridao
>>>>>>> Sent: Tuesday, March 15, 2011 11:53 AM
>>>>>>> To: r-help at r-project.org
>>>>>>> Subject: [R] How to read only specified columns from a data file
>>>>>>>
>>>>>>> R-help,
>>>>>>>
>>>>>>> I'm trying to read a data file with plenty of columns.
>>>>>>> I just need the first 5, but it does not work when I do something like:
>>>>>>>
>>>>>>>> mycols <- rep(NULL, 430) ; mycols[c(1:4)] <- NA
>>>>>>>> read.table(myfile, skip=2, colClasses=mycols)
>>>>>>
>>>>>> I would have suggested:
>>>>>>
>>>>>> mycols <- rep(NULL, 430) ; mycols[1:5] <- rep("numeric", 5)
>>>>>> inp <- read.table(myfile, skip=2, colClasses=mycols)
>>>>>> head(inp)
>>>>>>
>>>>>> --
>>>>>> David.
>>>>>>
>>>>>>>
>>>>>>> Any suggestions?
>>>>>>>
>>>>
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>

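Sarah's explanation of rep(NULL, 430) versus rep("NULL", 430) can be checked directly. The sketch below is an illustration, not code from the thread: it writes a small hypothetical 12-column file to a temporary path (standing in for Luis's 430-column myfile), shows that rep(NULL, n) collapses to NULL, and then uses the string "NULL" in colClasses to keep only the first five columns. The file contents and the ncols value are assumptions made for the example.

```r
# rep() has nothing to repeat when given NULL, so the result is NULL.
length(rep(NULL, 430))    # 0
length(rep("NULL", 430))  # 430

# Hypothetical stand-in for 'myfile': two lines to skip, then rows with
# 12 whitespace-separated fields each (the real file had 430).
myfile <- tempfile(fileext = ".txt")
writeLines(c("first line to skip", "second line to skip",
             "1 log_fy_coff -1.0076 0.11952 1 0 0 0 0 0 0 0",
             "2 log_fy_coff -0.93501 0.11284 0.8896 1 0 0 0 0 0 0"),
           myfile)

# One entry per column: the string "NULL" tells read.table() to drop it.
ncols <- 12  # assumed width of the toy file
mycols <- rep("NULL", ncols)
mycols[c(1, 3:5)] <- "numeric"
mycols[2] <- "character"

inp <- read.table(myfile, skip = 2, colClasses = mycols)
str(inp)  # a data frame with 5 columns, V1 to V5
```

With the real file, ncols would be 430, or whatever count.fields() reports for its widest line.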

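Sarah's point that read.table() guesses the number of columns from the first five lines can be reproduced with a small ragged file. This is a sketch under assumptions: the file is made up to mimic Luis's triangular layout (row i has 4 + i fields), and the fix shown, counting fields with count.fields() and passing a col.names vector of that length, is one documented way to set the width, not necessarily what the posters used. The read.table() help page says the column count comes from the first five lines of input, or from col.names if that is specified and longer.

```r
# Hypothetical ragged file: row i has 4 + i fields, so rows 6 to 8 are
# wider than anything in the first five data lines.
myfile <- tempfile(fileext = ".txt")
rows <- vapply(1:8, function(i)
  paste(c(i, "log_fy_coff", -0.9, 0.1, rep(1, i)), collapse = " "),
  character(1))
writeLines(c("first line to skip", "second line to skip", rows), myfile)

# Default: the width is guessed from the first five data lines (9 fields),
# so the extra fields on longer lines spill into additional rows.
naive <- read.table(myfile, skip = 2, fill = TRUE)
dim(naive)

# Count the fields on every line and name that many columns explicitly.
nf <- max(count.fields(myfile, skip = 2))
full <- read.table(myfile, skip = 2, fill = TRUE,
                   col.names = paste0("V", seq_len(nf)))
dim(full)  # 8 rows, 12 columns

# Rex's suggestion then works: read everything, keep the first five columns.
tab <- full[, 1:5]
```

Reading every column and subsetting afterwards is simple, but it holds all columns in memory first; giving the unwanted columns the class "NULL" avoids keeping them at all.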

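Combining the two suggestions in the thread, a minimal sketch that reads only the first five columns of a ragged file might look like the following. As above, the file is hypothetical and nf plays the role of Luis's 430; the assumption is that both colClasses and col.names should have one entry per column of the widest line, so that read.table() neither wraps long lines nor recycles the class vector.

```r
# Hypothetical ragged file, like a triangular matrix dump: row i has 4 + i fields.
myfile <- tempfile(fileext = ".txt")
rows <- vapply(1:8, function(i)
  paste(c(i, "log_fy_coff", -0.9, 0.1, rep(1, i)), collapse = " "),
  character(1))
writeLines(c("first line to skip", "second line to skip", rows), myfile)

# Width of the widest line (stands in for the 430 columns in the thread).
nf <- max(count.fields(myfile, skip = 2))

# The string "NULL" drops a column; the first five get explicit classes.
mycols <- rep("NULL", nf)
mycols[c(1, 3:5)] <- "numeric"
mycols[2] <- "character"

inp <- read.table(myfile, skip = 2, fill = TRUE,
                  col.names = paste0("V", seq_len(nf)),
                  colClasses = mycols)
head(inp)
```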
More information about the R-help mailing list