[R] .gct file

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Jul 20 10:52:51 CEST 2005


On Tue, 19 Jul 2005, Marc Schwartz (via MN) wrote:

> For the TAB delimited columns, adjust the 'sep' argument to:
>
> read.table("data.gct", skip = 2, header = TRUE, sep = "\t")
>
> The 'quote' argument is by default:
>
> quote = "\"'"
>
> which should take care of the quoted strings and bring them in as a
> single value.
>
> The above presumes that the header row is also TAB delimited. If not,
> you may have to set 'skip = 3' to skip over the header row and manually
> set the column names.

Not quite.  You can open a connection, skip 2 rows and read one to get the 
column names, then read the rest of the file by calling read.table on the 
open connection, passing the column names you just read.
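For illustration, here is a minimal sketch of that connection-based approach in R. It assumes the layout shown later in this thread (a version line, a dimensions line, a header row, then data rows). The helper name read_gct() and the sample rows are hypothetical, not a function from base R or any package and not the poster's actual data.

```r
# Hypothetical helper (not part of base R or any package): reads a .gct-style
# file through an open connection, assuming a version line, a dimensions
# line, a header row, then data rows.
read_gct <- function(path) {
  con <- file(path, open = "r")
  on.exit(close(con))                   # we opened it, so we close it

  readLines(con, n = 2)                 # skip "#1.2" and the dimensions line

  # Read the header row and split it on white space (tabs included).
  hdr <- strsplit(trimws(readLines(con, n = 1)), "[[:space:]]+")[[1]]

  # read.table() continues from the connection's current position,
  # i.e. the first data row, and uses the names just read.
  read.table(con, header = FALSE, col.names = hdr)
}

# Small illustrative file modelled on the excerpt quoted in this thread.
tmp <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "1\t1",
             "NAME\tDESCRIPTION\tVALUE",
             "gene1\t\"a protein inducer\"\t1123"), tmp)

gct <- read_gct(tmp)
str(gct)
```

Because read.table() is handed a connection that is already open, it reads from the current position rather than from the top of the file, and it leaves closing the connection to the caller.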

However, based on what we have been shown

read.table("data.gct", skip = 2, header = TRUE)

ought to work as the file looks as if it is white-space delimited (a tab 
is white space).
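A quick self-contained check of this, using a small temporary file modelled on the excerpt quoted below (the sample rows are illustrative, not the poster's data.gct): with the default sep = "" any run of white space, tabs included, separates fields, and the default 'quote' setting (double and single quotes) keeps the quoted description together as one value. The explicit sep = "\t" call suggested earlier in the thread is shown alongside for comparison.

```r
# Illustrative tab-separated file with a quoted description field.
txt <- c("#1.2",
         "1\t1",
         "NAME\tDESCRIPTION\tVALUE",
         "gene1\t\"a protein inducer\"\t1123")
tmp <- tempfile(fileext = ".gct")
writeLines(txt, tmp)

# Default sep = "": fields are split on white space; quotes group words.
d1 <- read.table(tmp, skip = 2, header = TRUE)

# Explicit tab separator, as suggested earlier in the thread.
d2 <- read.table(tmp, skip = 2, header = TRUE, sep = "\t")

str(d1)
str(d2)
```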

>
> HTH,
>
> Marc Schwartz
>
>
> On Tue, 2005-07-19 at 13:52 -0400, mark salsburg wrote:
>> This is all extremely helpful.
>>
>> It turns out the data are a little atypical: the columns are tab-delimited
>> except for the description column.
>>
>>
>> DATA1.gct looks like this
>>
>> #1.2
>> 23 3423
>> NAME DESCRIPTION VALUE
>> gene1 "a protein inducer" 1123
>> .....          .................     ......
>>
>> How do I get R to read the data as tab-delimited, but read in the 2nd
>> column as one value based on the quotation marks?
>>
>> thanks..
>>
>> On 7/19/05, Marc Schwartz (via MN) <mschwartz at mn.rr.com> wrote:
>>> On Tue, 2005-07-19 at 13:16 -0400, mark salsburg wrote:
>>>> ok so the gct file looks like this:
>>>>
>>>> #1.2  (version number)
>>>> 7283 19   (matrix size)
>>>> Name Description Values
>>>> ....      .......          ......
>>>>
>>>> How can I tell R to disregard the first two lines and start reading
>>>> from the 3rd line of this .gct file? I would just delete them, but I do not
>>>> know how to open a .gct file.
>>>>
>>>> thank you
>>>>
>>>> On 7/19/05, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>>>>> On 7/19/2005 12:10 PM, mark salsburg wrote:
>>>>>> I have two files to compare, one is a regular txt file that I can read
>>>>>> in no prob.
>>>>>>
>>>>>> The other is a .gct file (How do I read in this one?)
>>>>>>
>>>>>> I tried a simple
>>>>>>
>>>>>> read.table("data.gct", header = T)
>>>>>>
>>>>>> How do you suggest reading in this file??
>>>>>>
>>>>>
>>>>> .gct is not a standard filename extension.  You need to know what is in
>>>>> that file.  Where did you get it?  What program created it?
>>>>>
>>>>> Chances are the easiest thing to do is to get the program that created
>>>>> it to export in a well known format, e.g. .csv.
>>>>>
>>>>> Duncan Murdoch
>>>
>>>
>>> The above would be consistent with the info in my reply.
>>>
>>> I guess if the format is consistent, as per Mark's example above, you
>>> can use:
>>>
>>> read.table("data.gct", skip = 2, header = TRUE)
>>>
>>> which will start by skipping the first two lines and then reading in the
>>> header row and then the data.
>>>
>>> See ?read.table
>>>
>>> HTH,
>>>
>>> Marc Schwartz
>>>
>>>
>>>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



