[R] .gct file

Wed Jul 20 12:00:57 CEST 2005

Those wondering what the .gct file format is might be interested in [1].

The original poster could check whether the 'ctc' package [2] supports
reading this format, but I think Prof. Ripley's solution works too.

[1]http://www.broad.mit.edu/cancer/software/genepattern/tutorial/gp_tutorial_fileformats.html
[2]http://www.bioconductor.org/packages/bioc/stable/src/contrib/html/ctc.html
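
For anyone who wants to try the two approaches quoted below, here is a
minimal sketch. It assumes the file really is TAB delimited throughout, and
it writes its own small example file (the 'gene2' row and the dimensions
line are made up for illustration). The object names are arbitrary. It is
only one way to do this, not a tested reader for the .gct format in general.

```r
# Build a tiny example file in the layout shown in the thread.
# ASSUMPTION: fields are TAB delimited; the gene2 row is invented.
gct_file <- tempfile(fileext = ".gct")
writeLines(c(
  "#1.2",
  "2\t1",
  "NAME\tDESCRIPTION\tVALUE",
  "gene1\t\"a protein inducer\"\t1123",
  "gene2\t\"a kinase\"\t42"
), gct_file)

# Approach 1: skip the two preamble lines and use line 3 as the header.
# The default quote = "\"'" keeps each quoted description as one value.
one_step <- read.table(gct_file, skip = 2, header = TRUE, sep = "\t")

# Approach 2: open a connection, read the preamble and the header row
# by hand, then let read.table() read the rest from the open connection.
con <- file(gct_file, open = "rt")
preamble  <- readLines(con, n = 2)   # version line and matrix size line
col_names <- strsplit(readLines(con, n = 1), "\t", fixed = TRUE)[[1]]
two_step  <- read.table(con, col.names = col_names, sep = "\t")
close(con)

str(two_step)
```

The second form is mainly useful when the header row needs separate
handling, e.g. when it is delimited differently from the data rows.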

On Wed, 2005-07-20 at 09:52 +0100, Prof Brian Ripley wrote:
> On Tue, 19 Jul 2005, Marc Schwartz (via MN) wrote:
> 
> > For the TAB delimited columns, adjust the 'sep' argument to:
> >
> > read.table("data.gct", skip = 2, header = TRUE, sep = "\t")
> >
> > The 'quote' argument is by default:
> >
> > quote = "\"'"
> >
> > which should take care of the quoted strings and bring them in as a
> > single value.
> >
> > The above presumes that the header row is also TAB delimited. If not,
> > you may have to set 'skip = 3' to skip over the header row and manually
> > set the column names.
> 
> Not quite.  You can open a connection, skip 2 rows and read one to get the 
> column names, then read the rest of the file using read.table on the open 
> connection using the column names you just read.
> 
> However, based on what we have been shown
> 
> read.table("data.gct", skip = 2, header = TRUE)
> 
> ought to work as the file looks as if it is white-space delimited (a tab 
> is white space).
> 
> >
> > HTH,
> >
> > Marc Schwartz
> >
> >
> > On Tue, 2005-07-19 at 13:52 -0400, mark salsburg wrote:
> >> This is all extremely helpful.
> >>
> >> The data, it turns out, is a little atypical: the columns are
> >> tab-delimited except for the description columns
> >>
> >>
> >> DATA1.gct looks like this
> >>
> >> #1.2
> >> 23 3423
> >> NAME DESCRIPTION VALUE
> >> gene1 "a protein inducer" 1123
> >> .....          .................     ......
> >>
> >> How do I get R to read the data as tab-delimited, but read in the 2nd
> >> column as one value based on the quotation marks?
> >>
> >> thanks..
> >>
> >> On 7/19/05, Marc Schwartz (via MN) <mschwartz at mn.rr.com> wrote:
> >>> On Tue, 2005-07-19 at 13:16 -0400, mark salsburg wrote:
> >>>> ok so the gct file looks like this:
> >>>>
> >>>> #1.2  (version number)
> >>>> 7283 19   (matrix size)
> >>>> Name Description Values
> >>>> ....      .......          ......
> >>>>
> >>>> How can I tell R to disregard the first two lines and start reading
> >>>> at the 3rd line in this gct file? I would just delete them, but I do
> >>>> not know how to open a .gct file
> >>>>
> >>>> thank you
> >>>>
> >>>> On 7/19/05, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> >>>>> On 7/19/2005 12:10 PM, mark salsburg wrote:
> >>>>>> I have two files to compare, one is a regular txt file that I can read
> >>>>>> in no prob.
> >>>>>>
> >>>>>> The other is a .gct file (How do I read in this one?)
> >>>>>>
> >>>>>> I tried a simple
> >>>>>>
> >>>>>> read.table("data.gct", header = T)
> >>>>>>
> >>>>>> How do you suggest reading in this file??
> >>>>>>
> >>>>>
> >>>>> .gct is not a standard filename extension.  You need to know what is in
> >>>>> that file.  Where did you get it?  What program created it?
> >>>>>
> >>>>> Chances are the easiest thing to do is to get the program that created
> >>>>> it to export in a well known format, e.g. .csv.
> >>>>>
> >>>>> Duncan Murdoch
> >>>
> >>>
> >>> The above would be consistent with the info in my reply.
> >>>
> >>> I guess if the format is consistent, as per Mark's example above, you
> >>> can use:
> >>>
> >>> read.table("data.gct", skip = 2, header = TRUE)
> >>>
> >>> which will start by skipping the first two lines and then reading in the
> >>> header row and then the data.
> >>>
> >>> See ?read.table
> >>>
> >>> HTH,
> >>>
> >>> Marc Schwartz
> >>>
> >>>
> >>>