[R] Separator with " | " for read.table
jim holtman
jholtman at gmail.com
Mon Jun 16 03:39:25 CEST 2008
I am not exactly sure what you are after, but if you are just printing
out a single column, then unless you use "drop=FALSE" when referencing
it, you get a vector (here a factor) rather than a one-column data frame:
> x <- read.table(textConnection("#GDS_ID GENE_NAME GENE_DESCRIPTION GENE_FUNCTION
+ 1007_s_at | DDR1 | discoidin domain receptor tyrosine kinase 1 | protein-coding
+ 1053_at | RFC2 | replication factor C (activator 1) 2, 40kDa | protein-coding
+ 117_at | HSPA6 | heat shock 70kDa protein 6 (HSP70B') | protein-coding"), sep="|", quote='')
> closeAllConnections()
> str(x)
'data.frame': 3 obs. of 4 variables:
$ V1: Factor w/ 3 levels "1007_s_at ","1053_at ",..: 1 2 3
$ V2: Factor w/ 3 levels " DDR1 "," HSPA6 ",..: 1 3 2
$ V3: Factor w/ 3 levels " discoidin domain receptor tyrosine kinase 1 ",..: 1 3 2
$ V4: Factor w/ 1 level " protein-coding": 1 1 1
> print(x$V3)
[1] discoidin domain receptor tyrosine kinase 1 replication factor C (activator 1) 2, 40kDa
[3] heat shock 70kDa protein 6 (HSP70B')
3 Levels: discoidin domain receptor tyrosine kinase 1 ... replication factor C (activator 1) 2, 40kDa
> x$V3
[1] discoidin domain receptor tyrosine kinase 1 replication factor C (activator 1) 2, 40kDa
[3] heat shock 70kDa protein 6 (HSP70B')
3 Levels: discoidin domain receptor tyrosine kinase 1 ... replication factor C (activator 1) 2, 40kDa
> x[, "V3", drop=FALSE] # is this what you were expecting
V3
1 discoidin domain receptor tyrosine kinase 1
2 replication factor C (activator 1) 2, 40kDa
3 heat shock 70kDa protein 6 (HSP70B')
>
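
If the padding around the fields is part of what bothers you, something
along the lines of the sketch below may be closer to what you want. As far
as I know, read.table only accepts a one-character sep, so " | " cannot be
given directly; splitting on "|" and setting strip.white=TRUE trims the
blanks on either side instead. The inline copy of your data in `txt`, the
column names, and stringsAsFactors=FALSE are my own choices for
illustration, not something from your file.

```r
# Inline copy of the data from this thread (assumption: the real file has
# the same layout; replace text = txt with the file name in practice).
txt <- "#GDS_ID GENE_NAME GENE_DESCRIPTION GENE_FUNCTION
1007_s_at | DDR1 | discoidin domain receptor tyrosine kinase 1 | protein-coding
1053_at | RFC2 | replication factor C (activator 1) 2, 40kDa | protein-coding
117_at | HSPA6 | heat shock 70kDa protein 6 (HSP70B') | protein-coding"

geneinfo <- read.table(
  text = txt,
  sep = "|",                 # a single character; " | " is not accepted
  quote = "",                # treat the ' in HSP70B' as ordinary data
  comment.char = "#",        # the default; skips the "#GDS_ID ..." line
  strip.white = TRUE,        # drop leading/trailing blanks in each field
  stringsAsFactors = FALSE,  # keep text columns as character vectors
  col.names = c("GDS_ID", "GENE_NAME", "GENE_DESCRIPTION", "GENE_FUNCTION")
)

# A single column extracted with $ is a plain character vector ...
desc_vector <- geneinfo$GENE_DESCRIPTION

# ... while drop = FALSE keeps it as a one-column data frame.
desc_frame <- geneinfo[, "GENE_DESCRIPTION", drop = FALSE]

print(desc_vector)
print(desc_frame)
```

Note that comment.char="\#" in your original call is not a valid R string
escape; a plain "#" is enough.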
On Sun, Jun 15, 2008 at 9:31 PM, Gundala Viswanath <gundalav at gmail.com> wrote:
> Thanks so much Jim,
>
> It works. However, why was the "\n" not removed?
> That is, when I do:
>
> print (x$V3)
>
> it gives something like this:
> __OUTPUT__
> [1] discoidin domain receptor tyrosine kinase 1
>
> [2] replication factor C (activator 1) 2, 40kDa
>
> [3] heat shock 70kDa protein 6 (HSP70B')
>
> __END__
>
> Note the spacing between the entries. I expect something like:
>
> [1] discoidin domain receptor tyrosine kinase 1
> [2] replication factor C (activator 1) 2, 40kDa
> [3] heat shock 70kDa protein 6 (HSP70B')
> __END__
>
> Do you have any idea how to fix this?
>
>
>
> - Gundala Viswanath
> Jakarta - Indonesia
>
>
> On Mon, Jun 16, 2008 at 10:19 AM, jim holtman <jholtman at gmail.com> wrote:
>> Does this give you what you want:
>>
>>> x <- read.table(textConnection("#GDS_ID GENE_NAME GENE_DESCRIPTION GENE_FUNCTION
>> + 1007_s_at | DDR1 | discoidin domain receptor tyrosine kinase 1 | protein-coding
>> + 1053_at | RFC2 | replication factor C (activator 1) 2, 40kDa | protein-coding
>> + 117_at | HSPA6 | heat shock 70kDa protein 6 (HSP70B') | protein-coding"), sep="|", quote='')
>>> closeAllConnections()
>>>
>>> x
>>   V1         V2     V3                                           V4
>> 1 1007_s_at  DDR1   discoidin domain receptor tyrosine kinase 1  protein-coding
>> 2 1053_at    RFC2   replication factor C (activator 1) 2, 40kDa  protein-coding
>> 3 117_at     HSPA6  heat shock 70kDa protein 6 (HSP70B')         protein-coding
>>>
>>
>>
>> Your data contains a quote character (') in HSP70B'; you need quote='' in
>> the read.table call so that it is not treated as a string delimiter.
>>
>> On Sun, Jun 15, 2008 at 9:11 PM, Gundala Viswanath <gundalav at gmail.com> wrote:
>>> Hi,
>>>
>>> I have the following data file to be parsed and captured as a data frame:
>>>
>>> __DATA__
>>> #GDS_ID GENE_NAME GENE_DESCRIPTION GENE_FUNCTION
>>> 1007_s_at | DDR1 | discoidin domain receptor tyrosine kinase 1 | protein-coding
>>> 1053_at | RFC2 | replication factor C (activator 1) 2, 40kDa | protein-coding
>>> 117_at | HSPA6 | heat shock 70kDa protein 6 (HSP70B') | protein-coding
>>>
>>> __END__
>>>
>>> In particular, the fields are separated by " | ", that is: space, bar, space.
>>> However, I tried this to no avail:
>>>
>>> geneinfo <- read.table("mydata.txt", sep=" | ", comment.char="\#")
>>> print(geneinfo)
>>>
>>> I also tried sep="|", but it parsed the file incorrectly. Please advise.
>>>
>>> - Gundala Viswanath
>>> Jakarta - Indonesia
>>>
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem you are trying to solve?
>>
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?